# Recognition of Motor Imagery Electroencephalography Using Independent Component Analysis and Machine Classifiers

Chih-I Hung^{1,4}, Po-Lei Lee^{4}, Yu-Te Wu^{1,4,*}, Hui-Yun Chen^{1,4}, Li-Fen Chen^{3,4}, Tzu-Chen Yeh^{2,3,4}, Jen-Chuen Hsieh^{2,3,4}

^{1} Institute of Radiological Sciences, ^{2} Institute of Neuroscience, ^{3} Center for Neuroscience, National Yang-Ming University, No.155, Sec. 2, Linong St., Beitou District, 112, Taipei, Taiwan

^{4} Integrated Brain Research Laboratory, Dept. of Medical Research and Education, Taipei Veterans General Hospital, No.201, Sec. 2, Shihpai Rd., Beitou District, 112, Taipei, Taiwan

e-mail: runtothewater@pie.com.tw; pllee2@vghtpe.gov.tw; ytwu@ym.edu.tw; airrb@pchome.com.tw; lfchen3@vghtpe.gov.tw; tcyeh@vghtpe.gov.tw; jchsieh@vghtpe.gov.tw

ABSTRACT

Motor imagery electroencephalography (EEG), which embodies cortical potentials during mental simulation of left or right finger lifting tasks, can be used as neural input signals to activate a brain computer interface (BCI). The effectiveness of such an EEG-based BCI system relies on two indispensable features: distinguishable patterns of brain signals and accurate classifiers. This work aims to extract a reliable neural feature, termed the beta rebound map, from motor imagery EEG by means of independent component analysis (ICA), and employs four classifiers to investigate the efficacy of the beta rebound map. Results demonstrated that, with the use of ICA, the recognition rates of four classifiers, linear discriminant analysis (LDA), back-propagation neural network (BP-NN), radial-basis function neural network (RBF-NN), and support vector machine (SVM), improved significantly from 54%, 54%, 57.3% and 55% to 69.8%, 75.5%, 76.5% and 77.3%, respectively. In addition, the areas under the ROC curve, which assess the quality of classification over a wide range of misclassification costs, also improved greatly from .65, .60, .62 and .64 to .78, .73, .77 and .75, respectively.

Keywords

Electroencephalography (EEG), independent component analysis (ICA), brain computer interface (BCI), beta rebound, linear discriminant analysis (LDA), back-propagation neural network (BP-NN), radial-basis function neural network (RBF-NN), support vector machine (SVM)

1. INTRODUCTION

In recent years, great progress in neuroscience has inspired studies in developing brain computer interfaces (BCI) [Mul99a] [Pfu98a] [Pfu00a] [Pol98a], a novel technique for assisting people to communicate with the external environment or trigger surrounding devices by means of their brain signals. These systems are particularly useful for those who suffer from amyotrophic lateral sclerosis or locked-in syndrome and are unable to produce any motor activity. Their cognitive or sensory functions, however, may be intact, so that they can be trained to perform mental tasks, for example, simulating right or left hand or foot movements without any overt motor output. The success of BCI systems relies on two integral parts: distinguishable neural patterns and effective classifiers. This work aims to extract a reliably distinguishable feature from the motor imagery EEG recording by means of independent component analysis and to employ machine classifiers to investigate the efficacy of the extracted pattern.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

WSCG'2004, February 2-6, 2004, Plzen, Czech Republic.
Copyright UNION Agency – Science Press

It has been pointed out that imagination of hand movement elicits rhythmic EEG patterns in the primary sensorimotor areas similar to those from a real hand movement [Pfu96a]. A specific movement or imagined movement is composed of three phases: planning, execution and recovery. The planning and execution phases result in localized amplitude attenuation in the alpha and lower beta bands, or event-related desynchronization (ERD), which can be viewed as an EEG correlate of an activated cortical motor network, while the recovery phase produces focal mu and beta amplitude enhancement, or event-related synchronization (ERS), which may reflect deactivation/inhibition in the underlying cortical network.

Several BCI systems have been proposed based on the ERD induced when subjects performed imagery hand or foot movements [Pfu98a] [Pfu00a]. Pfurtscheller et al. used learning vector quantization to classify ERD signals on-line in a subject-specific band which was determined by distinction-sensitive learning vector quantization. They also adopted an adaptive autoregressive model to analyze ERD signals off-line and applied linear discriminant analysis to improve the detection of imagined left and right hand movements. The reported error rates varied between 5.8% and 32.8%. Muller-Gerking et al. applied common spatial filters to detect real (not imagined) left hand, right hand or right foot movements in single trials and reported 84%, 90% and 94% accuracies for three subjects, respectively [Mul99a].

Although the ERD elicited by imagined movement has been extensively used as a feature pattern in BCI systems, we have observed that not every subject can produce discernible ERD during the imagery movement, whereas the beta ERS persistently appeared in every subject. This motivated us to adopt the ERS, rather than the ERD, as the feature pattern. The peaked ERS of imagined left or right hand movement, referred to as the beta rebound, appears over bilateral sensorimotor areas but with distinct patterns. When imagination of right hand movement is executed, the beta rebound over the left hemisphere has stronger amplitude than that over the right hemisphere, and vice versa.

The recorded EEG signals were inevitably contaminated by system noise, artifacts, spontaneous EEG, etc. Following our previous work on MEG/EEG de-noising [Lee03a], we employed the independent component analysis (ICA) technique to decompose each pre-processed epoch into a set of temporally independent components along with corresponding spatial maps, and selected the task-related components by matching designed spatial templates with the decomposed spatial maps. As a result, the signal-to-noise ratio of each EEG single trial was improved, which led to better classifier performance.

This paper is organized as follows. Section 2 reports our experimental paradigm for the motor imagery task and the EEG recording configuration. Section 3 presents the extracted features, with and without applying ICA, based on the peaked beta ERS and termed beta rebound maps. Section 4 reviews the four classifiers used in this study. Section 5 summarizes the classification results and Section 6 concludes this study.

2. EXPERIMENTAL PARADIGM FOR MOTOR IMAGERY

Four right-handed healthy subjects (two males and two females), aged between 20 and 28, participated in this study. Each subject was naive to the experiment and trained for only twenty minutes prior to the first session. During each session, the subject was asked to perform 100 trials of imagery right index finger lifting, followed by another 100 trials of imagery left index finger lifting. The length of each trial was ten seconds. Each trial began with a one-second presentation of random noise during which subjects were allowed to blink their eyes (A in Fig. 1). The subject was then instructed to stare at the fixation cross in the center of the monitor from 2 s and to start imagining right or left index finger lifting right after he/she heard an acoustic cue "beep" (1 kHz frequency, 10 ms duration) at 5 s (B in Fig. 1). The inter-stimulus interval was 10 seconds.

Figure 1. Timing of two consecutive trials of the motor imagery task.

A 64-channel electroencephalography (EEG) 10-20 system (with an electro-cap) was used to record the cortical potentials. The configuration of the standard 10-20 system is shown in Fig. 2. The vertical and horizontal electro-oculograms (VEOG and HEOG) were used to reject bad epochs induced by eye blinking during the recording. The data were digitized at 250 Hz. Since we focused on beta activities, the signals were further bandpass-filtered at 6-50 Hz to remove the dc drifts and 60 Hz noise.


Throughout the recordings, the surface electromyogram (EMG) was monitored from the m. extensor digitorum communis (digitized at 2 kHz) for the detection of motion status. Data of four sessions were collected for each subject. Signals from 3 s to 10 s (C in Fig. 1) in each trial (excluding bad epochs) were extracted for subsequent classifier training and testing. Figure 3 exhibits such a pre-processed epoch from the sensorimotor area (channel C3 in the 10-20 system).

Figure 2. The configuration of the standard 10-20 system with 64 channels.

Figure 3. A pre-processed epoch recorded at C3.

3. FEATURE EXTRACTION WITH AND WITHOUT ICA

Extraction of reliable features from measured data is vital in facilitating the subsequent classification procedure. Since the measured signals were inevitably contaminated by system noise, artifacts, spontaneous EEG, etc., we employed the ICA technique to decompose each pre-processed epoch into a set of temporally independent components along with corresponding spatial maps, and selected the task-related components by matching designed spatial templates with the decomposed spatial maps. Two types of features, one using ICA to extract task-related components and the other without using ICA, were created from the pre-processed data in order to compare their efficacies. The detailed steps for feature extraction with ICA are described in the following:

Step 1: Signal decomposition using ICA. We first arranged each pre-processed epoch across m channels (m = 62) and n sampled points (n = 1750) into an $m \times n$ matrix X. The i-th row contains the observed signal from the i-th EEG channel, and the j-th column vector contains the observed samples at the j-th time point across all channels. In the present study, all calculations were performed using the FastICA algorithm [Cov65a] [Cov98a]. The FastICA technique first removed the means of the row vectors in the X matrix, followed by a whitening procedure to transform the covariance matrix of the zero-mean data into an identity matrix. The whitening process was implemented using principal component analysis. Only the N most significant eigenvectors (N = 15 in our analysis) were preserved in the subsequent ICA calculation. In the next step, FastICA searched for a matrix to further separate the whitened data into a set of components which were as mutually independent as possible. Combined with the previous whitening process, the matrix X can be transformed into a matrix S via an un-mixing matrix W, i.e.,

$$S = WX \qquad (1)$$

in which the rows of S are mutually independent. Each column of $W^{-1}$, i.e. the mixing matrix, represents a spatial map describing the relative projection weights of the corresponding temporal component at each of the EEG channels. They will be referred to as IC spatial maps henceforth. Figure 4 shows 12 IC spatial maps of 12 independent components (not shown) decomposed from a single-trial imagery right hand movement. The maps IC3, IC5, IC7 and IC9 were highly related to the motor imagery task and categorized as task-related components, while the IC4 and IC6 maps were associated with the occipital alpha rhythm, and the IC1 map was noise emanating from a bad channel.
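The decomposition in Step 1 can be sketched with scikit-learn's FastICA implementation. This is a minimal illustration, not the paper's actual pipeline: the toy data, channel count and function name are ours, and the paper retained N = 15 components from real 62-channel epochs.

```python
import numpy as np
from sklearn.decomposition import FastICA

def decompose_epoch(X, n_components=15, seed=0):
    """Decompose an (m channels x n samples) epoch. Rows of S are the
    independent component time courses; columns of A (the mixing matrix
    W^-1) are the IC spatial maps."""
    ica = FastICA(n_components=n_components, random_state=seed)
    S = ica.fit_transform(X.T).T    # (n_components, n_samples)
    A = ica.mixing_                 # (n_channels, n_components)
    return S, A

# Toy check: 4 "channels" mixing 2 sources (a sine and a square wave).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
sources = np.vstack([np.sin(2 * np.pi * 7 * t),
                     np.sign(np.sin(2 * np.pi * 3 * t))])
X = rng.normal(size=(4, 2)) @ sources + 0.01 * rng.normal(size=(4, 500))
S, A = decompose_epoch(X, n_components=2)
print(S.shape, A.shape)  # (2, 500) (4, 2)
```

Note that scikit-learn's estimator works on (samples x features) arrays, hence the transposes around the (channels x samples) convention used in the text.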

Step 2: Correlating the IC spatial maps with pre-defined spatial templates to select task-related components. Since the motor imagery task elicits bilateral activation in the vicinity of the sensorimotor areas, four spatial patterns encompassing the C3, C4, Cz and both C3 and C4 areas, respectively, were considered as spatial templates (see Fig. 5) for selecting the task-related spatial maps. Please note that four spatial templates, rather than a single template covering C3 and C4, were taken into account because the task-related activities can be separated by ICA and exhibited in multiple IC spatial maps. Each template was correlated with the 12 IC spatial maps of a single trial and the best two matches were selected. For example, the spatial maps IC3, IC5, IC7 and IC9 in Figure 4 were selected automatically due to their high similarity. The task-related IC spatial maps as well as the corresponding temporal components were used to reconstruct the signal X by means of equation (1).

Figure 4. The normalized IC spatial maps of a single-trial imagery right hand movement.

Figure 5. Spatial templates used to select task-related IC spatial maps.
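Step 2's template matching can be sketched as follows. The helper name, the one-hot toy templates and the use of absolute correlation are our illustrative choices (absolute value because ICA leaves the sign of each component arbitrary); the paper's real templates are scalp-topography patterns around C3, C4 and Cz.

```python
import numpy as np

def select_task_related(ic_maps, templates, per_template=2):
    """ic_maps: (n_channels, n_ics) columns of W^-1; templates:
    (n_channels, n_templates). For each template, keep the `per_template`
    IC spatial maps with the highest absolute correlation to it."""
    selected = set()
    for k in range(templates.shape[1]):
        r = [abs(np.corrcoef(templates[:, k], ic_maps[:, j])[0, 1])
             for j in range(ic_maps.shape[1])]
        selected.update(np.argsort(r)[-per_template:].tolist())
    return sorted(selected)

# Toy check: 6 channels, 3 one-hot IC maps; map 0 coincides with the
# single "C3-like" template, the others sit on unrelated channels.
tmpl = np.zeros((6, 1)); tmpl[2, 0] = 1.0
maps = np.zeros((6, 3)); maps[2, 0] = 1.0; maps[1, 1] = 1.0; maps[3, 2] = 1.0
print(select_task_related(maps, tmpl, per_template=1))  # [0]
```

The selected component indices would then be used to zero out the remaining columns of the mixing matrix before reconstructing X via equation (1).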

Step 3: Computing the envelopes of beta reactivity from the reconstructed signals using the Amplitude Modulation method. The optimal beta frequency band encompassing the prominent and relevant brain activities may vary across subjects and sessions. To tackle this problem, we divided the beta band into five sub-frequency bands, 8~12, 12~16, 16~20, 20~24, and 24~28 Hz, and used them together with the broad 8~30 Hz band to band-pass filter the reconstructed signals. The Amplitude Modulation (AM) method based on the Hilbert transform was applied to detect the envelope of the filtered EEG signals and quantify the event-related oscillatory activities [Clo96a]. Each envelope, referred to as an AM waveform, was computed by (see Figure 6(a))

$$m(t) = \sqrt{M_{BP}(t)^2 + H(M_{BP}(t))^2} \qquad (2)$$

where $M_{BP}(t)$ is the single-trial band-passed EEG signal, and $H(M_{BP}(t))$ is its Hilbert transform. Contrary to the classical measurement of ERS reactivity and the original AM approach, in which a relative percentage indexed to the initial baseline was used [Clo96a], we computed the beta ERS reactivity (termed the beta rebound) using the amplitude difference between the maximum values of beta ERD and beta ERS of the AM envelope.
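Equation (2) is exactly the magnitude of the analytic signal, so the AM waveform can be computed with SciPy's `hilbert`. A minimal sketch, assuming a 4th-order Butterworth band-pass (the paper does not specify its filter design):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def am_waveform(x, fs, band):
    """Band-pass filter a single-trial signal and return its AM envelope
    m(t) = sqrt(M_BP(t)^2 + H(M_BP(t))^2), i.e. |analytic signal|."""
    b, a = butter(4, band, btype="band", fs=fs)
    x_bp = filtfilt(b, a, x)        # zero-phase band-passed M_BP(t)
    return np.abs(hilbert(x_bp))    # Hilbert-transform envelope

# Toy check: a 20 Hz tone whose amplitude varies slowly; the envelope
# recovered in the 16-24 Hz band should track that amplitude.
fs = 250
t = np.arange(0, 4, 1 / fs)
amp = 1.0 + 0.5 * np.sin(2 * np.pi * 0.5 * t)
env = am_waveform(amp * np.sin(2 * np.pi * 20 * t), fs, (16, 24))
err = np.max(np.abs(env[fs:-fs] - amp[fs:-fs]))  # ignore filter edges
print(err)
```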

Step 4: Extracting the beta rebound maps. The imagery finger lifting task, similar to real finger movement, induced a larger beta rebound in the contralateral sensorimotor area than in the ipsilateral one. In addition, the contralateral beta rebound appeared earlier than the ipsilateral one. The co-existence of prominent beta rebounds at C3 and C4 and the constrained time lag between them suggested that the topographical maps with maximum rebounds at C3 and C4 were reliable features. Specifically, we looked for two time points at which the AM waveforms of C3 and C4 both have maximum peaks but with a time lag ($\Delta T$ in Figure 6(a)) of less than 0.5 second. The topographical maps at these two time points, referred to as beta rebound maps (Figure 6(c)), were concatenated into a $124 \times 1$ column vector and used as a feature vector.

Using the same time points of the peaked beta rebound resulting from Steps 1~4, we also processed the data using Step 3 only, i.e. without using ICA. Figure 7 depicts the extracted beta rebound maps, which appear contaminated by noise compared with those in Fig. 6.
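Step 4's constrained peak pairing can be sketched as a small helper. The function name and the Gaussian toy envelopes are illustrative; the paper's search additionally runs over several sub-bands.

```python
import numpy as np

def rebound_times(am_c3, am_c4, fs, max_lag=0.5):
    """Find the peak (beta rebound) of the C3 and C4 AM waveforms and
    accept the pair only if their time lag is below `max_lag` seconds.
    Returns the two sample indices, or None if the constraint fails."""
    i3, i4 = int(np.argmax(am_c3)), int(np.argmax(am_c4))
    if abs(i3 - i4) / fs > max_lag:
        return None
    return i3, i4

fs = 250
t = np.arange(0, 7, 1 / fs)                       # 1750 samples, as in the paper
c3 = np.exp(-0.5 * ((t - 3.0) / 0.3) ** 2)        # contralateral peak at 3.0 s
c4 = 0.6 * np.exp(-0.5 * ((t - 3.3) / 0.3) ** 2)  # ipsilateral peak 0.3 s later
print(rebound_times(c3, c4, fs))  # lag 0.3 s < 0.5 s, so both indices returned
```

The topographies at the two returned indices would then be stacked into the 124-element feature vector (62 channels at each of the two time points).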

4. TWO-CLASS SUPERVISED CLASSIFICATION

In this section, the four two-category classifiers used in our study are briefly reviewed: linear discriminant analysis (LDA), back-propagation neural network (BP-NN), radial basis function network (RBF-NN) and support vector machine (SVM). The beta rebound maps, denoted by $\vec{x}_i$, of imagery right and left hand movement, each a $124 \times 1$ column vector, were divided into two data sets, one for training and the other for testing the classifiers. The numbers of beta rebound maps used in the training and testing phases for each subject at each session were 60 and 30, respectively. These beta rebound maps were randomized before being used. For the sake of simplicity, we use the notation R and L to denote the category of imagery right and left hand movement, respectively, in the following discussion.

Figure 6. Computation of the beta rebound maps. (a) The AM waveforms of C3 and C4; $\Delta T$ is the time lag between the prominent beta rebounds at C3 and C4. (b) Reconstructed signals of 62 channels (excluding HEOG and VEOG) which were used to calculate the AM waveforms in (a). (c) The beta rebound maps created from the reconstructed signals on 62 channels, indexed to the time points of the peaked beta rebounds at C3 and C4.

Figure 7. The beta rebound maps computed using Step 3 only, without applying ICA.

4.1 Classifiers

4.1.1 LDA

The idea of LDA is to seek a vector $\vec{w}$ such that the two projected clusters of R and L feature vectors $\vec{x}_i$ on $\vec{w}$ are well separated from each other while the variance within each cluster is kept small. This can be done by maximizing the so-called Fisher's criterion

$$J(\vec{w}) = \frac{\vec{w}' S_b \vec{w}}{\vec{w}' S_w \vec{w}}$$

with respect to $\vec{w}$, where $S_b$ is the between-class scatter matrix

$$S_b = (m_R - m_L)(m_R - m_L)'$$

and $S_w$ is the within-class scatter matrix

$$S_w = \sum_{\vec{x} \in R} (\vec{x} - m_R)(\vec{x} - m_R)' + \sum_{\vec{x} \in L} (\vec{x} - m_L)(\vec{x} - m_L)'$$

in which the two summations run over all the training samples of classes R and L, respectively, and $m_R$ and $m_L$ represent the group means of classes R and L, respectively. The optimal $\vec{w}$ is the eigenvector corresponding to the largest eigenvalue of $S_w^{-1} S_b$. After $\vec{w}$ is obtained by means of the training data, we projected the test samples onto it, and then classified the projected points by the k-nearest-neighbor decision rule.
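A minimal sketch of the LDA training step. Since $S_b$ has rank one for two classes, the optimal direction reduces to $S_w^{-1}(m_R - m_L)$; the small ridge term is our own addition for numerical stability (the 124-dimensional $S_w$ estimated from 60 training samples is singular), not something the paper specifies.

```python
import numpy as np

def fisher_direction(XR, XL, ridge=1e-6):
    """Fisher's LDA direction for two classes (rows are feature vectors).
    For a rank-one S_b, maximizing J(w) gives w ~ S_w^-1 (m_R - m_L).
    The ridge keeps S_w invertible when dim > n_samples."""
    mR, mL = XR.mean(axis=0), XL.mean(axis=0)
    Sw = (XR - mR).T @ (XR - mR) + (XL - mL).T @ (XL - mL)
    w = np.linalg.solve(Sw + ridge * np.eye(Sw.shape[0]), mR - mL)
    return w / np.linalg.norm(w)

# Toy check: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
XR = rng.normal([2, 0], 0.3, size=(60, 2))
XL = rng.normal([-2, 0], 0.3, size=(60, 2))
w = fisher_direction(XR, XL)
# Every R sample should project above every L sample along w.
print((XR @ w).min() > (XL @ w).max())
```

After projection onto $\vec{w}$, the paper classifies test points with a k-nearest-neighbor rule on the 1-D projections.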

4.1.2 BP-NN

The BP-NN was trained in a supervised manner based on the error-correction learning rule. The hierarchy of the BP-NN in our implementation is depicted in Figure 8; it consists of one input layer, one hidden layer, and one output layer. The training phase was accomplished by iterating two passes: the forward pass and the backward pass. In the forward pass of the back-propagation learning, as shown in Figure 8, the output of the BP-NN at iteration n was computed by

$$y(n) = \varphi(v(n))$$

where $\varphi(\cdot)$ is the activation function and $v(n)$ is the induced local field of the output neuron

$$v(n) = \sum_{i=1}^{m} w_i(n) o_i(n)$$

in which m is the total number of inputs applied to the output neuron, $w_i$ is the weight connecting neuron i to the output neuron, and $o_i(n)$ is the output signal of neuron i. The error signal $e(n)$ between $y(n)$ and the desired output $d(n)$ was computed at each iteration. If the error met the stopping criterion, the training procedure was terminated. Otherwise, it was minimized in the subsequent backward pass to update the synaptic weights $w_i(n)$:

$$w_i(n+1) = w_i(n) + \alpha \left[ w_i(n) - w_i(n-1) \right] + \eta \, \delta(n) o_i(n)$$

where $\alpha$ is the momentum constant, $\eta$ is the learning rate, and $\delta(n)$ is the local gradient of the output layer in the network, given by $\delta(n) = e(n) \varphi'(v(n))$. In the testing phase, input feature vectors $\vec{x}$ can be linearly classified according to the value of $y(n)$ in the output layer.

Figure 8. The hierarchy of BP neural network.
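The forward/backward passes with the momentum update above can be sketched as a tiny batch-trained tanh network. The layer sizes, learning rate, momentum, bias handling and the two-cluster toy task are all our illustrative assumptions; the paper does not report its architecture or stopping criterion in detail.

```python
import numpy as np

def _aug(A):
    """Append a constant bias input of 1 to each row."""
    return np.hstack([A, np.ones((len(A), 1))])

def train_bpnn(X, d, hidden=4, eta=0.2, alpha=0.5, epochs=2000, seed=1):
    """One-hidden-layer tanh network trained by batch back-propagation
    with momentum: w(n+1) = w(n) + alpha*dw(n-1) + eta*delta(n)*o(n).
    X: (n_samples, dim); d: targets in {-1, +1}."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1] + 1, hidden)); dW1 = np.zeros_like(W1)
    W2 = rng.normal(0, 0.5, (hidden + 1, 1));          dW2 = np.zeros_like(W2)
    Xb, d = _aug(X), d.reshape(-1, 1)
    for _ in range(epochs):
        o1 = np.tanh(Xb @ W1)                  # forward pass: hidden outputs
        y = np.tanh(_aug(o1) @ W2)             # network output y(n)
        g2 = (d - y) * (1 - y ** 2)            # delta(n) = e(n)*phi'(v(n))
        g1 = (g2 @ W2[:-1].T) * (1 - o1 ** 2)  # back-propagated gradient
        dW2 = alpha * dW2 + eta * _aug(o1).T @ g2 / len(X)  # momentum + lr
        dW1 = alpha * dW1 + eta * Xb.T @ g1 / len(X)
        W2 += dW2; W1 += dW1                   # backward pass: weight update
    return W1, W2

def predict(X, W1, W2):
    return np.sign(np.tanh(_aug(np.tanh(_aug(X) @ W1)) @ W2)).ravel()

# Toy check: two well-separated 2-D clusters standing in for R and L.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.5, 1.5], 0.4, (40, 2)),
               rng.normal([-1.5, -1.5], 0.4, (40, 2))])
d = np.array([1.0] * 40 + [-1.0] * 40)
W1, W2 = train_bpnn(X, d)
print(np.mean(predict(X, W1, W2) == d))
```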

4.1.3 RBF-NN

The RBF neural network [Hay94a] uses a nonlinear function to map the input data into a high-dimensional space so that they are more likely to be linearly separable than in the low-dimensional space [Cov65a] [Cov91a] [Cov88a]. The hierarchy of the (regularization) RBF neural network is depicted in Figure 9; it consists of one input layer, one hidden layer, and one output layer.

Each RBF network is designed to have a nonlinear transformation from the input layer to the hidden layer, followed by a linear mapping from the hidden layer to the output layer. The mapping between the input and output space is expressed by

$$F(\vec{x}) = \sum_{i=1}^{N} w_i \, \varphi(\|\vec{x} - \vec{x}_i\|) \qquad (4.1)$$

where $\varphi(\|\vec{x} - \vec{x}_i\|) = e^{-\|\vec{x} - \vec{x}_i\|^2}$, $w_i$ represents the weight from the i-th hidden neuron to the output neuron, and $\vec{x}_i$ represents the i-th known feature vector with dimension m, i = 1, 2, ..., N. The distance between the input vector $\vec{x}$ and the center $\vec{x}_i$ is mapped into the high-dimensional space by means of a Gaussian function $\varphi(\|\vec{x} - \vec{x}_i\|)$ in this study. In the supervised learning phase, the training feature vectors $\vec{x}_i$, i = 1, 2, ..., N, and the desired outputs $F(\vec{x}_i) = d_i$, each either 1 or -1 in our design, are given. For the sake of simplicity, the training feature vectors are used as centers. With the known N input feature vectors and the corresponding desired outputs, the weights $w_i$ can be computed from the input-output relationship in equation (4.1):

$$G\vec{w} = \vec{d} \qquad (4.2)$$

where

$$G = \begin{bmatrix} \varphi(\|\vec{x}_1 - \vec{x}_1\|) & \varphi(\|\vec{x}_1 - \vec{x}_2\|) & \cdots & \varphi(\|\vec{x}_1 - \vec{x}_N\|) \\ \varphi(\|\vec{x}_2 - \vec{x}_1\|) & \varphi(\|\vec{x}_2 - \vec{x}_2\|) & \cdots & \varphi(\|\vec{x}_2 - \vec{x}_N\|) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(\|\vec{x}_N - \vec{x}_1\|) & \varphi(\|\vec{x}_N - \vec{x}_2\|) & \cdots & \varphi(\|\vec{x}_N - \vec{x}_N\|) \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix}, \quad \vec{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{bmatrix}$$

By solving the linear system (4.2), the resultant weight vector is

$$\vec{w} = G^{+}\vec{d} \qquad (4.3)$$

where $G^{+} = (G^T G)^{-1} G^T$ is the pseudoinverse of G. Compared with other neural networks which use a gradient-based optimization process to estimate the weights, for example the back-propagation neural network, the RBF neural network solves a set of linear equations, which avoids trapping in a local minimum and greatly reduces the training time. In the testing phase, input feature vectors $\vec{x}$ can be linearly classified based on the values of $F(\vec{x})$.

Figure 9. The hierarchy of RBF neural network.
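The weight solution $\vec{w} = G^{+}\vec{d}$ of equations (4.2)-(4.3) can be sketched directly with NumPy, using the training vectors as centers and the unit-width Gaussian $\varphi(r) = e^{-r^2}$ from the text (the toy data are illustrative):

```python
import numpy as np

def rbf_train(X, d):
    """Solve w = G^+ d (eqs. 4.2-4.3): training vectors as centers,
    phi(r) = exp(-r^2) as the radial basis function."""
    G = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return np.linalg.pinv(G) @ d

def rbf_predict(Xnew, centers, w):
    G = np.exp(-((Xnew[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    return G @ w

# Toy check: with exact interpolation, the network reproduces its
# training targets at the centers.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
d = np.sign(X[:, 0])            # +/-1 labels from the first coordinate
w = rbf_train(X, d)
print(np.allclose(rbf_predict(X, X, w), d, atol=1e-5))
```

Because the Gaussian kernel matrix over distinct centers is positive definite, the linear solve has a unique solution, which is the "no local minima" advantage the text contrasts with gradient-trained networks.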

4.1.4 SVM

The basic idea of the support vector machine hinges on two mathematical operations: (1) with an appropriate nonlinear mapping $\varphi(\cdot)$ of an input vector into a high-dimensional feature space, data $\vec{x}_i$

from two categories can be linearly separated by a hyperplane [Cov65a]; (2) construction of an optimal hyperplane for separating the features in (1). Let $\vec{x}$ denote a vector drawn from the input space, assumed to be of dimension $m_0$, and let $\{\varphi_j(\vec{x})\}_{j=1}^{m_1}$ denote a set of nonlinear transformations from the input space to the feature space, where $m_1$ is the dimension of the feature space. Given such a set of nonlinear transformations, we may define a hyperplane acting as the decision surface as follows:

$$\sum_{j=0}^{m_1} w_j \varphi_j(\vec{x}) = 0 \qquad (4.4)$$

where $w = \{w_0, w_1, \ldots, w_{m_1}\}$ denotes a set of linear weights connecting the feature space to the output space, and it is assumed that $\varphi_0(\vec{x}) = 1$ for all $\vec{x}$, so that $w_0$ denotes the bias. Equation (4.4) defines the decision surface computed in the feature space in terms of the linear weights of the machine. Defining the vectors $\varphi(\vec{x}) = [\varphi_0(\vec{x}), \varphi_1(\vec{x}), \ldots, \varphi_{m_1}(\vec{x})]^T$ and $w = [w_0, w_1, \ldots, w_{m_1}]^T$, we rewrite the decision surface in the compact form

$$w^T \varphi(\vec{x}) = 0 \qquad (4.5)$$

Given the training feature samples $\varphi(\vec{x}_i)$ corresponding to the input patterns $\vec{x}_i$, and the corresponding desired responses $d_i$, $i = 1, \ldots, N$, each either 1 or -1 in our design, it has been shown [Hay94a] that the optimal weight vector $w$ can be expressed as

$$w = \sum_{i=1}^{N} \alpha_i d_i \varphi(\vec{x}_i) \qquad (4.6)$$

where $\{\alpha_i\}_{i=1}^{N}$ are the optimal Lagrange multipliers resulting from maximizing the objective function

$$Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j \varphi(\vec{x}_i)^T \varphi(\vec{x}_j) \qquad (4.7)$$

subject to the constraints (1) $\sum_{i=1}^{N} \alpha_i d_i = 0$ and (2) $0 \le \alpha_i \le C$, where C is a user-specified constant. Substituting equation (4.6) into (4.5), we obtain the optimal hyperplane

$$\sum_{i=1}^{N} \alpha_i d_i \varphi(\vec{x}_i)^T \varphi(\vec{x}) = 0 \qquad (4.8)$$

which will be used for linearly separating the testing data; i.e., for any testing sample $\vec{x}$, if $\sum_{i=1}^{N} \alpha_i d_i \varphi(\vec{x}_i)^T \varphi(\vec{x}) \ge 0$ then $\vec{x}$ is classified into the subset having the training response $d_i = 1$; otherwise it is classified into the other subset with $d_i = -1$. In our implementation, we chose the radial basis function to define the inner-product kernel $\varphi(\vec{x})^T \varphi(\vec{x}_i)$ as follows:

$$K(\vec{x}, \vec{x}_i) \equiv \varphi(\vec{x})^T \varphi(\vec{x}_i) = \exp(-0.0005 \|\vec{x} - \vec{x}_i\|^2)$$

According to equation (4.8), once the number of nonzero Lagrange multipliers $\alpha_i$ is determined, the number of radial-basis functions and their centers are determined automatically. This differs from the design of conventional neural networks, for example the back-propagation neural network or the radial-basis function network [Hay94a], where the numbers of hidden layers or hidden neurons are usually determined heuristically.
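An SVM with the kernel above corresponds to scikit-learn's `SVC` with `kernel="rbf"` and `gamma=0.0005`. A sketch on synthetic 124-dimensional "beta rebound map" features: the data generator, the class-difference channels and `C=1` are our assumptions, not the paper's; only the kernel width and the 60/30 train/test split per class follow the text.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_maps(sign, n):
    """Synthetic 124-dim feature vectors; the two classes differ only in
    two hypothetical 'channels' (indices 20 and 60)."""
    base = np.zeros(124)
    base[20], base[60] = sign * 3.0, -sign * 3.0
    return base + rng.normal(0, 1.0, size=(n, 124))

X_train = np.vstack([make_maps(+1, 60), make_maps(-1, 60)])
y_train = np.array([+1] * 60 + [-1] * 60)
X_test = np.vstack([make_maps(+1, 30), make_maps(-1, 30)])
y_test = np.array([+1] * 30 + [-1] * 30)

# RBF kernel K(x, xi) = exp(-0.0005 * ||x - xi||^2), as in the paper.
clf = SVC(kernel="rbf", gamma=0.0005, C=1.0).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(acc > 0.8)
```

After fitting, `clf.support_` lists the training samples with nonzero $\alpha_i$, illustrating how the number of kernel centers falls out of the optimization rather than being chosen by hand.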

5. RESULTS

Table 1 summarizes the averaged recognition results for detecting right and left imagined finger lifting in four subjects (denoted by s1 ~ s4). With the use of ICA in the extraction of the beta rebound maps, each classifier showed superior performance regardless of subject, and the overall averaged recognition score improved significantly from 55.0% to 74.8%. In addition, the SVM outperformed the other classifiers.

| Classifier | ICA | s1 | s2 | s3 | s4 | mean |
|---|---|---|---|---|---|---|
| LDA | without | 58 | 55 | 57 | 51 | 54 |
| | with | 63 | 79 | 74 | 63 | 69.8 |
| BP-NN | without | 63 | 52 | 50 | 51 | 54 |
| | with | 72 | 84 | 79 | 67 | 75.5 |
| RBF-NN | without | 66 | 59 | 54 | 50 | 57.3 |
| | with | 75 | 86 | 79 | 66 | 76.5 |
| SVM | without | 66 | 53 | 50 | 51 | 55 |
| | with | 72 | 87 | 77 | 73 | 77.3 |

Table 1. Averaged recognition rates (in percentages) over four sessions resulting from the different classifiers with and without using ICA for feature extraction.

The receiver operating characteristic (ROC) curve, a plot of true-positive rate versus false-positive rate, provides another way to evaluate the performance of binary detection classifiers. The area under the ROC curve, which can be interpreted as the probability that a randomly chosen positive-class sample is ranked higher than a randomly chosen negative-class sample, assesses the quality of classification over a range of misclassification costs. Table 2 reports that the use of ICA improved the performance of each classifier, and the overall averaged ROC area increased from 0.63 to 0.75.

| Classifier | ICA | s1 | s2 | s3 | s4 | mean |
|---|---|---|---|---|---|---|
| LDA | without | .71 | .64 | .58 | .67 | .65 |
| | with | .75 | .86 | .74 | .68 | .78 |
| BP | without | .65 | .56 | .61 | .58 | .60 |
| | with | .68 | .78 | .74 | .71 | .73 |
| RBF | without | .73 | .60 | .54 | .62 | .62 |
| | with | .65 | .91 | .77 | .74 | .77 |
| SVM | without | .64 | .61 | .66 | .63 | .64 |
| | with | .69 | .87 | .77 | .65 | .75 |

Table 2. Averaged ROC areas over four sessions resulting from the different classifiers with and without using ICA for feature extraction. The numbers of beta rebound maps used for training and testing for each subject at each session were 60 and 30, respectively.
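The probabilistic interpretation of the ROC area translates directly into code: the AUC equals the fraction of (positive, negative) score pairs that are ranked correctly, with ties counting one half. A minimal sketch:

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via its probabilistic interpretation:
    the chance that a random positive score exceeds a random negative
    score, counting ties as one half."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float((sp > sn).mean() + 0.5 * (sp == sn).mean())

# 8 of the 9 (positive, negative) pairs are ordered correctly -> 8/9.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```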

6. CONCLUSIONS

We have presented a novel method that uses ICA to extract a reliable feature, the beta rebound map, from the peaked ERS of motor imagery EEG. With minimal training for each subject (20 minutes only), satisfactory classification rates were achieved by all four classifiers. This demonstrates the suitability of the beta rebound map as a neural input signal for BCI systems.

7. ACKNOWLEDGMENTS

The study was funded by the Taipei Veterans General Hospital, Taiwan (91380), the Ministry of Education of Taiwan (89BFA221401), and the National Science Council, Taiwan (NSC-92-2218-E-010-016).

8. REFERENCES

[Clo96a] Clochon, P., Fontbonne, J. M., Etevenon, P. A new method for quantifying EEG event-related desynchronization: amplitude envelope analysis, Electroencephalography & Clinical Neurophysiology, 98: 126-129, 1996.

[Cov65a] Cover, T. M. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Transactions on Electronic Computers, EC-14: 326-334, 1965.

[Cov91a] Cover, T. M., Thomas, J. A. Elements of Information Theory. New York: Wiley, 1991.

[Cov88a] Cover, T. M. Capacity problems for linear machines. Washington, DC: Thompson Book, Pattern Recognition: 293-289, 1988.

[Hay94a] Haykin, S. Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Company, 1994.

[Lee03a] Lee, P.L., Wu, Y.T., Chen, L.F., Chen, Y.S., Cheng, C.M., Yeh, T.C., Ho, L.T., Chang, M.S., Hsieh, J.C. ICA-based spatiotemporal approach for single-trial analysis of post-movement MEG beta synchronization, NeuroImage, in press, 2003.

[Mul99a] Muller-Gerking, J., Pfurtscheller, G., Flyvbjerg, H. Designing optimal spatial filters for single-trial EEG classification in a movement task, Clinical Neurophysiology, 110: 787-798, 1999.

[Pfu96a] Pfurtscheller, G., Stancak Jr, A., Neuper, C. Post-movement beta synchronization. A correlate of an idling motor area?, Electroencephalography & Clinical Neurophysiology, 98: 281-293, 1996.

[Pfu98a] Pfurtscheller, G., Neuper, C., Schlogl, A., Lugger, K. Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters, IEEE Transactions on Rehabilitation Engineering, Vol. 6, No. 3: 316-325, 1998.

[Pfu00a] Pfurtscheller, G., Guger, C., Muller, G., Krausz, G., Neuper, C. Brain oscillations control hand orthosis in a tetraplegic, Neuroscience Letters, 292: 211-214, 2000.

- CitationsCitations21
- ReferencesReferences7

- "Other studies use ICA as a denoising technique or as a feature extractor for improving the performance of a separate classifier. For example, in [4] ICA is used to remove ocular artefacts, while [5] extracts task-related independent components prior the application of several classifiers. In contrast to these approaches, in [10] the authors introduce a combination of Hidden Markov Models and Independent Component Analysis as a generative model of the EEG data and give a demonstration of how this model can be applied directly to the detection of when switching occurs between the two mental conditions of baseline activity and imaginary movement. "

[Show abstract] [Hide abstract]**ABSTRACT:**In this paper we investigate the use of a temporal extension of independent component analysis (ICA) for the discrimination of three mental tasks for asynchronous EEG-based brain computer interface systems. ICA is most commonly used with EEG for artifact identification with little work on the use of ICA for direct discrimination of different types of EEG signals. In a recent work we have shown that, by viewing ICA as a generative model, we can use Bayes' rule to form a classifier obtaining state-of-the-art results when compared to more traditional methods based on using temporal features as inputs to off-the-shelf classifiers. However, in that model no assumption on the temporal nature of the independent components was made. In this work we model the hidden components with an autoregressive process in order to investigate whether temporal information can bring any advantage in terms of discrimination of spontaneous mental tasks- [Show abstract] [Hide abstract]
**ABSTRACT:** This thesis explores latent-variable probabilistic models for the analysis and classification of electroencephalographic (EEG) signals used in Brain Computer Interface (BCI) systems. The first part of the thesis focuses on the use of probabilistic methods for classification. We begin with comparing performance between 'black-box' generative and discriminative approaches. In order to take potential advantage of the temporal nature of the EEG, we use two temporal models: the standard generative hidden Markov model, and the discriminative input-output hidden Markov model. For this latter model, we introduce a novel 'apposite' training algorithm which is of particular benefit for the type of training sequences that we use. We also assess the advantage of using these temporal probabilistic models compared with their static alternatives. We then investigate the incorporation of more specific prior information about the physical nature of EEG signals into the model structure. In particular, a common successful assumption in EEG research is that signals are generated by a linear mixing of independent sources in the brain and other external components. Such domain knowledge is conveniently introduced by using a generative model, and leads to a generative form of Independent Components Analysis (gICA). We analyze whether or not this approach is advantageous in terms of performance compared to a more standard discriminative approach, which uses domain knowledge by extracting relevant features which are subsequently fed into classifiers. The user of a BCI system may have more than one way to perform a particular mental task. Furthermore, the physiological and psychological conditions may change from one recording session and/or day to another. As a consequence, the corresponding EEG signals may change significantly.
As a first attempt to deal with this effect, we use a mixture of gICA in which the EEG signal is split into different regimes, each regime corresponding to a potentially different realization of the same mental task. An arguable limitation of the gICA model is the fact that the temporal nature of the EEG signal is not taken into account. Therefore, we analyze an extension in which each hidden component is modeled with an autoregressive process. The second part of the thesis focuses on analyzing the EEG signal and, in particular, on extracting independent dynamical processes from multiple channels. In BCI research, such a decomposition technique can be applied, for example, to denoise EEG signals from artifacts and to analyze the source generators in the brain, thereby aiding the visualization and interpretation of the mental state. In order to do this, we introduce a specially constrained form of the linear Gaussian state-space model which satisfies several properties, such as flexibility in the specification of the number of recovered independent processes and the possibility to obtain processes in particular frequency ranges. We then discuss an extension of this model to the case in which we don't know a priori the correct number of hidden processes which have generated the observed time-series and the prior knowledge about their frequency content is not precise. This is achieved using an approximate variational Bayesian analysis. The resulting model can automatically determine the number and appropriate complexity of the underlying dynamics, with a preference for the simplest solution, and estimates processes with preferential spectral properties. An important contribution from our work is a novel 'sequential' algorithm for performing smoothed inference, which is numerically stable and simpler than others previously published. 
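The generative-versus-discriminative comparison above rests on Bayes' rule: fit a density p(x|c) per class and classify by the largest log p(x|c) + log p(c). A minimal sketch, with Gaussian class-conditional densities standing in for the thesis's gICA likelihoods (the 2-D toy features are assumptions, not EEG data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class "feature" data standing in for EEG-derived features
X0 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], 300)
X1 = rng.multivariate_normal([2, 1], [[1, -0.2], [-0.2, 1]], 300)

def fit_gaussian(X):
    # Per-class generative model: mean and covariance
    return X.mean(axis=0), np.cov(X.T)

def log_likelihood(X, mu, cov):
    d = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('ij,jk,ik->i', d, inv, d)    # squared Mahalanobis distance
    return -0.5 * (maha + logdet + mu.size * np.log(2 * np.pi))

params = [fit_gaussian(X0), fit_gaussian(X1)]
log_prior = np.log(0.5)                           # equal class priors

def classify(X):
    # Bayes' rule: argmax_c  log p(x|c) + log p(c)
    scores = np.stack([log_likelihood(X, *p) + log_prior for p in params])
    return scores.argmax(axis=0)

X_all = np.vstack([X0, X1])
y_all = np.r_[np.zeros(300), np.ones(300)]
acc = (classify(X_all) == y_all).mean()
print(acc)
```

Swapping the Gaussian density for an ICA-based likelihood changes only `fit_gaussian` and `log_likelihood`; the Bayes-rule decision step is identical.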
**ABSTRACT:** In this paper a novel approach for independent component analysis (ICA) model order estimation of movement electroencephalogram (EEG) signals is described. The application is targeted at the brain-computer interface (BCI) EEG preprocessing. Previous work has shown that it is possible to decompose EEG into movement-related and non-movement-related independent components (ICs). Selecting only movement-related ICs may increase the BCI EEG classification score. The real number of independent sources in the brain is an important parameter of the preprocessing step. Previously, we used principal component analysis (PCA) to estimate the number of independent sources. However, PCA estimates only the number of uncorrelated, not independent, components, ignoring the higher-order signal statistics. In this work, we use another approach: selection of highly correlated ICs from several ICA runs. The ICA model order estimation is done at significance level α = 0.05, and the model order depends to a greater or lesser extent on the ICA algorithm and its parameters.
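The PCA-based order estimate that this abstract criticises can be sketched in a few lines: count the principal components needed to explain (nearly) all the variance of the mixed recordings. The eight-channel, three-source toy data below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
S = np.vstack([np.sin(np.linspace(0, 60, T)),            # oscillatory source
               np.sign(np.sin(np.linspace(0, 25, T))),   # square-wave source
               rng.uniform(-1, 1, T)])                    # noise-like source
A = rng.standard_normal((8, 3))                           # mixing into 8 channels
X = A @ S + 0.01 * rng.standard_normal((8, T))            # weak sensor noise

# Eigen-spectrum of the channel covariance, largest eigenvalue first
eigvals = np.linalg.eigvalsh(np.cov(X))[::-1]
explained = np.cumsum(eigvals) / eigvals.sum()
n_components = int(np.searchsorted(explained, 0.999) + 1)
print(n_components)
```

With low sensor noise this recovers the true order of 3, but, as the abstract notes, the criterion only counts uncorrelated directions: sources that are independent yet share second-order structure are not separated by an eigen-spectrum alone, which is what motivates the correlated-ICs-across-runs alternative.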
