REGULARISED CSP FOR SENSOR SELECTION IN BCI
J. Farquhar, N. J. Hill, T. N. Lal, B. Schölkopf
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
e-mail: jdrf@tuebingen.mpg.de
SUMMARY: The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10–20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.
INTRODUCTION
BCI data sets typically consist of multiple time-series that are highly correlated, particularly so when measured by EEG, since EEG signals suffer from a high degree of spatial blurring. When transduction is based on a nonlinear transformation of the time-series, such as one that extracts band-power for the detection of Event-Related Desynchronisation (ERD), a spatial filtering preprocessing stage that performs source separation before nonlinear feature extraction will often improve results (see for example [1]). This can be done by Independent Component Analysis, or in some cases by the computationally much cheaper Common Spatial Pattern (CSP) method [2] and related algorithms [3, 4, 5].
One practical problem with spatial filtering is that it typically requires a large number of electrodes to be applied, whereas in everyday clinical application it is desirable to have to apply only a few. An additional problem associated with the supervised CSP algorithm in particular is its tendency to overfit, leading to poor generalisation (for illustration and discussion of this effect see [1, 4, 5]). This is a particular problem when the number of electrodes is large, and when the number of available trials is small.
Both problems argue for an approach which can sparsify the spatial filters that one computes, i.e. force them to be based on a small number of electrodes, and trade this characteristic off against performance on the training data. The goal is twofold: firstly to identify (based on an initial setting with a full EEG cap) which electrodes should be attached in future sessions and which can be omitted; secondly to regularise the computation of spatial filters, leading to improved generalisation in cases where overfitting is a problem. Regularisation by sparsification is a common approach in machine learning, and was described in the context of a CSP-like algorithm by Dornhege et al. [5]. The latter authors apply regularisation in the domain of the temporal FIR filters used in their algorithm. Here we apply the same principle to the spatial filters themselves, focusing on the question: what is the tradeoff between number of electrodes and performance, within the CSP framework?
THE RCSP ALGORITHM
CSP operates on the covariance matrix Σ_T between the d channels, computed using all trials, and the class-covariance matrix Σ_c, which is computed using only trials from a given class c. Each filter is a vector w of length d, found by maximising the variance in one class whilst simultaneously minimising the variance in the other class(es). Equivalently, CSP can be seen as maximising the Rayleigh quotient, which is the ratio of the variance of the filtered signal in class c to its variance overall. In addition to this criterion, we add a regularisation term incorporating a cost hyperparameter C. As is common in regularisation-by-sparsification approaches, our C weights a penalty on the L1-norm (i.e. the sum of the absolute values of the elements) of w. Our w is therefore found by solving the following unconstrained optimisation problem:
$$\operatorname*{argmax}_{w} \; \frac{w^\top \Sigma_c w}{w^\top \Sigma_T w} - C\,\frac{|w|_1}{d\,|w|_2}. \qquad (1)$$
The first term is the Rayleigh quotient: optimising this alone (i.e. setting C = 0) can be shown to be equivalent to solving the generalised eigenvalue problem Σ_c w = λ Σ_T w, which gives the ordinary CSP solution. We obtain a solution to (1) using the conjugate gradient method (see [6]). Once each filter is found, subsequent filters are found by deflating Σ_c as follows:
$$\Sigma_c \leftarrow \Sigma_c \left( I - \frac{w^\top w \, \Sigma_T}{w^\top \Sigma_T w} \right), \qquad (2)$$
and then iterating the procedure. If C is set to 0, (1) and (2) together recover the ordinary CSP decomposition in full. With C > 0, we call the algorithm regularised CSP or rCSP, and its solutions are sparser, i.e. the resulting w vectors have fewer non-zero entries, meaning that fewer electrodes are used.
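As an illustration of objective (1), the following minimal Python sketch evaluates it and maximises it numerically. It is not the authors' implementation: the function names `rcsp_objective` and `fit_filter`, the synthetic covariance matrices, and the use of SciPy's general-purpose nonlinear conjugate gradient routine with finite-difference gradients are all assumptions made for the example. The penalty is scaled by d|w|_2, following (1) as printed.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import minimize


def rcsp_objective(w, sigma_c, sigma_t, C):
    """Objective (1): Rayleigh quotient minus the scaled L1 penalty."""
    d = w.size
    rayleigh = (w @ sigma_c @ w) / (w @ sigma_t @ w)
    # Penalty scaled by d * |w|_2, following (1) as printed.
    penalty = C * np.abs(w).sum() / (d * np.linalg.norm(w))
    return rayleigh - penalty


def fit_filter(sigma_c, sigma_t, C, w0):
    """Maximise (1) from w0 by minimising its negative with nonlinear CG."""
    res = minimize(lambda w: -rcsp_objective(w, sigma_c, sigma_t, C),
                   w0, method="CG")
    return res.x / np.linalg.norm(res.x)  # (1) does not depend on the scale of w


# Synthetic stand-in data (assumption): d = 6 channels, two classes.
rng = np.random.default_rng(0)
d = 6
class_a = rng.standard_normal((200, d)) * np.linspace(2.0, 0.5, d)
class_b = rng.standard_normal((200, d)) * np.linspace(0.5, 2.0, d)
sigma_c = class_a.T @ class_a / 200                           # class covariance
sigma_t = (class_a.T @ class_a + class_b.T @ class_b) / 400   # all trials

# With C = 0 the maximiser is the top generalised eigenvector of (sigma_c, sigma_t).
evals, evecs = eigh(sigma_c, sigma_t)   # eigenvalues in ascending order
w_csp = evecs[:, -1]
w_start = w_csp + 0.1 * rng.standard_normal(d)
w_cg = fit_filter(sigma_c, sigma_t, 0.0, w_start)

# With C > 0 the L1 term is subtracted from the score of every filter.
w_sparse = fit_filter(sigma_c, sigma_t, 0.5, w_csp)
```

At C = 0 the value of (1) at `w_csp` equals the largest generalised eigenvalue, which gives a simple check on any optimiser. Because the L1 term is not differentiable where a weight is zero, a generic gradient-based routine such as this one may return very small rather than exactly zero weights.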
EXPERIMENTS
We tested the effect of varying C on the data from a number of two-class motor imagery experiments without feedback. 39-channel EEG was recorded from each subject as they performed 400 trials of imagined left- or right-hand movement. Regularised CSP was applied using the 7–30 Hz band, and a linear Support Vector Machine was used to classify the resulting variances of the spatially filtered signals. Offline performance was estimated using 2 repeats (with different random seeds) of 10-fold cross-validation, and the SVM’s own regularisation parameter was optimised using 10-fold cross-validation nested within that (i.e. within the training subset of each outer fold).
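The feature extraction step described above (band-pass to 7–30 Hz, apply the spatial filters, take the variance per trial) can be sketched as follows. This is a hedged illustration only: the text does not state the sampling rate or the type of band-pass filter, so the 250 Hz rate, the zero-phase 4th-order Butterworth design, the random data and the names `bandpass` and `variance_features` are assumptions. The classifier and the nested cross-validation are not shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(trials, fs, low=7.0, high=30.0, order=4):
    """Zero-phase Butterworth band-pass along the time axis.

    trials: array of shape (n_trials, n_channels, n_samples).
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trials, axis=-1)


def variance_features(trials, W):
    """Variance of each spatially filtered signal, one row per trial.

    W: array of shape (n_channels, n_filters), one spatial filter per column.
    """
    # Apply every filter to every trial: result has shape (trials, filters, samples).
    filtered = np.einsum("cf,tcs->tfs", W, trials)
    return filtered.var(axis=-1)


# Synthetic stand-in data (assumption): 20 trials, 39 channels, 2 s at 250 Hz.
fs = 250.0
rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 39, 500))
W = rng.standard_normal((39, 2))   # placeholder for two learned spatial filters

features = variance_features(bandpass(trials, fs), W)   # shape (20, 2)
```

Each row of `features` would then be passed to the linear classifier. In a cross-validated evaluation the spatial filters `W` would be computed from the training subset of each fold only.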
[Figure 1: plot of % correct classification (vertical axis, 50–100) against avg. number of electrodes (horizontal axis, 0–40). Legend: subjects 1–5; 2, 4, 6 and 8 filters.]
Figure 1: Classification accuracy for 5 subjects, as a
function of number of electrodes required.
We varied the number of filters we wished to extract, n ∈ {2, 4, 6, 8}, and the cost parameter C ∈ {0, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5}. For each setting, we plot classification accuracy (averaged across the 20 outer folds) against the number of electrodes required in total to implement the n computed filters (also averaged across outer folds). We show results for 5 of the 6 subjects; the sixth subject showed similar trends, but we omit his results for readability since the curves overlap those of subjects 1 and 5.
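The number of electrodes required in total for a set of n filters is the number of channels that carry a non-zero weight in at least one of them. A minimal sketch of that count is given below; the function name `electrodes_required`, the example weights and the tolerance `tol` used to decide when a weight counts as zero are assumptions, since the text does not state a threshold.

```python
import numpy as np


def electrodes_required(W, tol=1e-6):
    """Indices of channels whose absolute weight exceeds tol in any filter.

    W: array of shape (n_channels, n_filters), one spatial filter per column.
    """
    used = np.any(np.abs(W) > tol, axis=1)
    return np.flatnonzero(used)


# Hypothetical example: 5 channels, 2 filters.
W = np.array([[0.5,  0.0],
              [0.0,  0.0],
              [0.0, -0.2],
              [0.0,  0.0],
              [0.1,  0.3]])

channels = electrodes_required(W)   # channels used by at least one filter
n_electrodes = len(channels)
```

Averaging `n_electrodes` over the outer folds would give the quantity on the horizontal axis of Figure 1.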
Figure 1 gives a quantitative impression of how classification accuracy depends on the number of electrodes used. For some subjects (for example, subjects 2 and 4) the curves are surprisingly flat: using only two spatial filters, one can reduce the number of electrodes to around 10 without any appreciable drop in classification accuracy. For the others, best performance was achieved with the maximum available number of electrodes, although close-to-optimal performance may still be achieved with around 20. In practice, the optimal choice of C and n should, as in most CSP implementations, be found for each subject by cross-validation.
Note that these are only preliminary results: our subjects started with a relatively small number of electrodes, 39, which meant they were widely spaced relative to those of, say, a 128-electrode cap. It is possible that sparser electrode montages are effective if the candidate electrodes are more closely spaced.
CONCLUSIONS
Formulating the CSP problem as a Rayleigh quotient optimisation allows us to modify the formulation easily, with potential applications in both spatial and spatio-spectral filtering. The current modification, rCSP, allows automatic selection of a subset of electrodes during the optimisation of the spatial filter, showing that in some cases the number of electrodes can be reduced to 20 or fewer with little loss in performance.
REFERENCES
[1] Hill N. J, Lal T. N, Schröder M, Hinterberger T, Widman G, Elger C. E, Schölkopf B, and Birbaumer N. Classifying event-related desynchronization in EEG, ECoG and MEG signals. In: Dornhege G, del R. Millan J, Hinterberger T, McFarland D. J, and Müller K.-R (Eds.), Towards Brain-Computer Interfacing. MIT Press, Cambridge, MA (2006). In press.
[2] Koles Z. J, Lazar M. S, and Zhou S. Z. Spatial patterns underlying population differences in the background EEG. Brain Topography 2(4), 275–284 (1990).
[3] Wang Y, Berg P, and Scherg M. Common spatial subspace decomposition applied to analysis of brain responses under multiple task conditions: a simulation study. Clinical Neurophysiology 110(4), 604–614 (1999).
[4] Lemm S, Blankertz B, Curio G, and Müller K.-R. Spatio-spectral filters for robust classification of single trial EEG. IEEE Transactions on Biomedical Engineering 52(9), 993–1002 (2004).
[5] Dornhege G, Blankertz B, Krauledat M, Losch F, Curio G, and Müller K.-R. Optimizing spatio-temporal filters for improving brain-computer interfacing. In: Weiss Y, Schölkopf B, and Platt J (Eds.), Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA (2006).
[6] Bishop C. M. Neural Networks for Pattern Recognition. Oxford University Press (1995).