GOAL: The P300 Speller is probably the best-known application in BCI [1]. Over the years, many improvements over the pioneering systems have been made, and some performance comparisons exist [2]. To contribute to this improvement process, we offer open access to a large database obtained from first-time users of the P300 Speller application implemented within the BCI2000 platform [3]. The database is documented with associated classifier designs and objective performance measures, readily available for comparison and reference. We also provide a set of Matlab functions that help prepare the data for alternative classifier design and testing. The database includes recordings from 30 healthy subjects (18 males, 12 females, aged 21-25), controlling for various conditions (sleep duration, drugs, etc.). Available on: akimpech server, Dropbox copy, Kaggle.
An Open-Access P300 Speller Database
Claudia Ledesma-Ramirez, Erik Bojorges-Valdez, Oscar Yáñez-Suarez, Carolina Saavedra, Laurent Bougrain, Gerardo Gabriel Gentiletti
Laboratorio de Neuroimagenología, Universidad Autónoma Metropolitana (UAM), Mexico
Cortex team-project, Nancy University/INRIA Nancy Grand Est, France
Laboratorio de Ingeniería en Rehabilitación e Investigaciones Neuromusculares y Sensoriales, Universidad Nacional de Entre Ríos (UNER), Argentina
The P300 Speller is probably the best-known
application in BCI [1]. Over the years, many
improvements over the pioneering systems have
been made, and some performance comparisons
exist [2]. To contribute to this improvement
process, we offer open access to a large
database obtained from first-time users of the
P300 Speller application implemented within the
BCI2000 platform [3] (Figure 1). The database is
documented with associated classifier designs and
objective performance measures, readily available
for comparison and reference. We also provide a
set of Matlab functions that help prepare the
data for alternative classifier design and testing.
Figure 1. The database website.
[1] Farwell L. A., Donchin E. “Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials.” Electroenceph. Clin. Neurophysiol. Vol. 70, pp. 510-523 (1988).
[2] Krusienski D. J., Sellers E. W., Cabestaing F. “A comparison of classification techniques for the P300 Speller.” Journal of Neural Engineering. Vol. 3, pp. 299-305 (2006).
[3] Schalk G., McFarland D., Hinterberger T., Birbaumer N., Wolpaw J. “BCI2000: A General-Purpose Brain-Computer Interface (BCI) System.” IEEE Trans. Biomed. Eng. Vol. 51, pp. 1034-1043 (2004).
The database includes recordings from 30 healthy subjects
(18 males, 12 females, aged 21-25), controlling for various
conditions (sleep duration, drugs, etc.).
Each subject participated in 4 sessions with 15 sequences:
1) Three copy-spelling runs.
2) One copy-spelling run with feedback, using a classifier
trained on data from session one.
3) Three free-spelling runs (user-selected words, around 15
characters per subject).
4) A variable number of free-spelling runs with a reduced number of
sequences, as indicated by bit-rate analysis.
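The bit-rate analysis used to reduce the number of sequences in session 4 is not detailed here. As a rough illustration only, the sketch below assumes the Wolpaw definition of bits per selection, which is common in the BCI literature; the function names, the 36-symbol matrix size, and the selection time used in the example are hypothetical and do not come from the database documentation.

```python
import math


def bits_per_selection(n_symbols: int, accuracy: float) -> float:
    """Wolpaw bits per selection for n_symbols equiprobable choices.

    Assumption: errors are spread evenly over the wrong symbols.
    """
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_symbols)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_symbols - 1))
    return bits


def bits_per_minute(n_symbols: int, accuracy: float, seconds_per_selection: float) -> float:
    """Scale bits per selection by the selection rate."""
    return bits_per_selection(n_symbols, accuracy) * 60.0 / seconds_per_selection


if __name__ == "__main__":
    # Hypothetical values: 36 symbols, 95% accuracy, 30 s per selection.
    print(bits_per_minute(36, 0.95, 30.0))
```

Under this definition, fewer sequences shorten each selection but usually lower accuracy, so the bit rate can be compared across sequence counts to pick a trade-off.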
Ten channels (Fz, C3, Cz, C4, P3, Pz, P4, PO7, PO8, Oz) were
recorded at 256 samples per second (sps) using the g.tec g.USBamp,
with the acquisition characteristics shown in Figure 2. Each stimulus
is highlighted for 62.5 ms, with an inter-stimulus interval of
125 ms.
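For readers working at the sample level, the timing values above convert directly into sample counts at 256 sps. The short sketch below shows the arithmetic; the helper name is hypothetical, and the onset-to-onset figure assumes that the inter-stimulus interval is measured from the end of one highlight to the start of the next, which is not stated in the text.

```python
FS_HZ = 256.0  # sampling rate stated in the text


def ms_to_samples(duration_ms: float, fs_hz: float = FS_HZ) -> float:
    """Convert a duration in milliseconds to a number of samples."""
    return duration_ms * fs_hz / 1000.0


highlight_samples = ms_to_samples(62.5)  # 16.0 samples
isi_samples = ms_to_samples(125.0)       # 32.0 samples

# Assumption: ISI runs from highlight offset to the next onset,
# so consecutive onsets would be 16 + 32 = 48 samples (187.5 ms) apart.
onset_to_onset_samples = highlight_samples + isi_samples

if __name__ == "__main__":
    print(highlight_samples, isi_samples, onset_to_onset_samples)
```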
We also provide a set of Matlab functions to extract and
average target and non-target responses, specifying, for
example, the number of sequences to average and the
duration of the response, and to save the result in Matlab or ASCII
format.
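The extraction and averaging step described above can be sketched as follows. This is not the Matlab code distributed with the database: it is a minimal NumPy illustration with hypothetical function names, it assumes the EEG is already loaded as a samples-by-channels array with known stimulus onsets and target flags, and it limits the average by number of epochs rather than by sequences.

```python
import numpy as np


def extract_epochs(eeg, onsets, epoch_len):
    """Cut fixed-length epochs from eeg (n_samples, n_channels).

    Returns an array of shape (n_epochs, epoch_len, n_channels).
    Epochs that would run past the end of the recording are dropped.
    """
    onsets = np.asarray(onsets, dtype=int)
    onsets = onsets[onsets + epoch_len <= eeg.shape[0]]
    idx = onsets[:, None] + np.arange(epoch_len)[None, :]
    return eeg[idx]


def average_responses(eeg, onsets, is_target, epoch_len, n_average=None):
    """Average target and non-target epochs separately.

    n_average optionally limits how many epochs of each class are averaged.
    """
    onsets = np.asarray(onsets, dtype=int)
    is_target = np.asarray(is_target, dtype=bool)
    averages = {}
    for label, mask in (("target", is_target), ("non_target", ~is_target)):
        epochs = extract_epochs(eeg, onsets[mask], epoch_len)
        if n_average is not None:
            epochs = epochs[:n_average]
        averages[label] = epochs.mean(axis=0)
    return averages


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((2560, 10))      # 10 s of synthetic 10-channel data
    onsets = np.arange(0, 2400, 48)            # hypothetical stimulus onsets
    is_target = (np.arange(onsets.size) % 6) == 0
    avg = average_responses(eeg, onsets, is_target, epoch_len=154)  # about 600 ms
    print(avg["target"].shape, avg["non_target"].shape)
```

The averaged arrays could then be written out with `np.savetxt` for an ASCII copy, one file per class.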
The database, a complete description of the parameters
used for the speller and the code are available at:
BCI Meeting
Filter      Type        Order   Band
Notch       Chebyshev   4th     58-62 Hz
Band-pass   Chebyshev   8th     0.1-60 Hz
Table 1. Distribution of database cases as a
function of classifier accuracy and number of
averaged epochs (ne).
Figure 2. Recorded EEG
channels and filter parameters
Figure 3. Mean ROC area for all cases using
SWLDA analysis. Each x represents an individual
case, blue lines are standard deviation.
SWLDA (step-wise linear discriminant analysis) classifiers have been trained for each
subject. To provide users of the database with an objective, comparable
measure of performance (one that takes into account the choice of features and is
independent of the training/testing set), the relative (receiver) operating characteristic,
or ROC curve, has been selected. Summarized by the area under the curve (Az), the
ROC reflects intrinsic class separability: higher values of Az correspond to better
classifier designs.
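To make the Az measure concrete, the sketch below computes the area under the ROC curve from classifier scores using its rank-based form: the fraction of target/non-target pairs in which the target epoch scores higher, with ties counted as one half. The function name and the example scores are hypothetical, and this is not the code used to produce the reported values.

```python
import numpy as np


def roc_area(scores, labels):
    """Area under the ROC curve (Az) from scores and binary labels.

    labels: truthy for target epochs, falsy for non-target epochs.
    Uses all target/non-target pairs, so memory grows as n_pos * n_neg.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


if __name__ == "__main__":
    # Three of the four target/non-target pairs are ordered correctly.
    print(roc_area([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

A value of 0.5 corresponds to chance-level separation and 1.0 to perfectly separated classes.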
As a reference, results for each subject are available on the web site. Using
SWLDA with 15 training sequences, 86.7% of the participants reached
100% correct spelling, while the lowest percentage of correctly
detected characters among the remaining participants was 85%. ROC
areas above 0.95 were reached by 76.7% of the population in about 10 sequences.
Thus, for 15 sequences the general performance is very good. Classifier features were
selected mainly from the PO8, Oz, PO7 and Pz electrodes and within the 100-290 ms
window. This indicates that evoked potentials (EP) related to visual stimulation and its
recognition play an important role in the high accuracy of the classifier (see Table 1 and Figure 3).
This open-access P300 database includes recordings from 30 healthy
subjects. Data is available in BCI2000 and Matlab formats. A set of
Matlab functions for the extraction of the information that might be
needed for a given application is also included.
The database website provides, together with the data, a description
of the conditions recorded for each subject. Individual
results, accuracy, ROC area, and performance for every sequence count
are also reported. Given the individual accuracies and ROC areas for
the reference SWLDA classifier, it could be argued that overall data
quality is high.
We hope this work will help compare classification
techniques for the P300 detection problem and its applications
by providing fair comparison grounds and reference data.
Figure 4. Impact of different preprocessing schemes (none, coiflet decomposition,
B-spline decomposition) on SVM classifier accuracies.
ne    100%   95%                  ≤ 85%
15    26     1      2      0      1
14    22     4      3      1      0
13    25     2      2      1      0
11    19     3      5      0      3
10    20     3      3      4      0
8     18     3      2      3      4
5     8      2      7      6      7
3     3      0      3      6      18
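The 86.7% figure quoted in the text for 15 sequences can be checked against the first column of Table 1. The sketch below transcribes the table rows as printed and computes the share of subjects with 100% accuracy for each number of averaged epochs; the variable and function names are hypothetical, and the column order (highest accuracy first) is assumed from the surviving header.

```python
# Rows of Table 1 as printed: ne -> subject counts per accuracy column,
# ordered from the 100% column (first) to the <= 85% column (last).
TABLE_1 = {
    15: (26, 1, 2, 0, 1),
    14: (22, 4, 3, 1, 0),
    13: (25, 2, 2, 1, 0),
    11: (19, 3, 5, 0, 3),
    10: (20, 3, 3, 4, 0),
    8: (18, 3, 2, 3, 4),
    5: (8, 2, 7, 6, 7),
    3: (3, 0, 3, 6, 18),
}


def percent_perfect(ne: int) -> float:
    """Percentage of subjects in the 100% accuracy column for a given ne."""
    counts = TABLE_1[ne]
    return 100.0 * counts[0] / sum(counts)


if __name__ == "__main__":
    for ne in sorted(TABLE_1, reverse=True):
        print(ne, round(percent_perfect(ne), 1))
```

Each row sums to the 30 subjects in the database, and the row for ne = 15 gives 26 of 30 subjects, or 86.7%.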