EEG-based Subjects Identification based on
Biometrics of Imagined Speech using EMD
Luis Alfredo Moctezuma and Marta Molinas
Department of Engineering Cybernetics, Norwegian University of Science and
Technology. Trondheim, Norway
luisalfredomoctezuma@gmail.com, marta.molinas@ntnu.no
Abstract. When brain activity is translated into commands for real applications, the potential for augmenting human capacities is promising. In this paper, EMD is used to decompose EEG signals recorded during imagined speech in order to use them as a biometric marker for creating a biometric recognition system. For each EEG channel, the most relevant Intrinsic Mode Functions (IMFs) are selected based on the Minkowski distance, and for each IMF 4 features are computed: Instantaneous and Teager energy distribution and Higuchi and Petrosian fractal dimension. To test the proposed method, a dataset with 20 subjects who imagined 30 repetitions of 5 words in Spanish is used. Four classifiers are used for this task - random forest, SVM, naive Bayes and k-NN - and their performances are compared. The accuracy obtained (up to 0.92 using a linear SVM) after 10-fold cross-validation suggests that the proposed method based on EMD can be valuable for creating EEG-based biometrics of imagined speech for subject identification.
Keywords: Biometric security, Subjects identification, Imagined Speech,
Electroencephalograms (EEG), Empirical Mode Decomposition (EMD)
1 Introduction
Electroencephalography (EEG) is a popular non-invasive technique for Brain-Computer Interfaces (BCI); it refers to the recording of the bioelectrical brain activity registered during different activities. EEG does not require any type of surgery; however, compared with invasive techniques, the signals obtained are weaker. Another important term is the electrophysiological source, which refers to the neurological mechanisms adopted by a BCI user to generate brain signals [1].
Due to its easy setup and the little training required, this paper uses the electrophysiological source Imagined Speech [2], which refers to imagined or internal speech without uttering sounds or articulating gestures, to create a biometric system for subject identification.
Due to the non-stationary and non-linear nature of brain signals, signal processing tools capable of dealing with these properties, such as the Wavelet Transform [3,4,5] and the power spectral density (PSD) [6], have been reported in the literature.
Recent works have shown wavelets to be a powerful tool for analyzing brain signals. However, their main disadvantage is the need to find the mother function that best fits the signal, which means that the mother function will differ depending on the task, neuro-paradigm or environment adopted.
Recently, the Empirical Mode Decomposition (EMD) has been employed to analyze brain signals corresponding to different tasks. It has been shown to be robust for decomposing non-stationary and non-linear time series, with the advantage that, in contrast with the wavelet transform, it does not need an a priori definition of parameters specific to the signal.
The interest in biometric recognition systems has increased in the last years, since traditional security measures (security guards, smart cards, etc.) pose serious challenges of increased vulnerability. Current biometric security systems are also vulnerable to a variety of attacks that bypass the authentication process [7], because authentication systems cannot discriminate between authorized users and an intruder who fraudulently obtains the access privileges.
To tackle this problem, some researchers have explored the use of brain signals as a measure for a biometric security system. This is possible because any human physiological and/or behavioral characteristic can be used as a biometric feature as long as it satisfies the following requirements: universality, permanence, collectability, performance, acceptability and circumvention. A biometric recognition system is able to perform automatic subject identification based on physiological and/or behavioral features [7], with the advantage that a single biometric trait can be used for access to several accounts.
In that context, the main neuro-paradigms in the state of the art are sensorimotor activity and the imagination of activities (visual counting and geometric figure rotation [8], mental composition of letters [9], among others [10]). One of the challenges for the subject identification task is the feature extraction stage, which must represent the brain signal captured from different electrodes with a single vector, since it is impractical and computationally costly to use all the data generated by the brain.
Some authors report the use of imagined speech. For example, [6] used EEG signals from a small population of 6 subjects while they imagined the syllables /ba/ and /ku/. The collected database consisted of 6 sessions with 20 trials per subject each, recorded from 128 channels with a sampling frequency of 1024 Hz using an Electrical Geodesics device. For feature extraction they used the PSD of each EEG signal, and then autoregressive (AR) model coefficients were computed for each electrode using the Burg method [11]. The classification stage was performed using the linear kernel of the Support Vector Machine (SVM) classifier and 1-Nearest-Neighbor (k-NN). For these two syllables they obtained 99.76% and 99.41% accuracy, respectively. In the work presented in [5], resting states were used for subject identification using a linear SVM. The dataset consisted of 40 subjects with 192 instances per subject; the sampling frequency was 256 Hz with 64 channels. For pre-processing, a band-pass filter (0.5-40 Hz) and then the Common Average Reference were applied. For feature extraction, the Morlet wavelet was used to extract the power spectrum of 7 frequency bands, and finally a downsampling to 32 Hz was applied. The accuracies obtained in the best cases were 100%, 96% and 72% for 3 signal lengths (300, 60 and 30 seconds, respectively). However, in a real application, recording 300, 60 or even 30 seconds of signal can be impractical and computationally costly for real time. In addition, the use of 128 or 64 channels does not favor the portability of the device.
As a first step towards creating a robust method without an a priori definition of additional parameters, a method based on EMD to extract features from brain signals of imagined speech is presented here.
1.1 Empirical Mode Decomposition
The EMD method is useful to decompose non-linear and non-stationary signals into a finite number of Intrinsic Mode Functions (IMFs) that satisfy two conditions [12]:
1. The number of extrema and the number of zero crossings must be either
equal or differ at most by one.
2. At any point, the mean value of the envelope defined by the local maxima
and the envelope defined by the local minima is zero.
The method decomposes a signal into oscillatory components by applying a process called sifting. The sifting process for the signal x(t) can be summarized as shown in Algorithm 1:
Data: Time series x(t)
Result: IMFs
sifting = True;
while sifting = True do
    1. Identify all upper extrema of x(t).
    2. Interpolate the local maxima to form an upper envelope u(t).
    3. Identify all lower extrema of x(t).
    4. Interpolate the local minima to form a lower envelope l(t).
    5. Calculate the mean envelope: m(t) = (u(t) + l(t)) / 2.
    6. Extract the mean from the signal: h(t) = x(t) - m(t).
    if h(t) satisfies the two IMF conditions then
        h(t) is an IMF;
        sifting = False; stop sifting
    else
        x(t) = h(t);
        sifting = True; keep sifting
    end
    if x(t) is not monotonic then
        Continue;
    else
        Break;
    end
end
Algorithm 1: The sifting process for a signal x(t)
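To make the sifting procedure concrete, the sketch below gives one plausible Python reading of Algorithm 1 using NumPy and SciPy. The cubic-spline envelopes, the Cauchy-type stopping tolerance and the cap on the number of IMFs are illustrative assumptions rather than parameters reported in this paper, and boundary effects are ignored.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_imf(x, max_iter=100, tol=0.05):
    """Extract one IMF from x(t) following the sifting steps of Algorithm 1."""
    h = np.asarray(x, dtype=float).copy()
    t = np.arange(len(h))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]   # step 1: upper extrema
        minima = argrelextrema(h, np.less)[0]      # step 3: lower extrema
        if len(maxima) < 3 or len(minima) < 3:
            break                                  # not enough extrema for spline envelopes
        u = CubicSpline(maxima, h[maxima])(t)      # step 2: upper envelope u(t)
        l = CubicSpline(minima, h[minima])(t)      # step 4: lower envelope l(t)
        m = (u + l) / 2.0                          # step 5: mean envelope m(t)
        h_new = h - m                              # step 6: h(t) = x(t) - m(t)
        # Cauchy-type stopping criterion used as a proxy for the two IMF conditions
        if np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12) < tol:
            h = h_new
            break
        h = h_new
    return h

def emd(x, max_imfs=5):
    """Decompose x(t) into IMFs and a residue by repeated sifting."""
    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
        if len(argrelextrema(residue, np.greater)[0]) < 2:
            break                                  # residue is (nearly) monotonic
    return np.array(imfs), residue
```

In practice, packaged implementations such as PyEMD provide the same decomposition with more careful envelope and boundary handling.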
1.2 IMFs selection
The EMD is a powerful tool to decompose a non-stationary signal; however, some IMFs that contain limited information may appear in the decomposition because
the numerical procedure is susceptible to errors [13]. To select the IMFs that contain the most relevant information about the signal, the methods presented in [14,15] were applied and compared, to finally adopt the method proposed in [15], which employs the Minkowski distance (d_mink), as follows:
d_{mink} = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}    (1)

where x_i and y_i are the i-th samples of the observed signal and the extracted IMF, respectively.
According to the authors, the redundant IMFs have a shape and frequency content different from those of the original signal, which means that when an IMF is not appropriate, the d_mink presents a maximum value.
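A minimal sketch of this selection criterion, assuming the Euclidean form of Eq. (1) and the ranking-by-smallest-distance reading of [15] (the helper name and the choice of keeping the closest IMFs are ours, not code from the paper):

```python
import numpy as np

def select_relevant_imfs(signal, imfs, n_keep=2):
    """Rank IMFs by their Minkowski (p = 2) distance to the original signal, Eq. (1),
    and keep the n_keep closest ones (redundant IMFs show a large distance)."""
    d = np.array([np.sqrt(np.sum((signal - imf) ** 2)) for imf in imfs])
    order = np.argsort(d)               # smallest distance = most relevant IMF
    return [imfs[i] for i in order[:n_keep]], d
```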
In this work, a new method for feature extraction based on EMD is proposed. In the next section the method is briefly described. Then, the application of the proposed method for subject identification is explained.
2 Description of the method
The main contribution of the proposed method is the feature extraction stage, which consists of applying the Empirical Mode Decomposition (EMD) to obtain 5 Intrinsic Mode Functions (IMFs) per channel of the EEG data.
To select the most relevant IMFs, the Minkowski distance was computed [15]. Once the most relevant IMFs of all instances in the dataset were obtained, it turned out that the number of relevant IMFs differed depending on the length of the signal and the imagined word. However, to obtain meaningful features it is necessary to have the same number of IMFs for all instances. To cope with this, the selected IMFs were limited to the minimum number of relevant IMFs over all instances, which in this case were only the first 2 IMFs.
Then, for each IMF, 4 features were computed: Instantaneous energy, Teager energy, Higuchi fractal dimension and Petrosian fractal dimension, as shown in Fig. 1. All features per IMF and per channel were concatenated to obtain a feature vector per instance. Once the feature vectors were obtained, they were used to train 4 machine learning-based classifiers (random forest, naive Bayes, Support Vector Machine (SVM) and K-Nearest Neighbors (k-NN)) in order to compare their performances.
In the following, the step-by-step procedure proposed in this work to identify subjects by using the EEG of their imagined speech is described.
2.1 Feature extraction
Once the most relevant IMFs were selected, 4 features were computed for each one in order to reduce the feature vector while still obtaining a good representation of the signal. The first 2 features are related to the energy distribution and the other two to fractal dimensions. Each feature used in this work is briefly described below.
Fig. 1. Flowchart summarizing the feature extraction stage.
Instantaneous Energy: gives the energy distribution in each band [16]:

f_j = \log_{10} \left( \frac{1}{N_j} \sum_{r=1}^{N_j} (w_j(r))^2 \right)    (2)
Teager Energy: This energy operator reflects variations in both amplitude
and frequency of the signal and it is a robust parameter as it attenuates
auditory noise [16,17].
f_j = \log_{10} \left( \frac{1}{N_j} \sum_{r=1}^{N_j - 1} \left| (w_j(r))^2 - w_j(r-1)\, w_j(r+1) \right| \right)    (3)
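The two energy features of Eqs. (2) and (3) map directly onto a few lines of NumPy. The sketch below is only an approximation of those formulas: np.mean divides by the number of summed terms rather than exactly N_j, and a small constant guards the logarithm against zero; both are implementation choices, not details given in the paper.

```python
import numpy as np

def instantaneous_energy(w):
    """Eq. (2): log10 of the mean squared amplitude of one IMF."""
    return np.log10(np.mean(w ** 2) + 1e-12)

def teager_energy(w):
    """Eq. (3): log10 of the mean Teager energy operator output of one IMF."""
    tk = np.abs(w[1:-1] ** 2 - w[:-2] * w[2:])   # w(r)^2 - w(r-1) * w(r+1)
    return np.log10(np.mean(tk) + 1e-12)
```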
Higuchi Fractal Dimension: The algorithm approximates the mean length of the curve using segments of k samples and estimates the dimension of a time-varying signal directly in the time domain [18]. Consider a finite set of observations taken at a regular interval: X(1), X(2), X(3), ..., X(N). From this series, a new series X_k^m must be constructed:

X_k^m : X(m),\ X(m+k),\ X(m+2k),\ \ldots,\ X\left( m + \left\lfloor \frac{N-m}{k} \right\rfloor k \right)    (4)

where m = 1, 2, ..., k; m indicates the initial time and k the interval time.
Then, the length of the curve associated with each time series X_k^m can be computed as follows:

L_m(k) = \frac{1}{k} \left( \sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| X(m+ik) - X(m+(i-1)k) \right| \right) \frac{N-1}{\lfloor (N-m)/k \rfloor\, k}    (5)
Higuchi takes the mean length of the curve for each k as the average value of L_m(k) for m = 1, 2, ..., k, with k = 1, 2, ..., k_max, which is calculated as:

L(k) = \frac{1}{k} \sum_{m=1}^{k} L_m(k)    (6)
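As an illustration, the Higuchi dimension of Eqs. (4)-(6) can be estimated as the slope of log L(k) versus log(1/k). In the sketch below, the choice of k_max = 10 is an assumption, since the paper does not report the value used.

```python
import numpy as np

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension: slope of log(L(k)) vs log(1/k), Eqs. (4)-(6)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    L = []
    for k in range(1, kmax + 1):
        Lmk = []
        for m in range(k):                             # initial times m = 1..k (0-based here)
            n_seg = (N - 1 - m) // k                   # number of steps in the sub-series X_k^m
            if n_seg < 1:
                continue
            idx = m + np.arange(n_seg + 1) * k         # X(m), X(m+k), X(m+2k), ...
            length = np.sum(np.abs(np.diff(x[idx])))   # curve length terms of Eq. (5)
            Lmk.append(length * (N - 1) / (n_seg * k) / k)
        L.append(np.mean(Lmk))                         # mean over m, Eq. (6)
    k_vals = np.arange(1, kmax + 1)
    slope, _ = np.polyfit(np.log(1.0 / k_vals), np.log(L), 1)
    return slope
```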
Petrosian Fractal Dimension: can be used to provide a fast computation of the fractal dimension of a signal by translating the series into a binary sequence [19]:

FD_{Petrosian} = \frac{\log_{10} n}{\log_{10} n + \log_{10} \left( \frac{n}{n + 0.4 N} \right)}    (7)

where n is the length of the sequence and N is the number of sign changes in the binary sequence.
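A direct transcription of Eq. (7) is shown below, taking the binary sequence from the sign of the first difference of the signal; this is one common convention, and the paper does not specify which binarization was used.

```python
import numpy as np

def petrosian_fd(x):
    """Petrosian fractal dimension, Eq. (7), using sign changes of the first difference."""
    diff = np.diff(np.asarray(x, dtype=float))
    n_sign_changes = int(np.sum(diff[1:] * diff[:-1] < 0))   # N in Eq. (7)
    n = len(x)
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_sign_changes)))
```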
2.2 Classifiers and validation
At this point, each instance is described by a feature vector of the same length with an assigned label, which allows the use of machine learning methods. In this work, the machine learning methods random forest, naive Bayes, SVM and k-NN were used. For SVM, all experiments were reproduced with the Linear, RBF (Radial Basis Function) and Sigmoid kernels. In the random forest case, the experiments were reproduced with different tree depths (2, 3, 4, 5) using the Gini impurity. k-NN was tested with different numbers of neighbors (1 to 9).
The accuracy of the 4 classifiers was estimated to evaluate their performances using 10-fold cross-validation.
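A sketch of this comparison with scikit-learn is given below. The hyper-parameter values are single illustrative settings rather than the full grids described above, and X and y stand for the matrix of 112-dimensional feature vectors and the subject labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def compare_classifiers(X, y):
    """Mean 10-fold cross-validation accuracy for the four classifiers."""
    models = {
        "random forest": RandomForestClassifier(max_depth=4, criterion="gini"),
        "SVM (linear)": SVC(kernel="linear"),
        "naive Bayes": GaussianNB(),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
    }
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in models.items()}
```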
3 Dataset and experiments
In this section, the dataset used to test the proposed method in two different experiments for subject identification is briefly described.
The purpose of the first experiment is to show that a subject can be identified regardless of the imagined word, i.e., to determine whether a biometric system using a different password per subject is possible. The aim of the second experiment is to find out whether the accuracy is higher when the password is pre-defined, in other words, when the same imagined word is used as the biometric security measure.
3.1 Dataset
The dataset consists of EEG signals from 27 subjects recorded using the EMOTIV EPOC device while imagining 33 repetitions of five words in Spanish: arriba, abajo, izquierda, derecha and selección, corresponding to the English words up, down, left, right and select.
Each repetition of the imagined words was separated by a state of rest. The protocol for EEG signal acquisition is described in detail in [4].
EEG signals were recorded from 14 channels placed on the head according to the 10-20 international system [20], with a sampling frequency of 128 Hz.
3.2 Setup
For the following experiments, the first 20 subjects and the first 30 repetitions of each of the 5 imagined words were used. In summary, the terms used throughout the paper are the following.
S = 20: subjects.
W = 5: imagined words in the dataset.
R = 30: repetitions per imagined word.
C = 14: channels used in all instances.
IMFs = 2: IMFs per channel.
F = 4: features per IMF, corresponding to Teager energy, Instantaneous energy, Higuchi Fractal Dimension and Petrosian Fractal Dimension.
According to the proposed method, the feature vector size per instance is F × IMFs × C = 4 × 2 × 14 = 112. Next, the specific setup for the experiments and the results are presented.
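Putting the pieces together, one instance (a 14-channel trial) would be turned into a 112-dimensional feature vector roughly as follows; emd, select_relevant_imfs and the four feature functions refer to the sketches given in the previous sections, so this is an outline of the pipeline rather than the authors' exact code.

```python
import numpy as np

def instance_features(trial, n_imfs=2):
    """Feature vector for one trial of shape (14, n_samples):
    F x IMFs x C = 4 x 2 x 14 = 112 values."""
    feats = []
    for channel in trial:                                  # C = 14 channels
        imfs, _ = emd(channel)                             # decompose the channel
        selected, _ = select_relevant_imfs(channel, imfs, n_keep=n_imfs)
        for imf in selected:                               # IMFs = 2 per channel
            feats.extend([instantaneous_energy(imf),
                          teager_energy(imf),
                          higuchi_fd(imf),
                          petrosian_fd(imf)])              # F = 4 features per IMF
    return np.array(feats)
```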
3.3 Subject level analysis
In this experiment, 4 classifiers were used in order to compare their performances. Each classifier has S classes and, per class, R × W = 150 instances.
Table 1 shows the accuracies obtained with the proposed method.
Table 1. Accuracy obtained when all imagined words were joined for subject identification
Classifier Accuracy
random forest 0.64
SVM 0.84
naive Bayes 0.68
k-NN 0.78
The aim of this experiment is to show that the method can be used for subject identification with high accuracy rates. According to the results in Table 1, the SVM classifier performed best. As mentioned before, SVM was tested with different kernels, and for this experiment the best one was the linear kernel.
3.4 Word level analysis
In this experiment, the classifiers were trained using each word separately, in order to explore whether the proposed method works best for a specific word. Each classifier has S classes and, per class, R instances.
Table 2. Accuracy obtained per imagined word for subject identification
Classifier Up Down Left Right Select
random forest 0.78 0.77 0.73 0.73 0.75
SVM 0.91 0.87 0.88 0.84 0.92
naive Bayes 0.90 0.85 0.88 0.85 0.89
k-NN 0.85 0.80 0.81 0.79 0.88
In this experiment the highest accuracy was also obtained using the SVM classifier with the linear kernel, reaching an accuracy of 0.92. On the other hand, the lowest performance was obtained with the random forest classifier.
4 Discussion and Conclusions
In this paper, a method based on EMD for subject identification from EEG signals of imagined speech was presented. The proposed method was applied to a dataset of imagined speech with encouraging results. The accuracies obtained suggest that the use of imagined speech for subject identification, especially using the linear SVM classifier, can be effective and is worth exploring further.
When the imagined words were joined to observe whether it is possible to identify a subject regardless of the imagined word, the highest accuracy obtained was 0.84 using the linear SVM. Then, when the second experiment was carried out to observe the accuracies using the imagined words separately, the maximum accuracies were also reached using the linear SVM: 0.91, 0.87, 0.88, 0.84 and 0.92, respectively, for each imagined word.
In the work presented in [21], the Common Average Reference (CAR) [22] was used to improve the signal-to-noise ratio. Then, feature extraction was based on the Instantaneous and Teager energy distributions of 4 wavelet decomposition levels using the mother function Biorthogonal 2.2, and random forest was used for classification. The accuracies obtained using the imagined word select were 0.96 and 0.93, respectively. In this paper, the highest accuracy obtained for the imagined word select (using the linear SVM) was 0.92, which is slightly lower than those results. However, with the inherent adaptivity of EMD for feature extraction, there is no need to pre-define any mother function for the particular task or neuro-paradigm. In addition, EMD inherently improves the signal-to-noise ratio by separating the noise into the first IMFs.
A limitation of the proposed method is the use of a dataset of brain signals from only 20 subjects. In the future, it will be necessary to reproduce the experiments using a larger population in order to produce a competitive alternative to the biometric security systems currently used in industry. Further efforts will be made to explore alternative techniques for IMF selection and for EEG channel selection, since it is well known that specific channels provide more relevant information than others for the distinct task selected for subject identification.
Acknowledgments. This work was supported by Enabling Technologies -
NTNU, under the project “David versus Goliath: single-channel EEG unrav-
els its power through adaptive signal analysis - FlexEEG”.
References
1. Bashashati, Ali, Mehrdad Fatourechi, Rabab K. Ward, and Gary E. Birch: A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. Journal of Neural Engineering 4, no. 2 (2007): R32.
2. Desain, Peter, Jason Farquhar, Pim Haselager, Christian Hesse, and R. S. Schaefer: What BCI research needs. In Proc. ACM CHI 2008 Conf. on Human Factors in Computing Systems (Venice, Italy) (2008).
3. Moctezuma, Luis Alfredo, Maya Carrillo, Luis Villaseñor Pineda, and Alejandro A. Torres García: Hacia la clasificación de actividad e inactividad lingüística a partir de señales de electroencefalogramas (EEG). Research in Computing Science 140 (2017): 135-149.
4. Torres-García, A. A., C. A. Reyes-García, L. Villaseñor-Pineda, and J. M. Ramírez-Cortés: Análisis de señales electroencefalográficas para la clasificación de habla imaginada. Revista Mexicana de Ingeniería Biomédica 34, no. 1 (2013): 23-39.
5. Nishimoto, Takashi, Yoshiki Azuma, Hiroshi Morioka, and Shin Ishii: Individual
Identification by Resting-State EEG Using Common Dictionary Learning. In Inter-
national Conference on Artificial Neural Networks. Springer, Cham (2017): 199-207.
6. Brigham, Katharine, and BVK Vijaya Kumar: Subject identification from electroen-
cephalogram (EEG) signals during imagined speech. In Biometrics: Theory Applica-
tions and Systems (BTAS), 2010 Fourth IEEE International Conference on (2010):
1-8.
7. Jain, Anil K., Arun Ross, and Umut Uludag: Biometric template security: Chal-
lenges and solutions. In Signal Processing Conference, 2005 13th European (2005):
1-4.
8. Ashby, Corey, Amit Bhatia, Francesco Tenore, and Jacob Vogelstein: Low-cost elec-
troencephalogram (EEG) based authentication. In Neural Engineering (NER), 2011
5th International IEEE/EMBS Conference on (2011): 442-445.
9. Palaniappan, Ramaswamy: Electroencephalogram signals from imagined activities:
A novel biometric identifier for a small population. In International Conference on
Intelligent Data Engineering and Automated Learning. Springer, Berlin, Heidelberg
(2006): 604-611.
10. Del Pozo-Banos, Marcos, Jesús B. Alonso, Jaime R. Ticay-Rivas, and Carlos M. Travieso: Electroencephalogram subject identification: A review. Expert Systems with Applications 41, no. 15 (2014): 6537-6554.
11. Kay, Steven M.: Modern Spectral Estimation: Theory and Application. Signal Processing Series (1988).
12. Huang, Norden E., Zheng Shen, Steven R. Long, Manli C. Wu, Hsing H. Shih,
Quanan Zheng, Nai-Chyuan Yen, Chi Chao Tung, and Henry H. Liu: The empirical
mode decomposition and the Hilbert spectrum for nonlinear and non-stationary
time series analysis. In Proceedings of the Royal Society of London A: mathematical,
physical and engineering sciences, vol. 454, no. 1971 (1998): 903-995.
13. Rilling, Gabriel, Patrick Flandrin, and Paulo Goncalves: On empirical mode de-
composition and its algorithms. In IEEE-EURASIP workshop on nonlinear signal
and image processing, vol. 3 NSIP-03, Grado (I), (2003): 8-11.
14. de Souza, Douglas Baptista, Jocelyn Chanussot, and Anne-Catherine Favre: On
selecting relevant intrinsic mode functions in empirical mode decomposition: An
energy-based approach. In Acoustics, Speech and Signal Processing (ICASSP), 2014
IEEE International Conference on (2014): 325-329.
15. Boutana, Daoud, Messaoud Benidir, and Braham Barkat: On the selection of in-
trinsic mode function in EMD method: application on heart sound signal. In Applied
Sciences in Biomedical and Communication Technologies (ISABEL), 2010 3rd In-
ternational Symposium on (2010): 1-5.
16. Didiot, Emmanuel, Irina Illina, Dominique Fohr, and Odile Mella: A wavelet-based parameterization for speech/music discrimination. Computer Speech & Language 24, no. 2 (2010): 341-357.
17. Jabloun, Firas, and A. Enis Cetin: The Teager energy based feature parameters for
robust speech recognition in car noise. In Acoustics, Speech, and Signal Processing.
1999 IEEE International Conference on, vol. 1. (1999): 273-276.
18. T. Higuchi: Approach to an irregular time series on the basis of the fractal theory.
Physica D, Nonlinear Phenomena, vol. 31, (1988): 277-283.
19. A. Petrosian: Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns. In Computer-Based Medical Systems, Proceedings of the Eighth IEEE Symposium on (1995): 212-217.
20. Jasper, Herbert: Report of the committee on methods of clinical examination in
electroencephalography. Electroencephalogr Clin Neurophysiol 10 (1958): 370-375.
21. Moctezuma, Luis Alfredo, Marta Molinas, A. A. Torres García, Luis Villaseñor Pineda and Maya Carrillo: Towards an API for EEG-Based Imagined Speech classification. International Conference on Time Series and Forecasting (2018).
22. Bertrand, O., F. Perrin, and J. Pernier: A theoretical justification of the aver-
age reference in topographic evoked potential studies. Electroencephalography and
Clinical Neurophysiology/Evoked Potentials Section 62, no. 6 (1985): 462-464.