2018 International Conference on Intelligent Systems (IS)
978-1-5386-7097-2/18/$31.00 ©2018 IEEE
A Study on Mental State Classification
using EEG-based Brain-Machine Interface
Jordan J. Bird
School of Engineering & Applied Science
Aston University
Birmingham, UK
Anikó Ekárt
School of Engineering & Applied Science
Aston University
Birmingham, UK
Luis J. Manso
School of Engineering & Applied Science
Aston University
Birmingham, UK
Diego R. Faria
School of Engineering & Applied Science
Aston University
Birmingham, UK
Eduardo P. Ribeiro
Department of Electrical Engineering
Federal University of Parana
Curitiba, Brazil
Abstract: This work aims to find discriminative EEG-based features and appropriate classification methods that can categorise brainwave patterns based on their level of activity or frequency for mental state recognition useful for human-machine interaction. Using the Muse headband with four EEG sensors (TP9, AF7, AF8, TP10), we categorised three possible states (relaxed, neutral and concentrating) based on states of mind defined by cognitive behavioural studies. We created a dataset with five individuals and sessions lasting one minute for each class of mental state in order to train and test different methods. Given the proposed set of features extracted from the EEG headband's five signals (alpha, beta, theta, delta, gamma), we tested combinations of different feature selection algorithms and classifier models to compare their performance in terms of recognition accuracy and number of features needed. Different tests, such as 10-fold cross validation, were performed. Results show that only 44 features from a set of over 2100 are necessary when used with classical classifiers such as Bayesian Networks, Support Vector Machines and Random Forests, attaining an overall accuracy of over 87%.
Keywords: EEG, brain-machine interface, machine learning, mental state classification
I. INTRODUCTION
The ability to autonomously detect mental states, whether
cognitive or affective, is useful for multiple purposes in many
domains such as robotics, health care, education, neuroscience,
etc. The importance of efficient human-machine interaction
mechanisms increases with the number of real life scenarios
where smart devices, including autonomous robots, can be
applied. One of the many alternatives that can be used to
interact with machines is through superficial brain activity
signals. These signals, called electroencephalograms or EEG
for short, convey information regarding the voltage measured
by electrodes (dry or wet) placed around the scalp of an
individual. In addition to regular non-invasive electroencephalography, there are also invasive alternatives that monitor brain activity by placing the electrodes directly inside the skull of the subject [35]. This technique is known as intracranial electroencephalography (iEEG). Although iEEG can yield better signal acquisition, it is invasive and therefore more complex to apply. Extracranial
electroencephalography techniques include wearable and non-
wearable technologies. The fact that extracranial devices used
to acquire EEG signals are non-invasive, are becoming easier
to wear, and their price is decreasing widens the range of
applications for which they are suitable.
A major challenge in brain-machine interface applications is
inferring how momentary mental states are mapped into a
particular pattern of brain activity. One of the main issues of
classifying EEG signals is the amount of data needed to
properly describe the different states, since the signals are
complex, non-linear, non-stationary, and random in nature.
The signals are considered stationary only within short intervals, which is why best practice is to apply a short-time windowing technique in order to detect local discriminative features. The paper at hand focuses on selecting a subset of highly discriminative features and comparing state-of-the-art classification methods that can categorise EEG signals into different mental states, taking into
consideration the performance in terms of accuracy and
computational cost. The application considered herein is to
distinguish among three different mental states (e.g. relaxed,
neutral and highly concentrated) of an individual using an
EEG device with dry electrodes that can interface a range of
applications, such as to control the movement of a robot.
The remainder of the paper proceeds as follows. Related works
are summarised in section II. The experimental setup,
including information regarding the device used, and details
about the data acquisition are described in section III. The
methods tested to perform feature selection and the criteria
used to compare the different classifiers are presented in
section IV. Preliminary results are presented in section V. A
discussion on the conclusions drawn from the experimental
results is provided in section VI.
II. RELATED WORK
Statistical features derived from EEG data are commonly used
alongside machine learning techniques to classify mental
states [18], [19]. These nominal states can then be used for
finite points of control as a Brain-Computer Interface. The Muse headband has been recognised by neuroscientists for its effectiveness and relatively low cost, as well as its accuracy when its signals are classified with Bayesian methods [8]. Using these signals, two tasks were recognised with 95% accuracy, though it is worth noting that tasks were classified rather than mental states, and said tasks were in binary distinction to one another.
Using a Muse headband, researchers accurately measured a
user’s enjoyment [11], [12] of an activity from brain signals
alone using the stimuli of two videogames, one measurably
more enjoyable than the other. With the use of a high
resolution 32-channel EEG and statistical feature extraction, a
model was developed to control a robot’s movement [9].
Using statistics focused on the signals produced by the motor
cortex which is thought to control muscles for movement [10],
researchers classified various states which successfully
resulted in a model that could direct a robot’s movement. EEG data has been used extensively to detect abnormal brain activity related to ill-health such as stroke [13]: specifically, when ischemia is present in the brain, brain activity points to abnormalities prior to the stroke occurring. As well as stroke
detection, neuroscientists found that upper extremities in
motor function post-stroke could be rehabilitated using EEG
data with robotics feedback [14] in the form of a brain-
machine interface. Results were promising in terms of the
effectiveness of the system’s ability to rehabilitate. Also
studied extensively is the ability to use EEG data to detect
seizures both in adults suffering with epilepsy [15] and notably
in new-born infants [16]. A Spiking Neural Network was developed for seizure detection based on statistics extracted from EEG streams, with a high accuracy of 92.5% [17]. Random Forest classification of extracted EEG features was used to identify mental states during stages of sleep with a high accuracy of 82% [20], and a Bayesian classifier was trained on more general awake, sleep and REM sleep states with accuracies ranging between 92-97% in both humans and rats [21]. Neural Networks have been observed to have an
accuracy of 64% when classifying emotional states based on
EEG data [7].
Differently from the aforementioned works, this work presents a study on feature selection and classification models given a set of proposed features (statistical, entropy-based, derivative and time-frequency features) extracted from short temporal lapses of EEG data. Multiple datasets of the same data points are then generated, their original contribution being their differing selections of attributes, chosen by various machine learning models. The primary goal is to find a suitable model that can categorise mental states based on EEG data from the TP9, AF7, AF8 and TP10 electrodes.
III. EXPERIMENTAL SETUP
A. EEG Data Acquisition
The Muse headband sensor was used for data collection. The Muse is a commercial EEG sensing device with five dry-application sensors: one used as a reference point (NZ) and four (TP9, AF7, AF8, TP10) to record brainwave activity.
Fig. 1. The International 10-20 EEG Electrode Placement Standard [4]
Highlighted in yellow are the sensors of the Muse Headband. The NZ
placement (green) is used as a reference point for calibration.
Fig. 2. Example of a live EEG stream from the four Muse sensors. The Right AUX channel had no device attached and was discarded as pure noise. The Y-axis of this live feed graph shows the measured microvolts at t=0 on each sensor, and the X-axis the time of the reading.
To prevent the interference of electromyographic signals,
nonverbal tasks that required little to no movement were set.
Blinking, though providing interference to the AF7 and AF8
sensors, was neither encouraged nor discouraged to retain a
natural state. This was due to the dynamicity of blink rate
being linked to tasks requiring differing levels of
concentration [1], and as such the classification algorithms
would take these patterns of signal spikes into account. In
addition, subjects were asked not to close their eyes during any
of the tasks. Three stimuli were devised to cover the three
mental states available from the Muse Headband - relaxed,
neutral, and concentrating. The relaxed task had the subjects
listening to low-tempo music and sound effects designed to aid
in meditation whilst being instructed on relaxing their muscles
and resting. For the neutral mental state, a similar test was carried out, but with no stimulus at all; this test was carried out prior to any others to prevent lasting effects of a relaxed or concentrative mental state. Finally, for concentration, the subjects were instructed to follow the shell game, in which a ball was hidden under one of three cups which were then switched; the task was to follow which cup hid the ball. Future work will implement a standard experiment for each state, allowing proper comparison with similar experiments. A short time after each stimulus started, so as not to gather data with an inaccurate class, the EEG data from the Muse headband was automatically recorded for sixty seconds. The data was observed to stream at a variable frequency within the range of 150-270 Hz.
BlueMuse [5] was used for interfacing the device to a
computer, and Muselsl [6] was used to convert the Muse
signals to MicroVolts and record the data into a preliminary
dataset ready for feature extraction. Fig. 2 shows a live stream of EEG data; blinking can be seen in the troughs of the TP9 and TP10 sensors. At each point in the data stream (150-270 Hz), all signals were recorded along with a UNIX timestamp, which was later used for downsampling the data to produce a uniform stream frequency. The measured voltages on the graph can be mapped to the EEG placements seen in Fig. 1. Before feature extraction we downsampled the data: the sampling rate was decimated to 200 Hz using fast Fourier transforms along the time axis. The resampled signal starts at the same value as x but is sampled with a spacing of len(x) / num * (spacing of x). Because a Fourier method is used, the signal is assumed to be periodic. This is a realistic downsampling, as the dominant energy is concentrated in the range of 20-500 Hz, even though the frequency range of the EEG sensor is wider.
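The decimation described above matches SciPy's Fourier-domain `resample`. A minimal sketch; the 256 Hz input rate and the 10 Hz test sine are illustrative stand-ins, since the headband streamed at a variable 150-270 Hz:

```python
import numpy as np
from scipy.signal import resample

raw_rate = 256                                 # assumed raw sampling rate
target_rate = 200                              # decimation target from the text
duration = 60                                  # one-minute session

t = np.arange(0, duration, 1 / raw_rate)
x = np.sin(2 * np.pi * 10 * t)                 # alpha-band-like stand-in wave

num = int(len(x) * target_rate / raw_rate)     # new sample count
y = resample(x, num)                           # FFT-based; assumes x is periodic

print(len(x), len(y))                          # 15360 12000
```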
IV. METHODS
A. Proposed Set of Features for EEG Signals
Feature extraction and classification of EEG signals are core
issues in brain computer interface (BCI) applications. One
challenging problem when it comes to EEG feature extraction
is the complexity of the signal, since it is non-linear, non-
stationary, and random in nature. The signals are considered stationary only within short intervals, which is why best practice is to apply a short-time windowing technique to meet this requirement. However, stationarity is still an assumption that holds only during a normal brain condition. Non-stationary
signals can be observed during the change in alertness and
wakefulness, during eye blinking, and also during transitions
of mental states. Thus, this subsection describes the set of
features considered in this work to adequately discriminate
different classes of mental states. These features rely on
statistical techniques, time-frequency based on fast Fourier
transform (FFT), Shannon entropy, max-min features in
temporal sequences, log-covariance and others. All features
proposed to classify the mental states are computed in terms of
the temporal distribution of the signal in a given time window.
This sliding window is defined as a period of 1 second at 250 Hz, i.e. all features are computed within this time interval. An overlap of 0.5 seconds is used when moving the window, i.e. temporal window 1 (w1) starts at 0 sec. and finishes at 1 sec.; w2 starts at 0.5 sec. and finishes at 1.5 sec.; w3 starts at 1 sec. and finishes at 2 sec.; w4 starts at 1.5 sec. and finishes at 2.5 sec., and so on. Another important point when computing the features is that the EEG Muse headband returns five types of signal frequency bands {α, β, θ, δ, γ}, so we compute the full proposed set of features for each signal. Thus, the total number of feature values extracted from these signals is 2147.
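The windowing scheme above (1 s windows with 0.5 s overlap) can be sketched as a simple generator; the 250 Hz rate follows the text, and the dummy input signal is illustrative:

```python
import numpy as np

def sliding_windows(x, fs=250, win_s=1.0, overlap_s=0.5):
    """Yield windows of win_s seconds, overlapping by overlap_s seconds."""
    win = int(win_s * fs)                      # samples per window
    hop = int((win_s - overlap_s) * fs)        # step between window starts
    for start in range(0, len(x) - win + 1, hop):
        yield x[start:start + win]

x = np.arange(1000)                            # 4 s of dummy samples at 250 Hz
wins = list(sliding_windows(x))
print(len(wins), len(wins[0]))                 # 7 windows of 250 samples each
```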
Statistical Features: In order to have a compact representation of the raw sensor data in a given time range, we use a set of classical statistical features, which have proven efficient in complementing other features to recognise patterns in time series. The statistical features are: (i) given the set of data values {x1, x2, ..., xN} acquired in each temporal window, the mean value

μ = (1/N) Σ_{i=1}^{N} x_i

of that sequence is computed; (ii) the standard deviation

σ = sqrt( (1/N) Σ_{i=1}^{N} (x_i − μ)² );

(iii) statistical moments of 3rd and 4th order, which give us the skewness to measure the asymmetry of the data and the kurtosis to measure the peakedness of its probability distribution, respectively. The statistical moments employed are computed as follows:

m_k = (1/N) Σ_{i=1}^{N} (x_i − μ)^k, (1)

y_k = m_k / σ^k, (2)

where m_k is the k = {3rd, 4th} moment about the mean and y_k = {skewness, k = 3; kurtosis, k = 4}. Another type of statistical feature computed was the autocorrelation of the signals at each time window for each of the five signals from the EEG. The correlation of the signal with a delayed copy of itself, as a function of delay, was employed similarly to [22] and [23], where the implementation details and parameters are described.
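A minimal sketch of the statistical block, using SciPy's `skew`/`kurtosis` as stand-ins for Eqs. (1)-(2) and a lag-1 autocorrelation; the exact delays used for the autocorrelation feature in [22], [23] are not specified here:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def statistical_features(w):
    """Mean, standard deviation, skewness, kurtosis and lag-1
    autocorrelation of one 1 s window of samples."""
    mu, sigma = np.mean(w), np.std(w)
    w0 = w - mu
    # correlation of the signal with a copy of itself delayed by one sample
    ac1 = np.dot(w0[:-1], w0[1:]) / np.dot(w0, w0)
    return np.array([mu, sigma, skew(w), kurtosis(w), ac1])

rng = np.random.default_rng(0)
feats = statistical_features(rng.normal(size=250))
print(feats.shape)                             # (5,)
```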
Max, Min and Derivatives: Given a time window of 1 sec., the maximum and minimum values are computed to increase the diversity of the feature types. Derivatives are also computed as temporal features. For each time window, we split the window in two, such that w/2 = 0.5 sec. and w = 1 sec., resulting in two sequences of data at ~125 Hz; we then compute the slope between the two half-window means:

d_μ = (μ_w − μ_{w/2}) / 0.5, (3)

where w and w/2 indicate the first and second half of the sequence of data in a time window of 1 sec. The same strategy is employed to obtain the derivatives of the max and min features in the sub time windows:

d_max = (max_w − max_{w/2}) / 0.5, (4)

d_min = (min_w − min_{w/2}) / 0.5. (5)
The next temporal features are extracted after splitting the initial time window of one second into 4 batches of 0.25 sec. each. We then computed the mean, max and min values of each batch, {μ1, μ2, μ3, μ4}, {max1, max2, max3, max4} and {min1, min2, min3, min4}, along with the 1D Euclidean distances among all mean values, d12 = |μ1 − μ2|, d13 = |μ1 − μ3|, d14 = |μ1 − μ4|, d23 = |μ2 − μ3|, d24 = |μ2 − μ4|, d34 = |μ3 − μ4|, and likewise for the minimum and maximum values, so that in the end we obtained 18 distance-based features. Using the four mean values, the four max and four min values, and adding the previous 18, we obtain 30 features for each signal in the short time window, so that counting the 5 signals we have 150 temporal features per second.
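The 30 per-signal batch features can be sketched as follows (batch splitting and pairwise absolute differences as described above; the function name is illustrative):

```python
import numpy as np
from itertools import combinations

def batch_distance_features(w):
    """30 temporal features per window: mean/max/min of four 0.25 s
    batches (12 values) plus their 18 pairwise absolute differences."""
    batches = np.array_split(w, 4)
    mus = [np.mean(b) for b in batches]
    maxs = [np.max(b) for b in batches]
    mins = [np.min(b) for b in batches]
    feats = mus + maxs + mins                  # 12 raw batch statistics
    for stat in (mus, maxs, mins):             # 6 pairs each -> 18 distances
        feats += [abs(a - b) for a, b in combinations(stat, 2)]
    return np.array(feats)

feats = batch_distance_features(np.sin(np.linspace(0, 8, 250)))
print(feats.shape)                             # (30,)
```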
Log-covariance features: Given the previous 150 temporal features, we discard the last 6 in order to attain 144 features, so that we can build a 12 × 12 square matrix and compute the log-covariance as follows:

lcM = U(logm(C)), (6)

where lcM is a resulting vector containing the upper triangular elements (78 features) of the matrix obtained by computing the matrix logarithm of the covariance matrix C; U(.) is a function returning the upper triangular elements; logm(.) is the matrix logarithm function; and the covariance matrix is given by C = (1/(N−1)) Σ_{i=1}^{N} (x_i − μ)(x_i − μ)^T. The rationale behind the log-covariance is to map the convex cone of covariance matrices to a vector space by using the matrix logarithm, since covariance matrices do not lie in a Euclidean space, i.e. the covariance matrix space is not closed under multiplication with negative scalars.
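A sketch of Eq. (6) using `scipy.linalg.logm`; how the 144 temporal features are arranged into the 12 × 12 matrix is an assumption here:

```python
import numpy as np
from scipy.linalg import logm

def log_covariance_features(F):
    """Upper-triangular elements of the matrix logarithm of the
    covariance of a 12 x N feature matrix, yielding 78 values."""
    C = np.cov(F)                              # 12 x 12 covariance matrix
    L = np.real(logm(C))                       # matrix logarithm
    iu = np.triu_indices(L.shape[0])           # upper triangle incl. diagonal
    return L[iu]                               # 12 * 13 / 2 = 78 features

rng = np.random.default_rng(1)
feats = log_covariance_features(rng.normal(size=(12, 50)))
print(feats.shape)                             # (78,)
```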
Shannon entropy and log-energy entropy: Non-linear analyses such as Shannon entropy have proven efficient in signal processing and time series, since the randomness of non-linear data is well captured by computing entropies over the time series. Entropy is an uncertainty measure; in brain-machine interface applications it is used to measure the level of chaos in the system, as it is a non-linear measure quantifying the degree of complexity of the data. In information theory, the Shannon entropy is given by:

h = − Σ_j s_j log(s_j), (7)

where h is a feature computed in every time window of 1 sec. and s_j is each (normalised) element of this temporal window. Then, given the same time window, we split it in two to compute the log-energy entropy as follows:

h_LE = Σ_i log(s_i²) + Σ_j log(s_j²), (8)

where i indexes the elements of the first sub window (0 - 0.5 sec.) and j indexes the elements of the second sub window (0.5 - 1 sec.).
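Both entropies can be sketched as below; normalising the window elements into a probability-like distribution for Eq. (7) is an assumption of this sketch:

```python
import numpy as np

def shannon_entropy(w):
    """Shannon entropy over window elements normalised into a
    probability-like distribution (normalisation is an assumption)."""
    p = np.abs(w) / np.sum(np.abs(w))
    p = p[p > 0]                               # drop zero-probability terms
    return -np.sum(p * np.log2(p))

def log_energy_entropy(w):
    """Sum of log-energies over the two half windows, as in Eq. (8)."""
    h1, h2 = np.array_split(w, 2)
    return np.sum(np.log(h1 ** 2)) + np.sum(np.log(h2 ** 2))

w = np.sin(np.linspace(0.1, 6.0, 250)) + 2.0   # strictly positive test window
print(shannon_entropy(w), log_energy_entropy(w))
```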
Frequency domain: The FFT is an advantageous method to analyse the spectrum of a given time series. At every time window we compute it as follows:

X_k = Σ_{n=0}^{N−1} x_n e^{−i2πkn/N}, k = 0, ..., N − 1. (9)
Accumulative features as energy model: An accumulative value is obtained frame-by-frame within a time window for each individual feature, duplicating the number of features. We compute the difference between the values of the current frame and the previous frame and accumulate it over time as

e_i^z = e_i^{z−1} + | f_i^z − f_i^{z−1} |, (10)

where e_i^z is the resulting energy model for the current time instant given a specific type of feature f_i, i = {1, ..., N}, at a time instant z representing a specific frame within a time window.
B. Feature Selection Algorithms
Feature selection aims to remove data which has no useful application and only serves to needlessly increase the demand for resources. Five datasets were generated using different algorithms; each retained the same data points but with a reduced number of attributes selected by the algorithm. The evaluators used were as follows:
1. OneR: calculates error rate of each prediction based on
one rule and selects the lowest risk classification [24].
2. Information Gain: assigns a worth to each individual
attribute by measuring the information gain with
respect to the class (difference of entropy) [25].
3. Correlation: measures the correlation between the attribute and the class via their Pearson's coefficient, which is used to rank each attribute's worth relative to all others [26].
4. Symmetrical Uncertainty: measures the uncertainty of
an attribute with respect to the class and bases selection
on lower uncertainties [27].
5. Evolutionary Algorithm: creates a population of
attribute subsets and ranks their effectiveness with a
fitness function to measure their predictive ability of
the class. At each generation, solutions are bred to
create offspring, and weakest solutions are killed off in
a tournament of fitness [34].
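WEKA's Information Gain evaluator has no direct scikit-learn equivalent; mutual information is a close analogue and can illustrate the ranking idea (synthetic data, illustrative setup):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 10))                 # 10 candidate attributes
y = (X[:, 3] > 0).astype(int)                  # only attribute 3 is informative

scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]             # best-scoring attribute first
print(ranking[0])                              # attribute 3 ranks first
```

A cut-off on the sorted scores, as described above, would then keep only the top-ranked attributes.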
C. Machine Learning Algorithms
As a benchmark, a ZeroR classifier was first run on each dataset. This simplistic classifier chooses one single class to apply to all of the data; with a fair distribution of the three mental states, an accuracy of roughly one third is expected. Two models were
trained on Bayes' theorem, a formula of conditional probability based on hypothesis H and evidence E. The theorem states that the probability of the hypothesis being true before evidence, P(H), is related to the probability of the hypothesis after observing the evidence, P(H | E), as follows [29]:

P(H | E) = P(E | H) P(H) / P(E). (11)

In the Naive Bayes classifier, naivety arises from the unverified assumption of conditional independence between the attributes. A Bayesian Network (Bayes Net) model was also
trained. This method generates a probabilistic graphical model by representing the probabilities of variables given classes in a Directed Acyclic Graph (DAG) [28] as follows:

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi)). (12)
[Table I: feature selection algorithms (OneR, Information Gain, Correlation, Symmetrical Uncertainty, Evolutionary Algorithm) with the number of attributes each retained. Table II: classification accuracy (%, 2 d.p.) of the Naive Bayes, Bayes Net, Random Tree and Random Forest models on each selected dataset.]
The goal is to infer the current time value of Ct given the
data Xt:t-T = {Xt, Xt-1,...,Xt-T} and the prior knowledge of the
class, which is attained by the a-posteriori probability
P(Ct |Ct-1:t-T, Xt:t-T). The superscript notation denotes the set
of values over a time interval.
Three decision trees were developed. Generated by the C4.5 algorithm [2], a J48 tree splits each decision based on information gain, measured via the entropy at a leaf. A Random Tree is generated through a stochastic process that considers a random number of attributes at each node. A Random Forest is an ensemble of multiple Random Trees [3]. A Multilayer Perceptron (MLP) model was also generated: a feedforward Neural Network, in which cycles are not formed between neurons. An MLP was implemented due
to its ability to classify data points that are not linearly
separable in Euclidean space [30]. A model was also trained
using a Support Vector Machine (SVM), which classifies
labelled data through a process of supervised learning,
where examples are mapped out in space and classification
is performed by the closest area in which the unknown class
data falls [31]. In particular, an improved version of Platt’s
Sequential Minimal Optimization (SMO) was used to train
the SVM [32], [33].
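The evaluation protocol described later (10-fold cross validation over three classes) can be sketched with scikit-learn stand-ins for two of the WEKA models; the synthetic data is only a placeholder for the selected EEG feature sets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Synthetic three-class data stands in for the 44-attribute EEG datasets.
X, y = make_classification(n_samples=300, n_features=44, n_informative=10,
                           n_classes=3, random_state=1)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Random Forest", RandomForestClassifier(random_state=1))]:
    acc = cross_val_score(clf, X, y, cv=10).mean()   # 10-fold CV accuracy
    print(f"{name}: {acc:.2%}")
```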
V. RESULTS
The five sets generated from the original dataset are shown in Table I. Five different algorithms were chosen, and their results were ranked by their individual scores. Arbitrary cut-off points were applied where the scores approached either 0 or, if there were no zero values, the lowest score present. The values given are not comparable between algorithms due to their unique scoring methods. The
MLP was given 2000 epochs to train with the number of
nodes on layers set to the default “a” setting, dynamically
calculated by n = (attributes + classes)/2
for each dataset it was trained on. A Zero Rules classifier was run as a benchmark and, with close to equally distributed data, set a baseline accuracy of 33.36% on all datasets for comparison. The most effective model was a Random Forest classifier on the dataset created by the OneR attribute selector, which achieved a high accuracy of 87.16% when classifying the data into one of the three mental states. Preliminary results for each of the datasets and their trained models are presented in Table II. For each test, 10-fold cross validation was used to train the model. All random seeds were set to their default value of 1. Table II shows that all of the models far outperformed the benchmark set by the Zero Rules classifier, the lowest being 51.49% (Symmetrical Uncertainty dataset with a Naive Bayes classifier). It is reasonable to assume that the naivety in not considering attribute relationships led to the poorer results.
VI. CONCLUSION
This paper presented a study on mental state classification based on EEG signals. It proposed a set of features, extracted with short-term windowing from the five signals of an EEG sensor, to categorise three different states: neutral, relaxed and concentrated. A dataset was created using data from five individuals in sessions lasting one minute for each state. The primary goal of this work was to find an appropriate set of features, by testing multiple feature selection algorithms and classification models, that provides acceptable accuracy on the dataset and can be useful for human-machine interaction. From the multiple feature sets and models produced, the most accurate is a Random Forest classifier on the attribute set selected by OneR, with a prediction accuracy of 87.16%. Future work will focus on comparing our best results with deep learning strategies and on implementing a real-time application to: (i) control devices, such as robots; and (ii) detect positive and negative moods useful for applications in mental health care.
REFERENCES
[1] Himebaugh, N.L., Begley, C.G., Bradley, A. and Wilkinson, J.A.,
2009. Blinking and tear break-up during four visual tasks. Optometry
and Vision Science, 86(2), pp. E106-E114.
[2] Quinlan, R., 1993. C4.5: Programs for Machine Learning. Morgan
Kaufmann Publishers, San Mateo, CA.
[3] Breiman, L., 2001. Random forests. Machine learning, 45(1), pp.5-
[4] Jasper, Herbert H. 1958. "The ten-twenty electrode system of the
International Federation." Electroenceph. Clin. Neurophysiol. 370-
[5] Kowaleski, J. (2017). BlueMuse.
[6] Barachant, A. (2017). Muselsl.
[7] Bos, D.O., 2006. EEG-based emotion recognition. The Influence of
Visual and Auditory Stimuli, 56(3), pp.1-17.
[8] Krigolson, O.E., Williams, C.C., Norton, A., Hassall, C.D. and
Colino, F.L., 2017. Choosing MUSE: Validation of a low-cost,
portable EEG system for ERP research. Frontiers in neuroscience, 11,
[9] Li, W., Jaramillo, C. and Li, Y., 2012, January. Development of mind
control system for humanoid robot through a brain computer
interface. In 2012 International Conference on Intelligent System
Design and Engineering Application (pp. 679-682). IEEE.
[10] Rosenzweig, M.R., Breedlove, S.M. and Leiman, A.L., 2002.
Biological psychology: An introduction to behavioral, cognitive, and
clinical neuroscience. Sinauer Associates.
[11] Abujelala, M., Abellanoza, C., Sharma, A. and Makedon, F., 2016,
June. Brain-ee: Brain enjoyment evaluation using commercial eeg
headband. In Proceedings of the 9th acm international conference on
pervasive technologies related to assistive environments (p. 33).
[12] Plotnikov, A., Stakheika, N., De Gloria, A., Schatten, C., Bellotti, F.,
Berta, R., Fiorini, C. and Ansovini, F., 2012, July. Exploiting real-
time EEG analysis for assessing flow in games. In 2012 IEEE 12th
International Conference on Advanced Learning Technologies (pp.
688-689). IEEE.
[13] Jordan, K.G., 2004. Emergency EEG and continuous EEG monitoring
in acute ischemic stroke. J. of Clinical Neurophys., 21(5), pp.341-352.
[14] Ang, K.K., Guan, C., Chua, K.S.G., Ang, B.T., Kuah, C., Wang, C.,
Phua, K.S., Chin, Z.Y. and Zhang, H., 2010, August. Clinical study of
neurorehabilitation in stroke using EEG-based motor imagery brain-
computer interface with robotic feedback. 2010 Annual International
Conference of the IEEE (pp. 5549-5552).
[15] Tzallas, A.T., Tsipouras, M.G. and Fotiadis, D.I., 2009. Epileptic
seizure detection in EEGs using time-frequency analysis. IEEE
transactions on information technology in biomedicine, 13(5), pp.703-
[16] Aarabi, A., Grebe, R. and Wallois, F., 2007. A multistage knowledge-
based system for EEG seizure detection in newborn infants. Clinical
Neurophysiology, 118(12), pp.2781-2797.
[17] Ghosh-Dastidar, S. and Adeli, H., 2007. Improved spiking neural
networks for EEG classification and epilepsy and seizure detection.
Integrated Computer-Aided Engineering, 14(3), pp.187-212.
[18] Chai, T.Y., Woo, S.S., Rizon, M. and Tan, C.S., 2010. Classification
of human emotions from EEG signals using statistical features and
neural network. In International (Vol. 1, No. 3, pp. 1-6). Penerbit
[19] Tanaka, H., Hayashi, M. and Hori, T., 1996. Statistical features of
hypnagogic EEG measured by a new scoring system. Sleep, 19(9),
[20] Fraiwan, L., Lweesy, K., Khasawneh, N., Wenz, H. and Dickhaus, H.,
2012. Automated sleep stage identification system based on time
frequency analysis of a single EEG channel and random forest
classifier. Computer methods and programs in biomedicine, 108(1),
[21] Rytkönen, K.M., Zitting, J. and Porkka-Heiskanen, T., 2011.
Automated sleep scoring in rats and mice using the naive Bayes
classifier. Journal of neuroscience methods, 202(1), pp.60-64.
[22] Vital, J.P., Faria, D.R., Dias, G., Couceiro, M.S., Coutinho, F. and
Ferreira, N.M., 2017. Combining discriminative spatiotemporal
features for daily life activity recognition using wearable motion
sensing suit. Pattern Analysis and Applications, 20(4), pp.1179-1194.
[23] Faria, D.R., Vieira, M., Premebida, C. and Nunes, U., 2015, August.
Probabilistic human daily activity recognition towards robot-assisted
living. In Robot and Human Interactive Communication (RO-MAN),
2015 24th IEEE International Symposium on (pp. 582-587). IEEE.
[24] University of Waikato. 2018. OneR. [online]
Available at:
[Accessed 9 Aug. 2018].
[25] University of Waikato. 2018. InfoGainAttributeEval. [online] Available at:
AttributeEval.html [Accessed 9 Aug. 2018].
[26] Pearson, K., 1895. Note on regression and inheritance in the case of
two parents. Proceedings of the Royal Society of London, 58, pp.240-
[27] Witten, I.H., Frank, E., Hall, M.A. and Pal, C.J., 2016. Data Mining:
Practical machine learning tools and techniques. Morgan Kaufmann.
[28] Pearl, Judea 2000. Causality: Models, Reasoning, and Inference.
Cambridge University Press. ISBN 0-521-77362-8.
[29] Bayes, T., Price, R. and Canton, J., 1763. An essay towards solving a
problem in the doctrine of chances.
[30] Rosenblatt, F., 1961. Principles of neurodynamics. perceptrons and
the theory of brain mechanisms (No. VG-1196-G-8). CORNELL
[31] Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine
learning, 20(3), pp.273-297.
[32] Platt, J.C., 1999. Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods, pp.185-208.
[33] Keerthi, S.S., Shevade, S.K., Bhattacharyya, C. and Murthy, K.R.K.,
2001. Improvements to Platt's SMO algorithm for SVM classifier
design. Neural computation, 13(3), pp.637-649.
[34] Back, T., 1996. Evolutionary algorithms in theory and practice:
evolution strategies, evolutionary programming, genetic algorithms.
Oxford university press.
[35] Shenoy, P., Miller, K.J., Ojemann, J.G. and Rao, R.P.N., 2007. Generalized features for electrocorticographic BCIs. IEEE Transactions on Biomedical Engineering, 55(1), pp.273-280.
... The power spectral density was calculated by averaging the power for the frequency band in each epoch and then averaging it for all epochs. The frequency bands were divided into delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30), and gamma . Twenty frequency power features, that is, five frequency bands for the four electrodes, were extracted. ...
... In summary, the EEG frequency power analysis was performed using a fast Fourier transform. Frequency bands were divided into delta (0-4 Hz), theta (4-7 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma. The absolute power of each frequency band was estimated for each EEG channel. ...
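The band-power extraction described in the excerpts above can be sketched as follows. This is an illustrative sketch, not the cited papers' exact pipeline: the 256 Hz sampling rate, the rectangular-window FFT, and the 30-100 Hz gamma cap are assumptions.

```python
import numpy as np

# Hedged sketch: per-band mean spectral power for one EEG channel.
# Band edges follow the excerpts above; the gamma upper edge is assumed.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}

def band_powers(signal, fs=256.0):
    """Return mean FFT power per frequency band (fs assumed, in Hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# A pure 10 Hz sine should concentrate its power in the alpha band.
t = np.arange(0, 2.0, 1.0 / 256.0)
powers = band_powers(np.sin(2 * np.pi * 10 * t))
```

With five such band powers per electrode and four electrodes, one obtains the twenty frequency power features mentioned above.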
... (a) Muse2 headset band. (b) EEG montage of Muse 2 headset band based on the International 10-20 EEG electrode placement standard [27,28]. ...
Classifying emotional states is critical for brain–computer interfaces and psychology-related domains. In previous studies, researchers have tried to identify emotions using neural data such as electroencephalography (EEG) signals or brain functional magnetic resonance imaging (fMRI). In this study, we propose a machine learning framework for emotion state classification using EEG signals in virtual reality (VR) environments. To arouse emotional neural states in brain signals, we provided three VR stimuli scenarios to 15 participants. Fifty-four features were extracted from the collected EEG signals under each scenario. To find the optimal classification in our research design, three machine learning algorithms (XGBoost classifier, support vector classifier, and logistic regression) were applied. Additionally, various class conditions were used in machine learning classifiers to validate the performance of our framework. To evaluate the classification performance, we utilized five evaluation metrics (precision, recall, f1-score, accuracy, and AUROC). Among the three classifiers, the XGBoost classifier showed the best performance under all experimental conditions. Furthermore, the usability of features, including differential asymmetry and frequency band pass categories, was checked from the feature importance of the XGBoost classifiers. We expect that our framework can be applied widely not only to psychological research but also to mental health-related issues.
... Skewness and Kurtosis are statistical parameters that measure the degree of asymmetry or peakedness of a data distribution. The Shannon entropy and log-energy entropy are used to measure how much information is being carried by a signal [2]. Details of these features are explained in [2], [11], [14], [15]. The MEMD features extracted from the IMFs are ranked based on their individual performance in the classification task. ...
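The statistical features named in this excerpt can be computed as below. This is a sketch under common definitions (energy-distribution Shannon entropy and the Coifman-style log-energy entropy); the cited papers may use slightly different formulations.

```python
import numpy as np

# Hedged sketch of four common EEG signal features: skewness, excess
# kurtosis, Shannon entropy of the normalised energy distribution, and
# log-energy entropy. Definitions are assumptions, not the papers' exact ones.
def feature_vector(x):
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    skew = np.mean(((x - mu) / sigma) ** 3)          # asymmetry
    kurt = np.mean(((x - mu) / sigma) ** 4) - 3.0    # excess peakedness
    p = x ** 2 / np.sum(x ** 2)                      # energy distribution
    p = p[p > 0]
    shannon = -np.sum(p * np.log(p))                 # Shannon entropy
    log_energy = np.sum(np.log(x[x != 0] ** 2))      # log-energy entropy
    return {"skewness": skew, "kurtosis": kurt,
            "shannon": shannon, "log_energy": log_energy}

# A sampled sine wave is symmetric (skew near 0) and flat-topped
# relative to a Gaussian (negative excess kurtosis).
feats = feature_vector(np.sin(np.linspace(0, 4 * np.pi, 200)))
```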
In this study, the Multivariate Empirical Mode Decomposition (MEMD) approach is applied to extract features from multi-channel EEG signals for mental state classification. MEMD is a data-adaptive analysis approach which is suitable particularly for multi-dimensional non-linear signals like EEG. Applying MEMD results in a set of oscillatory modes called intrinsic mode functions (IMFs). As the decomposition process is data-dependent, the IMFs vary in accordance with signal variation caused by functional brain activity. Among the extracted IMFs, it is found that those corresponding to high-oscillation modes are most useful for detecting different mental states. Non-linear features are computed from the IMFs that contribute most to mental state detection. These MEMD features show a significant performance gain over the conventional tempo-spectral features obtained by Fourier transform and Wavelet transform. The dominance of specific brain region is observed by analysing the MEMD features extracted from associated EEG channels. The frontal region is found to be most significant with a classification accuracy of 98.06%. This multi-dimensional decomposition approach upholds joint channel properties and produces most discriminative features for EEG based mental state detection.
... They captured EEG signals from different EEG channels and, by employing techniques like kNN and linear discriminant analysis (LDA) algorithms, attained maximum classification accuracies of 83.26% and 75.21%, respectively. Zhang et al. [15] employed Principal Component Analysis (PCA) for feature extraction. Two channels (F3 and F4) were used to extract the characteristics. The researchers attained a classification accuracy of 73%. ...
... The publicly available database relating to emotional states was used in this investigation. Data was obtained from two participants (one male and one female) for three minutes per state (positive, neutral, and negative) [15]. The Muse EEG headband is used to record the EEG placements of the TP9, AF7, AF8, and TP10 through dry electrodes. ...
Emotion is crucial in human interaction. Attributes like words, voice intonation, facial expressions, and kinesics can all be used to portray one's feelings. However, brain-computer interface (BCI) devices have not yet reached the level required for emotion interpretation. With the rapid development of machine learning algorithms, dry electrode techniques, and different real-world applications of the brain-computer interface for normal individuals, emotion categorization from EEG data has recently received a lot of attention. Electroencephalogram (EEG) signals are a critical resource for these systems. The primary benefit of employing EEG signals is that they reflect true emotion and are easily resolved by computer systems. In this work, EEG signals associated with positive, neutral, and negative emotions were identified using channel selection preprocessing. However, researchers had a limited grasp of the specifics of the link between various emotional states until now. To identify EEG signals, we used the discrete wavelet transform and machine learning techniques such as the recurrent neural network (RNN) and k-nearest neighbor (kNN) algorithms. Initially, the classifier methods were utilized for channel selection. As a result, final feature vectors were created by integrating the features of EEG segments from these channels. Using the RNN and kNN algorithms, the final feature vectors with associated positive, neutral, and negative emotions were categorized independently. The classification performance of both techniques was computed and compared. Using RNN and kNN, the average overall accuracies were 94.844% and 93.438%, respectively.
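The kNN stage of such a pipeline amounts to a majority vote over the nearest training feature vectors. The sketch below uses toy 2-D points; in the study above the inputs would instead be DWT features from the selected EEG channels, so the data and labels here are purely illustrative.

```python
import numpy as np

# Minimal k-nearest-neighbour classifier (majority vote over the k
# closest training points by Euclidean distance). Toy data, not EEG.
def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["negative", "negative", "positive", "positive"])
pred = knn_predict(X, y, np.array([0.95, 1.0]), k=3)
```

A query point near the "positive" cluster is assigned that label by two of its three nearest neighbours.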
... In this section of our experiment, we used a publicly available dataset [43] that contains EEG brainwave data collected by the MUSE EEG headband via the TP9, AF7, AF8 and TP10 electrodes. Labeling was performed by film clips with an obvious valence, including positive and negative emotional states as well as neutral resting data [44,45]. The participants were one male and one female individual, and the data were collected for 3 minutes per state. ...
Clustering is an attractive method to handle large-scale data which are explosively generated through digitization. This approach is specifically appropriate when labeling is very costly. In this paper, we constructed an unsupervised learning algorithm and focused on a finite mixture model based on multivariate Beta distribution. Our motivation is the flexibility and high potential that this distribution offers in modeling data. To learn this mixture model, we used an expectation propagation inference framework in which the parameters and the complexity of the model were evaluated concurrently in a single optimization framework. We evaluated the performance of our framework on publicly available datasets related to forgery detection, EEG-based sentiment analysis and human activity recognition. Our proposed model demonstrates comparable results to similar alternatives.
There has been a sudden increase in demand for algorithms or models to correctly and accurately identify human emotions. The conformity for machines has come a long way from when smart machines capable of reaching a decision on their own were all that was expected of them, to machines capable of understanding what goes on in a person’s brain. Such autonomous agents can prove to be helpful not only in developing smarter machines but also in the field of medicine. Early prediction or recognizing brainwave patterns for epilepsy, seizures, manic depression, etc. is a key to achieve faster aid responses or prevention. In our work, we are limiting our focus to the most common practices used by researchers in this field, which is to obtain the electroencephalogram or EEG data, extract features and implement a classification algorithm. However, we are also trying to capitalize upon the massive improvements made in the field of supervised and unsupervised learning. The robust depth-wise separable convolution architecture called Xception has been implemented in this study to observe its performance as a feature extractor to the notoriously mutating EEG data. The EEG dataset being used in this study is open source. It is available in Kaggle and has three classes, namely positive, negative and neutral. We are implementing wavelet transform along with the Xception architecture to extract features from the dataset which are then classified using support vector machine. We achieve stellar results as a performance score of 98% can be observed for the measures accuracy, precision, recall as well as F1 score.
Fraud and abuse in insurance and health care claims have received major attention as they can cause increasing losses of revenue. Processing health care claims requires extensive workloads because staff have to investigate the legitimacy of the report. For the investigation of brain injury claims, the related insurance company may request medical images of the brain from the hospital and subsequently get opinions from the medical staff. Conventionally, computed tomography (CT) or magnetic resonance imaging (MRI) is utilized for this purpose. However, performing a CT scan or an MRI scan for every patient that requests medical claims is impractical due to the limited resources. Thus, we proposed a screening approach that uses resting-state electroencephalogram (EEG) recordings as the input to a long short-term memory (LSTM) network. This LSTM architecture can classify the resting-state EEG into two classes: either a moderate traumatic brain injury (TBI) patient or a healthy person. Experimental results show that the proposed approach is able to outperform two similar recent works by achieving a classification accuracy of 74.33%.
In modern Human-Robot Interaction, much thought has been given to accessibility regarding robotic locomotion, specifically the enhancement of awareness and lowering of cognitive load. On the other hand, with social Human-Robot Interaction considered, published research is far sparser given that the problem is less explored than pathfinding and locomotion. This thesis studies how one can endow a robot with affective perception for social awareness in verbal and non-verbal communication. This is possible by the creation of a Human-Robot Interaction framework which abstracts machine learning and artificial intelligence technologies which allow for further accessibility to non-technical users compared to the current State-of-the-Art in the field. These studies thus initially focus on individual robotic abilities in the verbal, non-verbal and multimodality domains. Multimodality studies show that late data fusion of image and sound can improve environment recognition, and similarly that late fusion of Leap Motion Controller and image data can improve sign language recognition ability. To alleviate several of the open issues currently faced by researchers in the field, guidelines are reviewed from the relevant literature and met by the design and structure of the framework that this thesis ultimately presents. The framework recognises a user's request for a task through a chatbot-like architecture. Through research in this thesis that recognises human data augmentation (paraphrasing) and subsequent classification via language transformers, the robot's more advanced Natural Language Processing abilities allow for a wider range of recognised inputs. That is, as examples show, phrases that could be expected to be uttered during a natural human-human interaction are easily recognised by the robot. 
This allows for accessibility to robotics without the need to physically interact with a computer or write any code, with only the ability of natural interaction (an ability which most humans have) required for access to all the modular machine learning and artificial intelligence technologies embedded within the architecture. Following the research on individual abilities, this thesis then unifies all of the technologies into a deliberative interaction framework, wherein abilities are accessed from long-term memory modules and short-term memory information such as the user's tasks, sensor data, retrieved models, and finally output information. In addition, algorithms for model improvement are also explored, such as through transfer learning and synthetic data augmentation and so the framework performs autonomous learning to these extents to constantly improve its learning abilities. It is found that transfer learning between electroencephalographic and electromyographic biological signals improves the classification of one another given their slight physical similarities. Transfer learning also aids in environment recognition, when transferring knowledge from virtual environments to the real world. In another example of non-verbal communication, it is found that learning from a scarce dataset of American Sign Language for recognition can be improved by multi-modality transfer learning from hand features and images taken from a larger British Sign Language dataset. Data augmentation is shown to aid in electroencephalographic signal classification by learning from synthetic signals generated by a GPT-2 transformer model, and, in addition, augmenting training with synthetic data also shows improvements when performing speaker recognition from human speech. 
Given the importance of platform independence due to the growing range of available consumer robots, four use cases are detailed, and examples of behaviour are given by the Pepper, Nao, and Romeo robots as well as a computer terminal. The use cases involve a user requesting their electroencephalographic brainwave data to be classified by simply asking the robot whether or not they are concentrating. In a subsequent use case, the user asks if a given text is positive or negative, to which the robot correctly recognises the task of natural language processing at hand and then classifies the text, this is output and the physical robots react accordingly by showing emotion. The third use case has a request for sign language recognition, to which the robot recognises and thus switches from listening to watching the user communicate with them. The final use case focuses on a request for environment recognition, which has the robot perform multimodality recognition of its surroundings and note them accordingly. The results presented by this thesis show that several of the open issues in the field are alleviated through the technologies within, structuring of, and examples of interaction with the framework. The results also show the achievement of the three main goals set out by the research questions; the endowment of a robot with affective perception and social awareness for verbal and non-verbal communication, whether we can create a Human-Robot Interaction framework to abstract machine learning and artificial intelligence technologies which allow for the accessibility of non-technical users, and, as previously noted, which current issues in the field can be alleviated by the framework presented and to what extent.
Stress, either physical or mental, is experienced by almost every person at some point in his lifetime. Stress is one of the leading causes of various diseases and burdens society globally. Stress badly affects an individual's well-being. Thus, stress-related study is an emerging field, and in the past decade, a lot of attention has been given to the detection and classification of stress. The estimation of stress in the individual helps in stress management before it invades the human mind and body. In this paper, we proposed a system for the detection and classification of stress. We compared the various machine learning algorithms for stress classification using EEG signal recordings. Interaxon Muse device having four dry electrodes has been used for data collection. We have collected the EEG data from 20 subjects. The stress was induced in these volunteers by showing stressful videos to them, and the EEG signal was then acquired. The frequency-domain features such as absolute band powers were extracted from EEG signals. The data were then classified into stress and non-stressed using different machine learning methods - Random Forest, Support Vector Machine, Logistic Regression, Naive Bayes, K-Nearest Neighbors, and Gradient Boosting. We performed 10-fold cross-validation, and the average classification accuracy of 95.65% was obtained using the gradient boosting method.
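The 10-fold cross-validation protocol used in the stress study above can be sketched in plain Python. The classifier here is a deliberate stand-in (a majority-class predictor), not the gradient-boosting model the study used; the point is the fold construction and accuracy averaging.

```python
import random

# Hedged sketch of k-fold cross-validation: shuffle indices, split into
# k folds, train on k-1 folds and score on the held-out fold each time.
# The "model" is a trivial majority-class predictor for illustration only.
def k_fold_accuracy(samples, labels, k=10, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        train_labels = [labels[i] for i in train]
        majority = max(set(train_labels), key=train_labels.count)
        hits = sum(labels[i] == majority for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / k

# Toy dataset: 70/30 class split, so the majority predictor scores 0.70.
data = list(range(100))
labs = ["stress"] * 70 + ["non-stress"] * 30
acc = k_fold_accuracy(data, labs)
```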
In recent years there has been an increase in the number of portable low-cost electroencephalographic (EEG) systems available to researchers. However, to date the validation of the use of low-cost EEG systems has focused on continuous recording of EEG data and/or the replication of large system EEG setups reliant on event-markers to afford examination of event-related brain potentials (ERP). Here, we demonstrate that it is possible to conduct ERP research without being reliant on event markers using a portable MUSE EEG system and a single computer. Specifically, we report the results of two experiments using data collected with the MUSE EEG system—one using the well-known visual oddball paradigm and the other using a standard reward-learning task. Our results demonstrate that we could observe and quantify the N200 and P300 ERP components in the visual oddball task and the reward positivity (the mirror opposite component to the feedback-related negativity) in the reward-learning task. Specifically, single sample t-tests of component existence (all p's < 0.05), computation of Bayesian credible intervals, and 95% confidence intervals all statistically verified the existence of the N200, P300, and reward positivity in all analyses. We provide with this research paper an open source website with all the instructions, methods, and software to replicate our findings and to provide researchers with an easy way to use the MUSE EEG system for ERP research. Importantly, our work highlights that with a single computer and a portable EEG system such as the MUSE one can conduct ERP research with ease thus greatly extending the possible use of the ERP methodology to a variety of novel contexts.
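The core of the ERP methodology described above is epoch averaging: segments time-locked to stimulus onset are averaged so that random background EEG cancels while the event-related component survives. The sketch below uses synthetic data; the 100-sample epoch length, the Gaussian "P300-like" bump at sample 60, and the noise level are assumptions for illustration.

```python
import numpy as np

# Hedged sketch of ERP extraction by epoch averaging. Each "trial" is
# a weak event-locked bump buried in strong Gaussian noise; averaging
# across trials recovers the bump (the ERP component).
rng = np.random.default_rng(1)
n_epochs, epoch_len = 200, 100
erp = np.exp(-0.5 * ((np.arange(epoch_len) - 60) / 8.0) ** 2)  # bump at 60
epochs = erp + rng.normal(0, 2.0, size=(n_epochs, epoch_len))  # noisy trials
average = epochs.mean(axis=0)                                  # ERP emerges
```

Averaging 200 trials reduces the noise standard deviation by a factor of about 14, which is why the component is invisible in single trials yet clear in the average.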
Motion sensing plays an important role in the study of human movements, motivated by a wide range of applications in different fields, such as sports, health care, daily activity, action recognition for surveillance, assisted living and the entertainment industry. In this paper, we describe how to classify a set of human movements comprising daily activities using a wearable motion capture suit, denoted as FatoXtract. A probabilistic integration of different classifiers recently proposed is employed herein, considering several spatiotemporal features, in order to classify daily activities. The classification model relies on the computed confidence belief from base classifiers, combining multiple likelihoods from three different classifiers, namely Naïve Bayes, artificial neural networks and support vector machines, into a single form, by assigning weights from an uncertainty measure to counterbalance the posterior probability. In order to attain an improved performance on the overall classification accuracy, multiple features in time domain (e.g., velocity) and frequency domain (e.g., fast Fourier transform), combined with geometrical features (joint rotations), were considered. A dataset from five daily activities performed by six participants was acquired using FatoXtract. The dataset provided in this work was designed to be extremely challenging since there are high intra-class variations, the duration of the action clips varies dramatically, and some of the actions are quite similar (e.g., brushing teeth and waving, or walking and step). Reported results, in terms of both precision and recall, remained around 85 %, showing that the proposed framework is able to successfully classify different human activities.
Previous studies that involve measuring EEG, or electroencephalograms, have mainly been experimentally-driven projects; for instance, EEG has long been used in research to help identify and elucidate our understanding of many neuroscientific, cognitive, and clinical issues (e.g., sleep, seizures, memory). However, advances in technology have made EEG more accessible to the population. This opens up lines for EEG to provide more information about brain activity in everyday life, rather than in a laboratory setting. To take advantage of the technological advances that have allowed for this, we introduce the Brain-EE system, a method for evaluating user engaged enjoyment that uses a commercially available EEG tool (Muse). During testing, fifteen participants engaged in two tasks (playing two different video games via tablet), and their EEG data were recorded. The Brain-EE system supported much of the previous literature on enjoyment; increases in frontal theta activity strongly and reliably predicted which game each individual participant preferred. We hope to develop the Brain-EE system further in order to contribute to a wide variety of applications (e.g., usability testing, clinical or experimental applications, evaluation methods, etc.).
A statistics-based system for human emotion classification using the electroencephalogram (EEG) is proposed in this paper. The data used in this study were acquired using EEG, and the emotions were elicited from six human subjects under the effect of emotion stimuli. This paper also proposes an emotion stimulation experiment using visual stimuli. From the EEG data, a total of six statistical features are computed, and a back-propagation neural network is applied for the classification of human emotions. The experiment classifies five types of emotions: anger, sadness, surprise, happiness, and neutral. As a result, an overall classification rate as high as 95% is achieved.
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results, to the algorithmic methods at the heart of successful data mining approaches. Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research. The book companion website contains PowerPoint slides for Chapters 1-12 (a very comprehensive teaching resource covering each chapter of the book), an online appendix on the WEKA workbench (a comprehensive learning aid for the open-source software that goes with the book), and the table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.
Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects Presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods Includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks-in an easy-to-use interactive interface Includes open-access online courses that introduce practical applications of the material in the book.
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
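The linear decision surface described in this abstract can be illustrated with a toy SVM trained by sub-gradient descent on the regularised hinge loss. This is a sketch of the underlying idea only: the original work (and the SMO solver the paper itself uses) optimises the dual problem, and the learning rate, regularisation constant, and toy data below are assumptions.

```python
import numpy as np

# Hedged sketch: a linear SVM fitted by sub-gradient descent on the
# hinge loss  max(0, 1 - y(w.x + b)) + lam * ||w||^2 / 2,  y in {-1, +1}.
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:        # point inside the margin
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                            # only the regulariser acts
                w -= lr * lam * w
    return w, b

# Toy linearly separable two-class problem.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
```

For separable data the sub-gradient updates settle on a separating hyperplane; the hinge loss pushes points outside the margin, mirroring the "special properties of the decision surface" the abstract refers to.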