
Mental Emotional Sentiment Classification with an EEG-based Brain-machine Interface


This paper explores single and ensemble methods to classify emotional experiences based on EEG brainwave data. A commercial MUSE EEG headband is used with a resolution of four (TP9, AF7, AF8, TP10) electrodes. Positive and negative emotional states are invoked using film clips with an obvious valence, and neutral resting data is also recorded with no stimuli involved, all for one minute per session. Statistical extraction of the alpha, beta, theta, delta and gamma brainwaves is performed to generate a large dataset that is then reduced to smaller datasets by feature selection using scores from OneR, Bayes Network, Information Gain, and Symmetrical Uncertainty. Of the set of 2548 features, a subset of 63 selected by their Information Gain values were found to be best when used with ensemble classifiers such as Random Forest. They attained an overall accuracy of around 97.89%, outperforming the current state of the art by 2.99 percentage points. The best single classifier was a deep neural network with an accuracy of 94.89%.
DISP '19, Oxford, United Kingdom
ISBN: 978-1-912532-09-4
Jordan J. Bird
School of Engineering and Applied Science
Aston University
Birmingham, UK
Christopher D. Buckingham
School of Engineering and Applied Science
Aston University
Birmingham, UK
Anikó Ekárt
School of Engineering and Applied Science
Aston University
Birmingham, UK
Diego R. Faria
School of Engineering and Applied Science
Aston University
Birmingham, UK
Emotion Classification, Brain-Machine Interface, Machine
Autonomous non-invasive detection of emotional states is
potentially useful in multiple domains such as human-robot
interaction and mental healthcare. It can provide an extra
dimension of interaction between user and device, as well as
enabling tangible information to be derived that does not depend
on verbal communication [1]. With the increasing availability of
low-cost electroencephalography (EEG) devices, brainwave data
is becoming affordable for the consumer industry as well as for
research, introducing the need for autonomous classification
without the requirement of an expert on hand.
Due to the complexity, randomness, and non-stationary aspects of
brainwave data, classification is very difficult with a raw EEG
stream. For this reason, stationary techniques such as time
windowing must be introduced alongside feature extraction of the
data within a window. There are many statistics that can be
derived from such EEG windows, each of which has varying
classification efficacy depending on the goal. Feature selection
must be performed to identify useful statistics and reduce the
complexity of the model generation process, saving both time and
computational resources during the training and classification
The main contributions of this work are as follows:
• Exploration of single and ensemble methods for the
  classification of emotions.
• A high performing data mining strategy reaching
  97.89% accuracy.
• The inclusion of facial EMG signals as part of the
  classification process.
• A resolution of three emotional classes (positive,
  neutral, negative) to allow for real-world use on mental
  states that are not defined by prominent emotions.
• One Rule classification demonstrating how accurately
  the AF7 electrode’s mean value classifies mental states.
The remainder of this paper will explore related state-of-the-art
research and provide the main inspiration and influences for the
study. It will explain the methodology of data collection, feature
generation, feature selection and prediction methods. The results
will be presented and discussed alongside comparable work,
followed by conclusions and future work.
Figure 1. Diagram to show Lövheim’s Cube of
Emotional Categorization
Statistics derived from a time-windowing technique with feature
selection have been found to be effective for classifying mental
states such as relaxed, neutral, and concentrating [2]. An ensemble
method of Random Forest had an observed classification accuracy
of 87% when performed with a dataset which was pre-processed
with the OneR classifier as a feature selector. These promising
results suggested a study on classification of emotional states
using a similar exploration method would be similarly successful.
The best current state-of-the-art solution for classification of
emotional EEG data from a low-resolution, low-cost EEG setup
used Fisher’s Discriminant Analysis to produce an accuracy of
95% [3]. The study tried to prevent participants from becoming
tense and discourage blinking but the previous study [2] found
that EMG data from these activities helped classification because
blink rates are a factor in concentration for example. Hence the
new study described in this paper will explore classification of
emotions in EEG data when unconscious movements are neither
encouraged nor discouraged. Conscious extraneous movements
such as taking a sip of water will not be allowed because they just
form outlying or masking points in the data. For example, if the
people experiencing positive emotions are also drinking water, the
model will simply classify the electrical data that has been
generated by those movements. Stimuli to evoke emotions for
EEG-based studies are often found to be best with music [4] and
film [5]. This paper thus focuses on film clips that have audio
tracks (speech and/or music) to evoke emotions, similarly to a
related study that used music videos [6].
Common Spatial Patterns have proved extremely effective for
emotion classification, attaining an overall best solution at 93.5%
[7]. A MUSE EEG headband was successfully used to classify
high resolutions of valence through differing levels of enjoyment
during a certain task [8]. Deep Belief Network (DBN), Artificial
Neural Network (ANN), and Support Vector Machine (SVM)
methods have all been able to classify emotions from EEG data,
and were found to be very effective when considering binary
classes of positive and negative [9]. This study will build on all
these results using similar methods as well as an ensemble, to
exploit their differing strengths and weaknesses. The study also
supports the usage of a neutral class, for transition into real-world
use, to provide a platform for emotional classification where
emotions are not prominent. It adds valence or perceived
sentiment because this was previously found to be helpful in the
learning processes for a web-based chatbot [10].
3.1 Electroencephalography
Electroencephalography is the process of using applied electrodes
to derive electrophysiological data and signals produced by the
brain [11] [12]. Electrodes can be subdural [13], i.e. under the
skull, placed on and within the brain itself. Noninvasive
techniques require either wet or dry electrodes to be placed
around the cranium [14]. Raw electrical data is measured in
microvolts (µV) at observed time t, producing wave patterns from
t to t+n.
3.2 Human Emotion
Human emotions are varied and complex but can be generalized
into positive and negative categories [15]. Some emotions overlap
such as ‘hope’ and ‘anguish’, which are considered positive and
negative respectively but that are often experienced
contemporaneously: e.g. the clearly doomed hope and
accompanying anguish for a character’s survival in a film. This
study will concentrate on those emotions that do not overlap, to
help correctly classify what is and is not a positive experience.
Table 1. Table to show Lövheim categories and their
encapsulated emotions with a valence label
Category  Emotions (Valence)
A         Shame (Negative), Humiliation (Negative)
B         Contempt (Negative), Disgust (Negative)
C         Fear (Negative), Terror (Negative)
D         Enjoyment (Positive), Joy (Positive)
E         Distress (Negative), Anguish (Negative)
F         Surprise (Negative) (Lack of Dopamine)
G         Anger (Negative), Rage (Negative)
H         Interest (Positive), Excitement (Positive)
Lövheim’s three-dimensional emotional model maps brain
chemical composition to generalised states of positive and
negative valence [16]. This is shown in Fig. 1 with emotion
categories A-H from each of the model’s vertices, further detailed
in Table I. Various chemical compositions can be mapped to
emotions with positive and negative classes. Furthermore, studies
show that chemical composition influences nervous oscillation
and thus the generation of electrical brainwaves [17]. Since
emotions are encoded within chemical composition that directly
influence electrical brain activity, this study proposes that they
can be classed using statistical features of the produced
brainwaves.
Figure 2. A simplified diagram of a fully-connected feed
forward deep neural network.
3.3 Machine Learning Algorithms
The study in this paper applies a number of machine learning
algorithms. One Rule (OneR) classification is a simplistic
probabilistic process of selecting one attribute from the dataset
and generating logical rules based upon it. For example:
"IF temperature LESS THAN 5.56 THEN December"
"IF temperature MORE THAN 23.43 THEN July"
are rules generated based on a temperature attribute to predict the
month (class). This model will identify the strongest attribute
within the dataset for classifying emotions
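The One Rule idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes categorical attribute values (a numeric attribute such as temperature would first need to be split into intervals), and the function name `one_r` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute_index, rule, errors) for the best single-attribute rule."""
    best = None
    for a in range(len(rows[0])):
        # Count how often each class occurs for every value of attribute a.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        # The candidate rule maps each value to its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Score the rule by its number of training errors.
        errors = sum(rule[row[a]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical toy data: (temperature band, weekday) -> month
rows = [("cold", "mon"), ("cold", "tue"), ("hot", "mon"), ("hot", "tue")]
labels = ["December", "December", "July", "July"]
attribute, rule, errors = one_r(rows, labels)
```

In this toy case the temperature band (attribute 0) separates the two months without error, so it is selected over the weekday attribute.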
Decision Trees follow a linear process of conditional control
statements based on attributes, through a tree-like structure where
each node is a rule based decision that will further lead to other
nodes. Finally, an end node is reached, and a class is given to the
data object. The level of randomness or entropy on all end nodes
is used to measure the classification ability of the tree. The
calculation of entropy is given as:
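$$E(S) = -\sum_{j=1}^{c} P_j \log(P_j) \qquad (1)$$
where $P_j$ is the proportion of data objects in $S$ that belong
to class $j$ and $c$ is the number of classes.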
Entropic models are compared by their difference in entropy
which is information gain. A positive value would be a better
model, whereas a negative value shows information loss versus
the comparative model. This is given as:
$$\mathit{InfoGain}(S, a) = E(S) - E(S \mid a) \qquad (2)$$
where E is the entropy calculated by Equation 1.
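A small Python sketch may help make the entropy and information gain calculations concrete. It assumes base-2 logarithms and the common split-based form of information gain (entropy before a split minus the weighted entropy after it); both are assumptions, as are the function names and the toy labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy of the labels minus the weighted entropy after splitting on an attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: one attribute separates the classes, the other does not.
labels = ["positive", "positive", "negative", "negative"]
perfect = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]
gain_perfect = information_gain(labels, perfect)
gain_useless = information_gain(labels, useless)
```

An attribute that separates the classes completely recovers the full entropy of the label set, while an attribute unrelated to the class yields a gain of zero.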
Support Vector Machines (SVM) classify data points by
generating and optimising a hyperplane to separate them and
classifying based on their position in comparison to the
hyperplane [18]. A model is considered optimised when the
average margins between points and the separator is at its
maximum value. Sequential Minimal Optimisation (SMO) is a
high-performing algorithm to generate and implement an SVM
classifier [19]. The large optimisation problem is broken down
into smaller subproblems, that can then be solved linearly.
Bayes’ Theorem [20] uses conditional probabilities to determine
the likelihood of Class A based on Evidence, B, as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (3)$$
For this study, evidence consists of attribute values (EEG
time-window statistics) and ground-truth training for determining
their most likely classes. A simpler version is known as
Naive Bayes, which assumes independence of attribute values
whether or not they are really unrelated. Classification of
Naive Bayes is adapted from Equation 3 as follows:
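$$\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(k_i \mid y) \qquad (4)$$
where y is the class and k is the data object (row) that is being
classified, with attribute values $k_1, \dots, k_n$.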
Logistic Regression is a symmetric statistical model used for
mapping a numerical value to a probability, e.g. hours of study to
predict a student’s exam grade [21]. For a binary classification
problem with i attributes, and β model parameters, the log odds l
is given as
$$l = \beta_0 + \sum_{i} \beta_i x_i$$
where $x_i$ is the value of attribute i, and the corresponding
odds of outcome are therefore given as
$$o = b^{\beta_0 + \sum_{i} \beta_i x_i}$$
where b is the base of the logarithm. These odds can be used to
predict a model outcome based on previous data.
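As a worked illustration of the log odds and odds above, the following Python sketch converts a set of attribute values into a probability. It assumes the natural logarithm (base e), and the parameter values are hypothetical rather than fitted to any data from the study.

```python
import math

def log_odds(x, beta0, betas):
    """Linear combination of attribute values: beta0 + sum(beta_i * x_i)."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

def probability(x, beta0, betas):
    """Convert log odds to odds, then odds to a probability in (0, 1)."""
    odds = math.exp(log_odds(x, beta0, betas))
    return odds / (1.0 + odds)

# Hypothetical parameters for a single attribute (hours of study).
beta0, betas = -3.0, [1.5]
p_two_hours = probability([2.0], beta0, betas)
p_four_hours = probability([4.0], beta0, betas)
```

With these made-up parameters the log odds are zero at two hours, which corresponds to even odds, and the probability rises as the attribute value increases.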
A Multilayer Perceptron is a type of Artificial Neural Network
(ANN) that predicts a class by taking input parameters and
computing them through a series of hidden layers to one or more
nodes on the final output layer. More than one hidden layer forms
a deep neural network and output layers can be different classes
or, if there is just one, a regression output. A simplified diagram
of a fully connected feed forward deep neural network can be seen
in Fig. 2. Learning is performed for a defined time and follows the
process of backpropagation [22], which is the process of deriving
a gradient that is further used to calculate weights for each node
(neuron) in the network. Training is based on reducing the error
rate given by the error function, i.e. the performance of a network
in terms of correct and incorrect classifications or total Euclidean
distance from the real numerical values. An error is calculated at
output and fed backwards from outputs to inputs.
3.4 Model Ensemble Methods
An ensemble combines two or more prediction models into a
single process. A method of fusion takes place to increase the
success rate of a prediction process by treating the models as a
sum of their parts.
Voting is a simple ensemble process of combining models and
allowing them to vote through a democratic or elitist process.
Each of the models is trained, and then for prediction, they
award vote v to class(es) via a specified method:
• Average of probabilities; v = confidence
• Majority vote; v = 1
• Min/Max probability; v = average confidence of all
Following the selected process, a democracy will produce an
outcome prediction as that of the class that has received the
strongest vote or set of votes.
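The difference between two of the voting methods listed above can be shown with a short Python sketch. The function names and the per-model class probabilities are hypothetical and chosen only so that the two methods disagree.

```python
from collections import Counter

def average_of_probabilities(model_probs):
    """Each model contributes its confidence; the class with the highest mean wins."""
    classes = model_probs[0].keys()
    avg = {c: sum(p[c] for p in model_probs) / len(model_probs) for c in classes}
    return max(avg, key=avg.get)

def majority_vote(model_probs):
    """Each model casts one vote (v = 1) for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in model_probs)
    return votes.most_common(1)[0][0]

# Hypothetical outputs of three trained models for one data object.
model_probs = [
    {"positive": 0.90, "neutral": 0.05, "negative": 0.05},
    {"positive": 0.30, "neutral": 0.40, "negative": 0.30},
    {"positive": 0.30, "neutral": 0.40, "negative": 0.30},
]
by_average = average_of_probabilities(model_probs)
by_majority = majority_vote(model_probs)
```

Here one very confident model outweighs two weakly confident ones under average of probabilities, whereas majority vote ignores confidence and follows the two models that agree.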
Random Forest forms a voting ensemble from Decision Trees
[23]. Multiple trees are generated on randomly generated subsets
of the input data (Bootstrap Aggregation) and then those trees, the
random forest, will all vote on their predicted outcome and a
prediction is derived. Adaptive Boosting is the process of creating
multiple unique instances of one type of model prediction to
effectively improve the model in situations where selected
parameters may prove ineffective [24]. Classification predictions
are combined and weighted after a process of using a random data
subset to improve on a previous iteration of a model. Combination
is given as:
$$F_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (5)$$
where F is the set of t models and x is the data object with an
unknown class [25].
The study employs four dry extra-cranial electrodes via a
commercially available MUSE EEG headband. Microvoltage
measurements are recorded from the TP9, AF7, AF8, and TP10
electrodes, as seen in figure 3. Sixty seconds of data were
recorded from two subjects (1 male, 1 female, aged 20-22) for
each of the 6 film clips found in Table II producing 12 minutes
(720 seconds) of brain activity data (6 minutes for each emotional
state). Six minutes of neutral brainwave data were also collected
resulting in a grand total of 36 minutes of EEG data recorded from
subjects. With a variable frequency resampled to 150Hz, this
resulted in a dataset of 324,000 data points collected from the
waves produced by the brain. Activities were exclusively stimuli
that would evoke emotional responses from the set of emotions
found in Table I and were considered by their valence labels of
positive and negative rather than the emotions themselves. Neutral
data were also collected, without stimuli and before any of the
emotions data (to avoid contamination by the latter), for a third
class that would be the resting emotional state of the subject.
Three minutes of data were collected per day to reduce the
interference of a resting emotional state.
Table 2. Source of Film Clips used as Stimuli for EEG
Brainwave Data Collection
Marley and Me
Twentieth Century
Fox, etc.
Walt Disney Pictures,
My Girl
Entertainment, etc.
La La Land
Entertainment, etc.
Slow Life
BioQuest Studios
Funny Dogs
Table 3. Attribute Evaluation Methods used to Generate
Datasets for Model Training
Ranker Cutoff
No. Attributes
Participants were asked to watch the film without making any
conscious movements (e.g. drinking coffee) to prevent the
influence of Electromyographic (EMG) signals on the data due to
their prominence over brainwaves in terms of signal strength. A
previous study that suggested blinking patterns are useful for
classifying mental states [2] inspired this study to neither
encourage nor discourage unconscious movements. Observations
of the experiment showed a participant smile for a short few
seconds during the ‘funny dogs’ compilation clip, as well as
become visibly upset during the ‘Marley and Me’ film clip (death
scene). These facial expressions will influence the recorded data
but are factored into the classification model because they
accurately reflect behaviour in the real world, where these
emotional responses would also occur. Hence, to accurately model
realistic situations, both EEG and facial EMG signals are
considered as informative. To generate a dataset of statistical
features, an effective methodology from a previous study [2] was
used to extract 2400 features through a sliding window of 1
second beginning at t=0 and t=0.5. Downsampling was set to the
minimum observed frequency of 150Hz.
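The windowing step can be sketched as follows. This is a minimal illustration that assumes one-second windows starting every 0.5 seconds on a single electrode sampled at 150Hz; the four statistics shown are placeholders, since the full feature set comes from the earlier study [2] and is not listed here, and the function names and test signal are hypothetical.

```python
import statistics

def sliding_windows(signal, rate_hz=150, window_s=1.0, step_s=0.5):
    """Yield overlapping windows of window_s seconds, starting every step_s seconds."""
    size = int(rate_hz * window_s)
    step = int(rate_hz * step_s)
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def window_features(window):
    """A few illustrative statistics for one window of one electrode."""
    return {
        "mean": statistics.fmean(window),
        "stdev": statistics.stdev(window),
        "min": min(window),
        "max": max(window),
    }

# Hypothetical 3-second signal (450 samples at 150Hz).
signal = [float(i % 10) for i in range(450)]
features = [window_features(w) for w in sliding_windows(signal)]
```

Each window becomes one row of the feature dataset, so three seconds of signal produce five overlapping rows in this sketch.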
Feature selection algorithms were run to generate a reduced
dataset from the 2,549 source attributes. Chosen methods ranked
attributes based on their effectiveness when used in classification,
and a manual cutoff point was tuned where the score began to
drop off, therefore retaining only the strongest attributes. Details
of attribute numbers generated by each method can be seen in
Table III. The reduced dimensionality makes the
classification experiments more tractable and within the remit of
given computational resources.
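Ranking attributes by score and keeping those above a cutoff can be sketched as below. The attribute names, scores, and cutoff value are hypothetical; in the study the cutoff was tuned manually for each evaluation method.

```python
def select_by_score(scores, cutoff):
    """Rank attributes from highest to lowest score and keep those at or above the cutoff."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= cutoff]

# Hypothetical attribute scores from a feature evaluator.
scores = {
    "mean_AF7": 0.90,
    "stdev_TP9": 0.60,
    "min_AF8": 0.10,
    "max_TP10": 0.05,
}
selected = select_by_score(scores, cutoff=0.5)
```

Only the attributes that score above the point where the ranking drops off are passed on to model training.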
Model training for each method was performed on every dataset
generated by the four methods shown in Table III. The
parameters, where required, were set to the following:
10-fold cross validation for training models (average of
10 models on 10 folds of data).
A manually tuned deep neural network of two layers, 30
and 20 neurons on each layer respectively. Backward
propagation of errors. 500 epoch training time.
All random numbers generated by the Java Virtual
Machine with a seed of 0.
Ensemble voting based on Average Probabilities.
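The 10-fold cross validation setting can be illustrated with a short Python sketch. The function names are hypothetical, and Python's random module with a seed of 0 will not reproduce the shuffling of the Java Virtual Machine generator used in the study; the sketch only shows the splitting and averaging logic.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validate(n, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over all k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)

# Hypothetical use: 25 data objects and a dummy evaluator.
splits = list(k_fold_indices(25))
mean_score = cross_validate(25, lambda train, test: 1.0)
```

Every data object is used for testing exactly once, and the reported accuracy is the mean over the ten held-out folds.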
After downsampling, there were slightly more datapoints for the
neutral state, and thus to benchmark a Zero Rules (’most common
class’) classifier would classify all points as neutral. This was
33.58% and therefore any result above this shows useful rule
Models for ensemble were selected manually based on best
performance. Voting was performed on average probabilities
using the Random Tree, SMO, BayesNet, Logistic Regression,
and MLP models. Because of the impressive classification ability
of Random Forests, an attempt was made to optimise them further
with the AdaBoost algorithm.
Results of both single and ensemble classifiers can be seen in
Table IV. The best model, a Random Forest with the Infogain
dataset, achieved a high accuracy of 97.89%. The small number
of classification errors came from a few seconds of the roughly
half-hour dataset, meaning that errors could be almost completely
mitigated when classifying in real time due to the sliding window
technique used for small timeframes t-n. Adaptive boosting was
promising for all Random Forest models but could not achieve a
higher score, pointing towards the possibility of outlying points.
For single classification, the multilayer perceptron was the most
consistently best model, showing the effectiveness of neural
networks for this particular problem.
The effectiveness of OneR classification showed that a certain
best attribute (mean value of AF7) existed that alone had a
classification ability of 85.27%. The rule is specified in Fig. 4.
The normalised mean value of the time windows extracted from
the AF7 electrode when observed show that minimum and
maximum values most commonly map to negative emotions,
whereas positive and neutral are very closely related, having rules
overlapping one another. One Rule classification improved over
the Zero Rule benchmark by over 50 points, and therefore would
have been an effective attribute to consider over others when it
came to utilising more than one of the attributes in the other
Table 4. Classification Accuracy of Single and Ensemble Methods on the Four Generated Datasets
Single Model Accuracy
Ensemble Model Accuracy
Table 5. An Indirect Comparison of this Study to Similar
Works Performed on Different Datasets
Study                  Method
This study             InfoGain, RandomForest
Bos, et al. [3]        Fisher’s Discriminant
This study             InfoGain, MLP
Li, et al. [7]         Common Spatial Patterns
Li, et al.             Linear SVM
Zheng, et al. [9]      Deep Belief Network
Koelstra, et al. [6]   Common Spatial Patterns
Normalised mean value of the AF7 electrode:
< -460.0 -> NEGATIVE
< -436.5 -> POSITIVE
< -101.5 -> NEGATIVE
< 25.45 -> POSITIVE
< 25.85 -> NEUTRAL
< 26.25 -> POSITIVE
< 37.7 -> NEUTRAL
< 39.05 -> POSITIVE
< 43.599999999999994 -> NEUTRAL
< 63.95 -> POSITIVE
< 97.7 -> NEUTRAL
< 423.0 -> POSITIVE
>= 423.0 -> NEGATIVE
Figure 4. The most effective single rule for
The two best models in our study are compared to the state of the
art alternatives in Table V. The method of generating attributes,
attribute selection via info gain and finally classification with a
Random Forest outperforms an FDA model by 2.99 points.
Further work should be carried out to identify whether this
improved result was due to the methods chosen or the attribute
generation and selection, or possibly both.
The high performance of simple multilayer perceptrons suggests
neural network models can be effective, especially more complex
ones such as Convolutional Neural Networks (CNNs) that have
performed well in various classification experiments [27].
Similarly, ensemble and Bayesian models are promising
avenues that could perform better with more advanced models,
such as Dynamic Bayesian Mixture Models (DBMM) [28] that
have previously been applied to statistical data extracted from
EEG brainwave signals.
Being able to recognise emotions autonomously would be
valuable for mental-health decision support systems such as
GRiST which is a risk and safety management system used by
mental-health practitioners and by people for assessing
themselves [29], [30]. Evaluations of emotions independent of
self-reporting would help calibrate the advice as well as guiding
more sensitive interactions. The measurement of brainwaves used
in this paper is too intrusive but would be useful for providing a
benchmark for finding other more appropriate methods.
This paper explored the application of single and ensemble
methods of classification to take windowed data from four points
on the scalp and quantify that data into an emotional
representation of what the participant was feeling at that time. The
methods showed that using a low resolution, commercially
available EEG headband can be effective for classifying a
participant’s emotional state. There is considerable potential for
producing classification algorithms that have practical value for
real-world decision support systems. Responding to emotional
states can improve interaction and, for mental-health systems,
contribute to the overall assessment of issues and how to resolve
them.
This work was partially supported by the European Commission
through the H2020 project EXCELL (https://www.excell-, grant number 691829 (A. Ekart) and by the EIT
Health GRaCEAGE grant number 18429 awarded to C. D.
Buckingham.
[1] M. S. El-Nasr, J. Yen, and T. R. Ioerger, “Flame - fuzzy
logic adaptive model of emotions,” Autonomous Agents and
Multi-agent systems, vol. 3, no. 3, pp. 219-257, 2000.
[2] J. J. Bird, L. J. Manso, E. P. Ribeiro, A. Ekárt, and D. R.
Faria, “A study on mental state classification using eeg-based
brain-machine interface,” in 9th International Conference on
Intelligent Systems, IEEE, 2018.
[3] D. O. Bos et al., “EEG-based emotion recognition,” The
Influence of Visual and Auditory Stimuli, pp. 1-17, 2006.
[4] Y.-P. Lin, C.-H. Wang, T.-P. Jung, T.-L. Wu, S.-K. Jeng, J.-
R. Duann, and J.-H. Chen, “EEG-based emotion recognition
in music listening,” IEEE Transactions on Biomedical
Engineering, vol. 57, no. 7, pp. 1798-1806, 2010.
[5] X.-W. Wang, D. Nie, and B.-L. Lu, “Emotional state
classification from eeg data using machine learning
approach,” Neurocomputing, vol. 129, pp. 94-106, 2014.
[6] S. Koelstra, A. Yazdani, M. Soleymani, C. Mühl, J.-S. Lee,
A. Nijholt, T. Pun, T. Ebrahimi, and I. Patras, “Single trial
classification of eeg and peripheral physiological signals for
recognition of emotions induced by music videos,” in Int.
Conf. on Brain Informatics, pp. 89-100, Springer, 2010.
[7] M. Li and B.-L. Lu, “Emotion classification based on
gamma-band eeg,” in Engineering in medicine and biology
society, 2009. EMBC 2009. Annual international conference
of the IEEE, pp. 1223-1226, IEEE, 2009.
[8] M. Abujelala, C. Abellanoza, A. Sharma, and F. Makedon,
“Brainee: Brain enjoyment evaluation using commercial eeg
headband,” in Proceedings of the 9th acm international
conference on pervasive technologies related to assistive
environments, p. 33, ACM, 2016.
[9] W.-L. Zheng, J.-Y. Zhu, Y. Peng, and B.-L. Lu, “Eeg-based
emotion classification using deep belief networks,” in
Multimedia and Expo (ICME), 2014 IEEE International
Conference on, pp. 1-6, IEEE, 2014.
[10] J. J. Bird, A. Ekárt, and D. R. Faria, “Learning from
interaction: An intelligent networked-based human-bot and
bot-bot chatbot system,” in UK Workshop on Computational
Intelligence, pp. 179-190, Springer, 2018.
[11] B. E. Swartz, “The advantages of digital over analog
recording techniques,” Electroencephalography and clinical
neurophysiology, vol. 106, no. 2, pp. 113-117, 1998.
[12] A. Coenen, E. Fine, and O. Zayachkivska, “Adolf beck: A
forgotten pioneer in electroencephalography,” Journal of the
History of the Neurosciences, vol. 23, pp. 276-286, 2014.
[13] A. K. Shah and S. Mittal, “Invasive electroencephalography
monitoring: Indications and presurgical planning,” Annals of
Indian Academy of Neurology, vol. 17, pp. S89, 2014.
[14] B. A. Taheri, R. T. Knight, and R. L. Smith, “A dry electrode
for eeg recording,” Electroencephalography and clinical
neurophysiology, vol. 90, no. 5, pp. 376-383, 1994.
[15] K. Oatley and J. M. Jenkins, Understanding emotions.
Blackwell publishing, 1996.
[16] H. Lövheim, “A new three-dimensional model for emotions
and monoamine neurotransmitters,” Medical hypotheses, vol.
78, no. 2, pp. 341-348, 2012.
[17] J. Gruzelier, “A theory of alpha/theta neurofeedback, creative
performance enhancement, long distance functional
connectivity and psychological integration,” Cognitive
processing, vol. 10, no. 1, pp. 101-109, 2009.
[18] C. Cortes and V. Vapnik, “Support-vector networks,”
Machine learning, vol. 20, no. 3, pp. 273-297, 1995.
[19] J. Platt, “Sequential minimal optimization: A fast algorithm
for training support vector machines,” 1998.
[20] T. Bayes, R. Price, and J. Canton, “An essay towards solving
a problem in the doctrine of chances,” 1763.
[21] S. H. Walker and D. B. Duncan, “Estimation of the
probability of an event as a function of several independent
variables,” Biometrika, vol. 54, no. 1-2, pp. 167-179, 1967.
[22] Y. Bengio, I. J. Goodfellow, and A. Courville, “Deep
learning,” Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[23] T. K. Ho, “Random decision forests,” in Document analysis
and recognition, 1995., proceedings of the third international
conference on, vol. 1, pp. 278-282, IEEE, 1995.
[24] Y. Freund and R. E. Schapire, “A decision-theoretic
generalization of on-line learning and an application to
boosting,” Journal of computer and system sciences, vol. 55,
no. 1, pp. 119-139, 1997.
[25] R. Rojas, “Adaboost and the super bowl of classifiers a
tutorial introduction to adaptive boosting,” Freie University,
Berlin, Tech. Rep, 2009.
[26] H. H. Jasper, “The ten-twenty electrode system of the
international federation,” Electroencephalogr. Clin.
Neurophysiol., vol. 10, pp. 370-375, 1958.
[27] M. Hussain, J. J. Bird, and D. R. Faria, “A study on cnn
transfer learning for image classification,” in UK Workshop
on Computational Intelligence, pp. 191-202, Springer, 2018.
[28] D. R. Faria, M. Vieira, C. Premebida, and U. Nunes,
“Probabilistic human daily activity recognition towards
robot-assisted living,” in Robot and Human Interactive
Communication (RO-MAN), 2015 24th IEEE International
Symposium on, pp. 582-587, IEEE, 2015.
[29] C. D. Buckingham, A. Ahmed, and A. Adams, “Designing
multiple user perspectives and functionality for clinical
decision support systems,” pp. 211-218, 2013.
[30] C. D. Buckingham, A. Adams, L. Vail, A. Kumar, A.
Ahmed, A. Whelan, and E. Karasouli, “Integrating service
user and practitioner expertise within a web-based system for
collaborative mental-health risk and safety management,”
Patient Education and Counseling, pp. 1189-1196, 2015.
... It enables the responses of software applications to be adapted to the emotional states of the end-users [4] . Unfortunately, many methods of emotion recognition focus on a single modality (speech, facial expression, posture, electroencephalograph (EEG), etc.) [5] . This greatly limits the accuracy of the emotion recognition task [3] . ...
... In this paper, we compare the recognition performance of Early Fusion, Hybrid Fusion, and 3) We systematically compare the recognition performance of these DL models on the RAVDESS audio-visual dataset [14] and an EEG dataset from [5]. been applied to explore these traditional approaches amongst others, for effective data fusion. ...
... The RAVDESS dataset was developed by Livingstone and Russo [14] . The EEG features were already extracted and pre-processed by the developers [5] , so no further pre-processing was done. ...
Full-text available
Multimodal emotion recognition is a robust and reliable method as it utilizes multimodal data for more comprehensive representation of emotions. Data fusion is a key step in multimodal emotion recognition, because the accuracy of the recognition model mostly depends on how the different modalities are combined. The goal of this paper is to compare the performances of deep learning (DL) based models for the task of data fusion and multimodal emotion recognition. The contributions of this paper are two folds: 1) We introduce three DL models for multimodal fusion and classification: early fusion, hybrid fusion, and multi-task learning. 2) We systematically compare the performance of these models on three multimodal datasets. Our experimental results demonstrate that multi-task learning achieves the best results across all modalities; 75.41%, 68.33%, and 78.75% for classification of three emotional states using the combinations of audio-visual, EEG-audio, and EEG-visual data, respectively.
... In this section of our experiment, we used a publicly available dataset [43] that contains EEG brainwave data collected by MUSE EEG headband with a resolution of TP9, AF7, AF8, TP10 electrodes. Labeling was performed by film clips with an obvious valence and includes positive and negative emotional states and neutral resting data [44,45]. The participants were one male and one female individual, and the data were collected for 3 minutes per state. ...
Clustering is an attractive method to handle large-scale data which are explosively generated through digitization. This approach is specifically appropriate when labeling is very costly. In this paper, we constructed an unsupervised learning algorithm and focused on a finite mixture model based on multivariate Beta distribution. Our motivation is the flexibility and high potential that this distribution offers in modeling data. To learn this mixture model, we used an expectation propagation inference framework in which the parameters and the complexity of the model were evaluated concurrently in a single optimization framework. We evaluated the performance of our framework on publicly available datasets related to forgery detection, EEG-based sentiment analysis and human activity recognition. Our proposed model demonstrates comparable results to similar alternatives.
... lab setting). Furthermore, while many studies with BCIs focus on comparatively easy outcome variables (e.g., moving a mouse cursor), recent studies have used machine learning algorithms to predict emotional states from EEG data, achieving an accuracy of up to 97% (Bird, Ekart, Buckingham, & Faria, 2019), and to classify certain neuropsychological disorders (Vanneste, Song, & De Ridder, 2018). More relevant to the topic of this article is a report by Dryburgh, McKenna, and Rekik (2020), who used data from 226 male participants to estimate the relationship between resting-state functional MRI and intelligence. ...
Intelligence is one of the most important psychological constructs and influences many decisions. Unsurprisingly, a large number of measurement instruments are available. However, conceptual development related to intelligence has been stagnant for many years despite recent technological trends that would enable new approaches to assessing human intelligence. One such approach would be to develop intelligence tests in virtual-reality scenarios, enabling researchers to observe how people interact with problems to solve them. Furthermore, artificial intelligence and machine learning could be used to gain even more insights from test data or use data arising from people's everyday lives to predict intelligence. Endeavors to assess intelligence without tests may eventually also lead to approaches using physiological variables related to the brain to make predictions. This article proposes several visions of plausible future developments in intelligence assessment over the coming decades and examines potential problems that might arise with these new methods.
... However, as shown in Figure 5, the left frontal electrodes, especially channel AF7, appeared to be particularly discriminative for the two emotions. The relevance of the electrode AF7 is in line with previous studies using MUSE EEG headband comprising four channels (TP9, AF7, AF8, TP10) for emotion classification purposes, highlighting the importance of channel AF7 to accurately distinguish between mental states (Bird et al., 2019; Raheel et al., 2019). Interesting to notice is that channels from the right frontal hemisphere did not appear to be as discriminative as their left-sided counterparts, both for time-frequency and correlational features, as shown in Table 3 and Figure 7, respectively. ...
During the last decades, neurofeedback training for emotional self-regulation has received significant attention from scientific and clinical communities. Most studies have investigated emotions using functional magnetic resonance imaging (fMRI), including the real-time application in neurofeedback training. However, the electroencephalogram (EEG) is a more suitable tool for therapeutic application. Our study aims at establishing a method to classify discrete complex emotions (e.g., tenderness and anguish) elicited through a near-immersive scenario that can be later used for EEG-neurofeedback. EEG-based affective computing studies have mainly focused on emotion classification based on dimensions, commonly using passive elicitation through single-modality stimuli. Here, we integrated both passive and active elicitation methods. We recorded electrophysiological data during emotion-evoking trials, combining emotional self-induction with a multimodal virtual environment. We extracted correlational and time-frequency features, including frontal-alpha asymmetry (FAA), using Complex Morlet Wavelet convolution. Thinking about future real-time applications, we performed within-subject classification using 1-s windows as samples and we applied trial-specific cross-validation. We opted for a traditional machine-learning classifier with low computational complexity and sufficient validation in online settings, the Support Vector Machine. Results of individual-based cross-validation using the whole feature sets showed considerable between-subject variability. The individual accuracies ranged from 59.2 to 92.9% using time-frequency/FAA and 62.4 to 92.4% using correlational features. We found that features of the temporal, occipital, and left-frontal channels were the most discriminative between the two emotions. 
Our results show that the suggested pipeline is suitable for individual-based classification of discrete emotions, paving the way for future personalized EEG-neurofeedback training.
There has been a sudden increase in demand for algorithms or models that correctly and accurately identify human emotions. Expectations of machines have come a long way, from a time when smart machines capable of reaching a decision on their own were all that was expected, to machines capable of understanding what goes on in a person’s brain. Such autonomous agents can prove helpful not only in developing smarter machines but also in the field of medicine. Early prediction or recognition of brainwave patterns for epilepsy, seizures, manic depression, etc. is key to achieving faster aid responses or prevention. In our work, we limit our focus to the most common practice used by researchers in this field, which is to obtain the electroencephalogram (EEG) data, extract features and implement a classification algorithm. However, we also try to capitalize upon the massive improvements made in the field of supervised and unsupervised learning. The robust depth-wise separable convolution architecture called Xception has been implemented in this study to observe its performance as a feature extractor for the notoriously mutating EEG data. The EEG dataset used in this study is open source. It is available on Kaggle and has three classes, namely positive, negative and neutral. We apply a wavelet transform along with the Xception architecture to extract features from the dataset, which are then classified using a support vector machine. We achieve strong results: a score of 98% is observed for accuracy, precision, recall and F1 score.
In modern Human-Robot Interaction, much thought has been given to accessibility regarding robotic locomotion, specifically the enhancement of awareness and lowering of cognitive load. On the other hand, with social Human-Robot Interaction considered, published research is far sparser given that the problem is less explored than pathfinding and locomotion. This thesis studies how one can endow a robot with affective perception for social awareness in verbal and non-verbal communication. This is possible by the creation of a Human-Robot Interaction framework which abstracts machine learning and artificial intelligence technologies which allow for further accessibility to non-technical users compared to the current State-of-the-Art in the field. These studies thus initially focus on individual robotic abilities in the verbal, non-verbal and multimodality domains. Multimodality studies show that late data fusion of image and sound can improve environment recognition, and similarly that late fusion of Leap Motion Controller and image data can improve sign language recognition ability. To alleviate several of the open issues currently faced by researchers in the field, guidelines are reviewed from the relevant literature and met by the design and structure of the framework that this thesis ultimately presents. The framework recognises a user's request for a task through a chatbot-like architecture. Through research in this thesis that recognises human data augmentation (paraphrasing) and subsequent classification via language transformers, the robot's more advanced Natural Language Processing abilities allow for a wider range of recognised inputs. That is, as examples show, phrases that could be expected to be uttered during a natural human-human interaction are easily recognised by the robot. 
This allows for accessibility to robotics without the need to physically interact with a computer or write any code, with only the ability of natural interaction (an ability which most humans have) required for access to all the modular machine learning and artificial intelligence technologies embedded within the architecture. Following the research on individual abilities, this thesis then unifies all of the technologies into a deliberative interaction framework, wherein abilities are accessed from long-term memory modules and short-term memory information such as the user's tasks, sensor data, retrieved models, and finally output information. In addition, algorithms for model improvement are also explored, such as through transfer learning and synthetic data augmentation and so the framework performs autonomous learning to these extents to constantly improve its learning abilities. It is found that transfer learning between electroencephalographic and electromyographic biological signals improves the classification of one another given their slight physical similarities. Transfer learning also aids in environment recognition, when transferring knowledge from virtual environments to the real world. In another example of non-verbal communication, it is found that learning from a scarce dataset of American Sign Language for recognition can be improved by multi-modality transfer learning from hand features and images taken from a larger British Sign Language dataset. Data augmentation is shown to aid in electroencephalographic signal classification by learning from synthetic signals generated by a GPT-2 transformer model, and, in addition, augmenting training with synthetic data also shows improvements when performing speaker recognition from human speech. 
Given the importance of platform independence due to the growing range of available consumer robots, four use cases are detailed, and examples of behaviour are given by the Pepper, Nao, and Romeo robots as well as a computer terminal. The use cases involve a user requesting their electroencephalographic brainwave data to be classified by simply asking the robot whether or not they are concentrating. In a subsequent use case, the user asks if a given text is positive or negative, to which the robot correctly recognises the task of natural language processing at hand and then classifies the text, this is output and the physical robots react accordingly by showing emotion. The third use case has a request for sign language recognition, to which the robot recognises and thus switches from listening to watching the user communicate with them. The final use case focuses on a request for environment recognition, which has the robot perform multimodality recognition of its surroundings and note them accordingly. The results presented by this thesis show that several of the open issues in the field are alleviated through the technologies within, structuring of, and examples of interaction with the framework. The results also show the achievement of the three main goals set out by the research questions; the endowment of a robot with affective perception and social awareness for verbal and non-verbal communication, whether we can create a Human-Robot Interaction framework to abstract machine learning and artificial intelligence technologies which allow for the accessibility of non-technical users, and, as previously noted, which current issues in the field can be alleviated by the framework presented and to what extent.
The goal of this study was to investigate the effect of audio listened to through headphones on subjectively reported human focus levels, and to identify through objective measures the properties that contribute most to increasing and decreasing focus in people within their regular, everyday environment. Participants (N = 62, 18–65 years) performed various tasks on a tablet computer while listening to either no audio (silence), popular audio playlists designed to increase focus (pre-recorded music arranged in a particular sequence of songs), or engineered soundscapes that were personalized to individual listeners (digital audio composed in real-time based on input parameters such as heart rate, time of day, location, etc.). Audio stimuli were delivered to participants through headphones while their brain signals were simultaneously recorded by a portable electroencephalography headband. Participants completed four 1-h long sessions at home during which different audio played continuously in the background. Using brain-computer interface technology for brain decoding and based on an individual’s self-report of their focus, we obtained individual focus levels over time and used this data to analyze the effects of various properties of the sounds contained in the audio content. We found that while participants were working, personalized soundscapes increased their focus significantly above silence (p = 0.008), while music playlists did not have a significant effect. For the young adult demographic (18–36 years), all audio tested was significantly better than silence at producing focus (p = 0.001–0.009). Personalized soundscapes increased focus the most relative to silence, but playlists of pre-recorded songs also increased focus significantly during specific time intervals. Ultimately we found it is possible to accurately predict human focus levels a priori based on physical properties of audio content. 
We then applied this finding to compare between music genres and revealed that classical music, engineered soundscapes, and natural sounds were the best genres for increasing focus, while pop and hip-hop were the worst. These insights can enable human and artificial intelligence composers to produce increases or decreases in listener focus with high temporal (millisecond) precision. Future research will include real-time adaptation of audio for other functional objectives beyond affecting focus, such as affecting listener enjoyment, drowsiness, stress and memory.
Real-time emotion recognition with electroencephalograph (EEG) has been an active field of research in recent years. In particular, deep learning has been shown to be effective in emotion classification tasks. However, because the monitoring of EEG signals is a continuous process, there is a need for energy-efficient emotion classification methods. Compared with artificial neural networks (ANNs), spiking neural networks (SNNs), in which weight multiplications are replaced by additions, are more energy efficient. In this paper, we propose a near-lossless transfer learning method for SNNs, specially designed for EEG signals. Data is preprocessed, and its power spectral density (PSD) is extracted to represent the frequency domain of the raw EEG signal. Using a 3-layer pretrained SNN, running on the DEAP dataset, we achieved an accuracy of 78.87% and 76.5% for valence and arousal dimensions, respectively. By training a model based on one dimension and fine-tuning on another, we achieve even higher accuracy: 82.75% for valence and 84.22% for arousal. As far as we know, our results yield the smallest SNN with the highest accuracy for this task to date. The energy power of our SNNs for valence and arousal dimensions is 13.8% of that of our CNN-based solutions. The framework was developed with PyTorch and is available under an open-source license.
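The PSD extraction step mentioned in this abstract can be illustrated with a minimal sketch. It is not the cited authors' pipeline: it assumes a plain periodogram computed with a naive DFT, a hypothetical sampling rate of 128 Hz, and conventional alpha (8 to 13 Hz) and beta (13 to 30 Hz) band limits. The function names `periodogram` and `band_power` are introduced here for illustration only.

```python
import cmath
import math


def periodogram(signal, fs):
    """Return (freqs, power) for a simple periodogram computed with a naive DFT.

    Scaling conventions for PSD estimates vary; this sketch divides the
    squared magnitude by fs * n and does not double the non-DC bins.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum the periodogram bins whose frequency falls in [low, high)."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


fs = 128  # hypothetical sampling rate in Hz
# Synthetic 10 Hz tone lasting two seconds, standing in for one EEG channel.
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs * 2)]
freqs, power = periodogram(signal, fs)
alpha = band_power(freqs, power, 8, 13)
beta = band_power(freqs, power, 13, 30)
```

For real recordings an averaged estimate (for example Welch's method) from a numerical library is usually preferable to a single naive periodogram; the sketch only shows what "power per frequency band" means.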
Having measurable physiological correlates, hypnosis should be measurable generally itself. The precise, continual, quantitative assessment (versus phenomenological one) of a current trance level (i.e., "depth") is possible only instrumentally. We've shown that electrophysiological patterns of a trance are stable from session to session, but significantly vary among subjects. Hence, to measure the trance level individually we proposed the following Brain-Computer interface approach and tested it on the 27 video-EEG recordings of 8 outpatients with anxiety and depressive disorders: on the data of the first session using Common Spatial Pattern filtering and Linear Discriminant Analysis classification, we trained the predictive models to discriminate conditions of "a wakefulness" and "a deep trance" and applied them to the subsequent sessions to predict the deep trance probability (in fact, to measure the trance level). We obtained integrative individualized continuously changing parameter reflecting the hypnosis level graphically online, providing the trance microdynamics control. The classification accuracy was high, especially while filtering the signal in 1.5-14 and 4-15 Hz. The applications and perspectives are being discussed.
This work aims to find discriminative EEG-based features and appropriate classification methods that can categorise brainwave patterns based on their level of activity or frequency for mental state recognition useful for human-machine interaction. By using the Muse headband with four EEG sensors (TP9, AF7, AF8, TP10), we categorised three possible states such as relaxing, neutral and concentrating based on a few states of mind defined by cognitive behavioural studies. We have created a dataset with five individuals and sessions lasting one minute for each class of mental state in order to train and test different methods. Given the proposed set of features extracted from the five signals of the EEG headband (alpha, beta, theta, delta, gamma), we have tested a combination of different feature selection algorithms and classifier models to compare their performance in terms of recognition accuracy and number of features needed. Different tests such as 10-fold cross validation were performed. Results show that only 44 features from a set of over 2100 features are necessary when used with classical classifiers such as Bayesian Networks, Support Vector Machines and Random Forests, attaining an overall accuracy of over 87%.
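The 10-fold cross validation mentioned in this abstract can be sketched as follows. This is a minimal illustration of how the index splits are formed, not the evaluation code used in the cited work; the helper name `k_fold_indices` and the sample count of 25 are assumptions, and the sketch does not shuffle or stratify the data, which a real evaluation normally would.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    # Spread the remainder over the first (n_samples % k) folds.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Each sample appears in exactly one test fold across the ten splits.
folds = list(k_fold_indices(25, k=10))
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each pair, and the ten scores averaged to give the reported accuracy.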
In this paper we propose an approach to a chatbot software that is able to learn from interaction via text messaging between human-bot and bot-bot. The bot listens to a user and decides whether or not it knows how to reply to the message accurately based on current knowledge, otherwise it will set about to learn a meaningful response to the message through pattern matching based on its previous experience. Similar methods are used to detect offensive messages, and are proved to be effective at overcoming the issues that other chatbots have experienced in the open domain. A philosophy of giving preference to too much censorship rather than too little is employed given the failure of Microsoft Tay. In this work, a layered approach is devised to conduct each process, and leave the architecture open to improvement with more advanced methods in the future. Preliminary results show an improvement over time in which the bot learns more responses. A novel approach of message simplification is added to the bot’s architecture; the results suggest that the algorithm substantially improves the bot’s conversational performance, by a factor of three.
Many image classification models have been introduced to help tackle the foremost issue of recognition accuracy. Image classification is one of the core problems in the Computer Vision field, with a large variety of practical applications. Examples include object recognition for robotic manipulation and pedestrian or obstacle detection for autonomous vehicles, among others. A lot of attention has been associated with Machine Learning, specifically neural networks such as the Convolutional Neural Network (CNN) winning image classification competitions. This work proposes the study and investigation of such a CNN architecture model (i.e. Inception-v3) to establish whether it works best in terms of accuracy and efficiency with new image datasets via Transfer Learning. The retrained model is evaluated, and the results are compared to some state-of-the-art approaches.
Previous studies that involve measuring EEG, or electroencephalograms, have mainly been experimentally-driven projects; for instance, EEG has long been used in research to help identify and elucidate our understanding of many neuroscientific, cognitive, and clinical issues (e.g., sleep, seizures, memory). However, advances in technology have made EEG more accessible to the population. This opens up lines for EEG to provide more information about brain activity in everyday life, rather than in a laboratory setting. To take advantage of the technological advances that have allowed for this, we introduce the Brain-EE system, a method for evaluating user engaged enjoyment that uses a commercially available EEG tool (Muse). During testing, fifteen participants engaged in two tasks (playing two different video games via tablet), and their EEG data were recorded. The Brain-EE system supported much of the previous literature on enjoyment; increases in frontal theta activity strongly and reliably predicted which game each individual participant preferred. We hope to develop the Brain-EE system further in order to contribute to a wide variety of applications (e.g., usability testing, clinical or experimental applications, evaluation methods, etc.).
In this work, we present a human-centered robot application in the scope of daily activity recognition towards robot-assisted living. Our approach consists of a probabilistic ensemble of classifiers as a dynamic mixture model considering the Bayesian probability, where each base classifier contributes to the inference in proportion to its posterior belief. The classification model relies on the confidence obtained from an uncertainty measure that assigns a weight for each base classifier to counterbalance the joint posterior probability. Spatio-temporal 3D skeleton-based features extracted from RGB-D sensor data are modeled in order to characterize daily activities, including risk situations (e.g.: falling down, running or jumping in a room). To assess our proposed approach, state-of-the-art datasets such as MSR-Action3D Dataset and MSR-Activity3D Dataset [1] are used to compare the results with other recent methods. Reported results on test datasets show that our proposed approach outperforms state-of-the-art methods in terms of precision, recall, and overall accuracy. Moreover, we also validated our framework running on-the-fly in a mobile robot with an RGB-D sensor to identify daily activities for a robot-assisted living application.
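The idea of weighting each base classifier by a confidence derived from an uncertainty measure can be sketched as below. This is an illustrative assumption, not the cited paper's formulation: it uses normalised Shannon entropy of each classifier's posterior as the uncertainty measure, and the function names and the three example posteriors are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weights(posteriors):
    """Give each classifier a weight that grows as its posterior gets sharper."""
    n_classes = len(posteriors[0])
    max_h = math.log(n_classes)  # entropy of the uniform distribution
    confidences = [max(0.0, 1.0 - entropy(p) / max_h) for p in posteriors]
    total = sum(confidences)
    if total == 0:
        # Every classifier is maximally uncertain: fall back to equal weights.
        return [1.0 / len(posteriors)] * len(posteriors)
    return [c / total for c in confidences]


def fuse(posteriors):
    """Weighted mixture of the base posteriors, renormalised to sum to one."""
    weights = entropy_weights(posteriors)
    n_classes = len(posteriors[0])
    mixed = [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    norm = sum(mixed)
    return [m / norm for m in mixed]


# Hypothetical posteriors from three base classifiers over three classes.
posteriors = [
    [0.90, 0.05, 0.05],  # confident in class 0
    [0.34, 0.33, 0.33],  # nearly uninformative
    [0.20, 0.70, 0.10],  # moderately confident in class 1
]
weights = entropy_weights(posteriors)
fused = fuse(posteriors)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the nearly uniform classifier receives almost no weight, so the fused decision is driven by the two classifiers that express a clear belief.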
In recent years, there have been many great successes in using deep architectures for unsupervised feature learning from data, especially for images and speech. In this paper, we introduce recent advanced deep learning models to classify two emotional categories (positive and negative) from EEG data. We train a deep belief network (DBN) with differential entropy features extracted from multichannel EEG as input. A hidden Markov model (HMM) is integrated to accurately capture a more reliable emotional stage switching. We also compare the performance of the deep models to KNN, SVM and Graph regularized Extreme Learning Machine (GELM). The average accuracies of DBN-HMM, DBN, GELM, SVM, and KNN in our experiments are 87.62%, 86.91%, 85.67%, 84.08%, and 69.66%, respectively. Our experimental results show that the DBN and DBN-HMM models improve the accuracy of EEG-based emotion classification in comparison with the state-of-the-art methods.
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
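The "polynomial input transformation" idea can be made concrete with a small check that a polynomial kernel equals a dot product in an explicitly mapped feature space. The sketch assumes the homogeneous degree-2 kernel on two-dimensional inputs, chosen only for brevity; it is not taken from the cited paper, and the function names are hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b) ** 2."""
    return dot(a, b) ** 2


def poly2_features(v):
    """Explicit feature map for 2-D inputs that matches poly2_kernel."""
    x1, x2 = v
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


a, b = [1.0, 2.0], [3.0, -1.0]
kernel_value = poly2_kernel(a, b)
explicit_value = dot(poly2_features(a), poly2_features(b))
```

Both values agree up to floating-point rounding, which is why a linear decision surface in the mapped space can be computed without ever constructing the high-dimensional vectors.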
Objectives: To develop a decision support system (DSS), myGRaCE, that integrates service user (SU) and practitioner expertise about mental health and associated risks of suicide, self-harm, harm to others, self-neglect, and vulnerability. The intention is to help SUs assess and manage their own mental health collaboratively with practitioners. Methods: An iterative process involving interviews, focus groups, and agile software development with 115 SUs, to elicit and implement myGRaCE requirements. Results: Findings highlight shared understanding of mental health risk between SUs and practitioners that can be integrated within a single model. However, important differences were revealed in SUs' preferred process of assessing risks and safety, which are reflected in the distinctive interface, navigation, tool functionality and language developed for myGRaCE. A challenge was how to provide flexible access without overwhelming and confusing users. Conclusion: The methods show that practitioner expertise can be reformulated in a format that simultaneously captures SU expertise, to provide a tool highly valued by SUs. A stepped process adds necessary structure to the assessment, each step with its own feedback and guidance. Practice Implications: The GRiST web-based DSS ( links and integrates myGRaCE self-assessments with GRiST practitioner assessments for supporting collaborative and self-managed healthcare.