This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the first two experiments and significantly higher for the number-guessing experiment, with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the nonboosted DEvo MLP was 79.81%, 96.11%, and 27.07% in the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to the TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
Research Article
A Deep Evolutionary Approach to Bioinspired Classifier
Optimisation for Brain-Machine Interaction
Jordan J. Bird, Diego R. Faria, Luis J. Manso,
Anikó Ekárt, and Christopher D. Buckingham
School of Engineering and Applied Science, Aston University, Birmingham B4 7ET, UK
Correspondence should be addressed to Jordan J. Bird; birdj@aston.ac.uk
Received 14 December 2018; Accepted 21 February 2019; Published 13 March 2019
Academic Editor: Danilo Comminiello
Copyright © 2019 Jordan J. Bird et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Bioinspired algorithms have been extensively used as robust and efficient optimisation methods. Despite the fact that they have been criticised for being computationally expensive, they have also been proven useful to solve complex optimisation problems. With the increasing availability of computing resources, bioinspired algorithms are growing in popularity due to their effectiveness at optimising complex problem solutions. Scientific studies of natural optimisation from many generations past, such as Darwinian evolution, are now becoming viable inspiration for solving real-world problems.
This increasing resource availability is also allowing for more complex computing in applications such as Internet of Things (IoT), Human-Robot Interaction (HRI), and human-computer interaction (HCI), providing more degrees of both control and interaction to the user. One of these degrees of control is the source of all others, the human brain, and it can be observed using electroencephalography (EEG). At its beginning EEG was an invasive and uncomfortable method, but with the introduction of dry, commercial electrodes, EEG is now fully accessible even outside of laboratory setups.
It has been noted that a large challenge in brain-machine interaction is inferring the attentional and emotional states from particular patterns and behaviours of electrical brain activity. Large amounts of data must be acquired from EEG, since the signals are complex, nonlinear, and nonstationary. Generating discriminative features to describe a wave requires the statistical analysis of time window intervals. This study focuses on bringing together previous related research and improving the state-of-the-art with a Deep Evolutionary (DEvo) approach when optimising bioinspired classifiers. The application of this study allows for a wholly bioinspired and optimised approach to mental attention classification, emotional state classification, and guessing the number that a subject is thinking of. These states can then be taken forward as states of control in, for example, human-robot interaction.
Hindawi
Complexity
Volume 2019, Article ID 4316548, 14 pages
https://doi.org/10.1155/2019/4316548
In addition to the experimental results, the contributions of the work presented in this paper are as follows:
(i) An effective framework for classification of complex signals (brainwave data) through processes of evolutionary optimisation and bioinspired classification.
(ii) A new evolutionary approach to hyperheuristic bioinspired classifiers to prevent convergence on local minima found in the EEG feature space.
(iii) Accuracies close to those of resource-intensive deep learning, and in one case exceeding them, achieved through the optimised processes found in nature.
The remainder of this article proceeds as follows: Section 2 provides an exploration of the state-of-the-art works related to this study, briefly introducing the most relevant concepts applied in the DEvo approach to machine learning with electroencephalographic data. Section 3 describes the methods used to perform the experiments. The results of the experiments, including graphical representations of results and discussion of implications, are presented in Section 4. Section 5 details the conclusions extracted from the experiments and the suggested future work.
2. Background
2.1. Electroencephalography and Machine Learning with EEG. Electroencephalography, or EEG, is the measurement and recording of electrical activity produced by the brain []. The collection of EEG data is carried out through the use of applied electrodes, which read minute electrophysiological currents produced by the brain due to nervous oscillation [, ]. The most invasive form of EEG is subdural [], in which electrodes are placed directly on the brain itself. Far less invasive techniques require electrodes to be placed around the cranium, of which the disadvantage is that signals are being read through the thick bone of the skull []. Raw electrical data is measured in microvolts (µV), which over time produce wave patterns.
Lövheim's  study produced a new three-dimensional way of graphically representing human emotion in terms of categories and hormone levels []. This graphical representation can be seen in Figure 1, with exposition of emotional categories found in Table 1. Each vertex of the cube represents a centroid of an emotional category. It is worth noting that categories are not completely concrete, and that emotions are experienced in gradient, as well as overlapping between categories []. It is this chemical composition that causes certain nervous oscillation and thus electrical brainwave activity []. Thus, the brainwave activity can be used as data to estimate human emotions.
Figure 1: Lövheim's cube: mapping levels of noradrenaline, dopamine, and serotonin to human emotion. Axes: Arousal/Noradrenaline, Self-confidence/Serotonin, and Reinforcement/Dopamine; vertices are labelled A to H.
Figure 2: EEG sensors TP9, AF7, AF8, and TP10 of the Muse headband on the international standard EEG placement system []. The diagram runs from the nasion (front) to the inion (back).
The Muse headband is a commercially available EEG recording device with four electrodes placed on the TP9, AF7, AF8, and TP10 positions based on the international EEG placement system []. These can be seen in Figure 2. Because the signals are quite weak in nature, signal noise is a major issue, since it effectively masks the useful information []. The EEG headband employs various artefact separation techniques to best retain the brainwave data and discard unwanted noise []. Previously, the headband has been used along with machine learning techniques to measure different levels of a user's enjoyment [, ] while playing mobile phone games, treating enjoyment as a gradient much like in sentiment analysis projects. Muse headbands are also often used in neuroscience research projects due to their low cost and ease of deployment (since they are a consumer product), as well as their effectiveness in terms of classification and accuracy []. In that work, binary classification of two physical tasks achieved % accuracy using Bayesian probability methods.
Previous work with the Muse headband used classical and ensemble machine learning techniques to accurately classify both mental [] and emotional [] states based on datasets generated by statistical extraction. The application of statistical extraction as a form of data preprocessing is useful across many platforms, e.g., for semantic place recognition in human-robot interaction [, ]. Machine learning techniques whose inputs are statistical features of the wave are commonly used to classify mental states [, ] for brain-machine interaction, where states are used as dimensions of user input. Probabilistic methods such as Deep Belief Networks, Support Vector Machines, and various types of neural network have been found to experience varying levels of success in emotional state classification, particularly in binary classification [].
EEG brainwave data classification is a contemporary focus in the medical fields; abnormalities in brainwave activity have been successfully classified as those leading to a stroke using a Random Forest classification method []. In addition to the detection of a stroke, researchers also found that monitoring classified brain activity aided successfully with rehabilitation of motor functions after stroke when coupled with human-robot interaction []. Brainwave classification has also been very successful in the preemptive detection of epileptic seizures in both adults and newborn infants [, ]. The classification of minute parts of the sleep-wake cycle is also a focus of medical researchers in terms of EEG data mining. Low resolution, three-state (awake, sleep, and REM sleep) EEG data was classified with Bayesian methods to a high accuracy of -% in both humans and rats using identical models [], showing both the ease of classification of these states and the cross-domain application between human and rat brains. Random Forest classification of an extracted set of statistical EEG attributes could classify sleeping patterns with higher resolution than that of the previous study at around % accuracy []. It is worth noting that for a real-time averaging technique (prediction of a time series of, for example, every  second), only majority classification accuracies at >% would be required, though the time series could be trusted at shorter lengths with better results from the model.
Immune Clonal Algorithm, or ICA, has been suggested as a promising method for EEG brainwave feature extraction through the generation of mathematical temporal wave descriptors []. This approach found success in classification of epileptic brain activity through generated features as inputs to Naive Bayes, Support Vector Machine, K-Nearest Neighbours, and Linear Discriminant Analysis classifiers.
Table 1: Exposition of emotional categories of Figure 1.
Emotion Category    Emotion
A                   Shame, Humiliation
B                   Contempt, Disgust
C                   Fear, Terror
D                   Enjoyment, Joy
E                   Distress, Anguish
F                   Surprise
G                   Anger, Rage
H                   Interest, Excitement
Autonomous classification through affective computing in human-machine interaction is a very contemporary area of research due to the increasing amounts of computational resources available, including, but not limited to, facial expression recognition [], Sentiment Analysis [], human activity recognition [, ], and human behaviour recognition [, ]. In terms of social human-machine interaction, a Long Short-Term Memory network was found to be extremely useful in user text analysis to derive an affective sentiment based on negative and positive polarities [] and was used in the application of a chatbot.
2.2. Evolutionary Algorithms. An evolutionary algorithm will search a problem space inspired by the natural process of Darwinian evolution []. Solutions are treated as living organisms that, as a population, will produce more offspring than can survive. Where each solution has a measurable fitness, a survival of the fittest will occur, causing the weaker solutions to be killed off and allowing for the stronger to survive []. The evolutionary search in its simplest form will follow this process:
(1) Create an initial random population of solutions
(2) Simulate the following until termination occurs:
    (a) Using a chosen method, select parent(s) for use in generating offspring
    (b) Evaluate the offspring's fitness
    (c) Consider the whole population, and kill off the weakest members
The aforementioned algorithm is often used to decide on network parameters [] since there is “no free lunch” [] when it comes to certain types of optimisation problems. In particular, it has been demonstrated that the problem of searching for the optimal parameters for a neural network cannot be solved in polynomial time [].
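The process listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the bit-string encoding, the uniform parent choice, the one-point crossover, the single bit-flip mutation, and the names `evolve`, `pop_size`, and `generations` are all assumptions made for the example.

```python
import random


def evolve(fitness, n_bits, pop_size=20, generations=30, seed=0):
    """Steady-state evolutionary search over bit strings (illustrative only)."""
    rng = random.Random(seed)
    # (1) Create an initial random population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # (2a) Select two distinct parents uniformly at random (an assumption).
        mother, father = rng.sample(population, 2)
        # Breed one offspring: one-point crossover, then a single bit flip.
        cut = rng.randrange(1, n_bits)
        child = mother[:cut] + father[cut:]
        flip = rng.randrange(n_bits)
        child[flip] = 1 - child[flip]
        # (2b) Evaluate the offspring's fitness alongside the population,
        # (2c) then kill off the weakest member.
        population.append(child)
        population.sort(key=fitness, reverse=True)
        population.pop()
    return max(population, key=fitness)


# Toy fitness: the number of ones in the bit string.
best = evolve(fitness=sum, n_bits=16)
```

Because the weakest member is always the one removed, the best fitness in the population can never decrease from one generation to the next.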
2.3. Multilayer Perceptron. A Multilayer Perceptron is a type of Artificial Neural Network (ANN) that can be used as a universal function approximator and classifier. It computes a number of inputs through a series of layers of neurons, finally outputting a prediction of class or real value. More than one hidden layer forms a deep neural network. The output nodes represent the classes, with a softmax (single) choice used for classification; if there is just one output node, it gives a regression output (e.g., stock price prediction in GBP).
Learning is performed for a defined time measured in epochs and follows the process of backpropagation []. Backpropagation is a case of automatic differentiation in which errors in classification or regression (when comparing outputs of a network to ground truths) are passed backwards from the final layer, to derive a gradient which is then used to calculate neuron weights within the network, dictating their activation. That is, a gradient descent optimisation algorithm is employed for the calculation of neuron weights by computing the gradient of the loss function (error rate). After learning, a more optimal neural network is generated which is employed as a function to best map inputs to outputs or attributes to class.
The process of weight refinement for the set training time is given as follows:
(1) Generate the structure of the network based on input nodes, defined hidden layers, and required outputs.
(2) Initialise all of the node weights randomly.
(3) Pass the inputs through the network and generate predictions as well as cost (errors).
(4) Compute gradients.
(5) Backpropagate errors and adjust neuron weights.
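The five steps above can be sketched for a tiny network in plain Python. This is an illustrative toy, not the configuration used in the study: the single hidden layer, sigmoid activations, binary cross-entropy loss, per-sample updates, the OR-function data, and all names and default values (`train`, `n_hidden`, `lr`, `epochs`) are assumptions made for the example.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_hidden, w_out):
    # Each weight row ends with a bias weight, fed by a constant input of 1.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_loss(data, w_hidden, w_out):
    # Binary cross-entropy averaged over the data set.
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_hidden, w_out)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Steps 1-2: build the structure and initialise all weights randomly.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial = mean_loss(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, y in data:
            # Step 3: pass the inputs through the network.
            hidden, p = forward(x, w_hidden, w_out)
            # Step 4: gradients. For sigmoid output with cross-entropy,
            # the error signal at the output node is simply p - y.
            delta_out = p - y
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Step 5: backpropagate and adjust weights by gradient descent.
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xi
    return w_hidden, w_out, initial, mean_loss(data, w_hidden, w_out)


# Toy data: the logical OR of two inputs.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
w_hidden, w_out, initial, final = train(data)
```

Comparing `initial` with `final` shows the loss before and after the set training time.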
Errors can be calculated in numerous ways, e.g., distance in Euclidean or non-Euclidean space for regression. In classification problems, entropy is often used, that is, the level of randomness or predictability for the classification of a set:
$$E(S) = -\sum_{j} P_j \times \log(P_j), \tag{1}$$
where $P_j$ is the proportion of the set $S$ that belongs to class $j$.
Comparing the difference of two measurements of entropy (two models) gives the information gain (relative entropy). This is the value of the Kullback-Leibler (KL) divergence when a univariate probability distribution of a given attribute is compared to another []. The calculation with the entropy algorithm in mind is thus simply given as
$$\mathit{InfoGain}(S, a) = E(S) - E(S \mid a), \tag{2}$$
where $a$ is the attribute under consideration.
A positive information gain denotes a lower error rate and thus a better model, i.e., an improved matrix of network weights.
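As a worked illustration of entropy and information gain, the following sketch computes both for a small labelled set. The base-2 logarithm, the discrete attribute values, and the function names `entropy` and `information_gain` are assumptions made for the example; real EEG features are continuous and would first need to be discretised.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of the labels minus their entropy once split by the attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = ["relaxed", "relaxed", "focused", "focused"]
informative = ["low", "low", "high", "high"]    # splits the classes perfectly
uninformative = ["a", "b", "a", "b"]            # tells us nothing about class
gain_good = information_gain(labels, informative)
gain_bad = information_gain(labels, uninformative)
```

An attribute that separates the two classes perfectly recovers the full one bit of entropy, whereas an attribute unrelated to the class yields no gain.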
Denser is a related novel method of evolutionary optimisation of an MLP []. Whereas this study focuses on the search space of layer structure within fully connected neural networks, Denser also considers the type of layer. This increase in the number of parameters to optimise grows the search space massively, making Denser a very computationally intensive algorithm, albeit one which achieves very high results. Benchmarked is an impressive result of . on the CIFAR- image recognition dataset. EvoDeep [] is a similar approach focusing on deep neural networks with varying layers; researchers found Roulette Selection (random) to be the best for selecting two parents for offspring, and thus such selection was chosen for this study's evolutionary search. A method of “Extreme Learning Machines” was proposed for the optimisation of deep learning processes and was extended to also perform feature extraction within the topological layers of the model [].
Figure 3: Diagram of a standard block within a Long Short-Term Memory network []. The diagram shows the block input, input gate, forget gate, cell, output gate, and block output, together with the recurrent connections and peepholes.
2.4. Long Short-Term Memory. Long Short-Term Memory (LSTM) is a form of Artificial Neural Network in which multiple Recurrent Neural Networks (RNN) will predict based on state and previous states. As seen in Figure 3, the data structure of a neuron within a layer is an “LSTM Block”. The general idea is as follows.
2.4.1. Forget Gate. The forget gate will decide on which information to store and which to delete or “forget”:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right), \tag{3}$$
where $t$ is the current timestep, $W_f$ is the matrix of weights, $h_{t-1}$ is the previous output, $x_t$ is the batch of inputs as a single vector, and finally $b_f$ is an applied bias.
2.4.2. Data Storage and Hidden State. After deciding which information to forget, the unit must also decide which information to remember. In terms of a cell input $i_t$, $\tilde{C}_t$ is a vector of new values generated:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \tag{4}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right). \tag{5}$$
Using the calculated variables in the previous operations, the unit then updates its cell state, where $*$ denotes element-wise multiplication:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t. \tag{6}$$
Figure 4: A graphical representation of the Deep Evolutionary (DEvo) approach to complex signal classification. An evolutionary algorithm simulation selects a set of natural features; this feature set then becomes the input to a bioinspired classifier, which is optimised with a similar approach. Stages shown: EEG collection (complex signals), initial dataset, bio-inspired feature selection (pre-processing), dataset, bio-inspired hyperheuristic optimisation, and the optimised model.
2.4.3. Output. In the final step, the unit will produce an output at output gate $o_t$ after the other operations are complete, and the hidden state of the node is updated:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \tag{7}$$
$$h_t = o_t * \tanh(C_t). \tag{8}$$
Due to the observed consideration of time sequences, i.e., previously seen data, it is often found that time-dependent data (waves; logical sequences) are very effectively classified thanks to the addition of unit memory. LSTMs are particularly powerful when dealing with speech recognition [] and brainwave classification [] due to their temporal nature.
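The gate equations of this subsection can be traced through a single time step with the sketch below. It follows the common LSTM formulation without peephole connections; the dictionary layout of `params`, the helper names, and the zero-valued example weights are assumptions chosen so that the arithmetic is easy to check by hand.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, biases, vector):
    # Computes W.v + b, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM block update; returns the new hidden state and cell state."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], concat)]
    c_tilde = [math.tanh(v)
               for v in affine(params["W_c"], params["b_c"], concat)]
    # Cell state: forget part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], concat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Two hidden units and three inputs, with every weight and bias set to zero.
hidden, inputs = 2, 3
params = {}
for gate in "fico":
    params["W_" + gate] = [[0.0] * (hidden + inputs) for _ in range(hidden)]
    params["b_" + gate] = [0.0] * hidden
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate values to 0, so the cell state is simply halved at each step, which makes the role of the forget gate easy to see.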
2.5. Adaptive Boosting. Adaptive Boosting (AdaBoost) is an algorithm which will create multiple unique instances of a certain model to attempt to mitigate situations in which selected parameters are less effective than others at a certain time []. The models will combine their weighted predictions after training on a random data subset to improve the previous iterations. The fusion of models is given as
$$F_T(x) = \sum_{t=1}^{T} f_t(x), \tag{9}$$
where $F$ is the set of classifiers and $x$ is the data object being considered [].
3. Method
Building on top of previous works which have succeeded using bioinspired classifiers for prediction of biological processes, this work suggests a completely bioinspired process. It includes biological inspiration in every step of the process rather than just the classification stage. The system as a whole therefore has the following stages:
(1) Generation of an initial dataset of biological data, EEG signals in particular (collection).
(2) Selection of attributes via biologically inspired computing (attribute selection).
(3) Optimisation of a neural network via biologically inspired computing (hyperheuristics).
(4) Use of an optimised neural network for the classification of the data (classification).
The steps allow for evolutionary optimisation of data preprocessing as well as using a similar approach for deep neural networks which also evolve. This leads to the Deep Evolutionary or DEvo approach. A graphical representation of the above steps can be seen in Figure 4. Nature is observed to be close to optimal in both procedure and resources; the goal of this process therefore is to best retain the high accuracies of complex models while reducing the processing time required to execute them.
The rest of this section serves to give details of the steps of the DEvo approach seen in Figure 4.
3.1. Data Acquisition. As previously mentioned, the paper at hand provides three experiments dealing with the classification of the attentional, emotional, and “thinking of” states of subjects. For the first two sets of experiments, two datasets were acquired from previous studies [, ]. The first dataset (mental state) distinguishes three different states related to how focused the subject is: relaxed, concentrative, or neutral (https://www.kaggle.com/birdy/eeg-brainwave-dataset-mental-state). This data was recorded for three minutes, per state, per person of the subject group. The subject group was made up of two adult males and two adult females aged 22±2. The second dataset (emotional state) was based on whether a person was feeling positive, neutral, or negative emotions (https://www.kaggle.com/birdy/eeg-brainwave-dataset-feeling-emotions). Six minutes for each state were recorded from two adults, one male and one female, aged 21±1, producing a total of 36 minutes of brainwave activity data. The experimental setup of the Muse headset being used to gather data from the TP9, AF7, AF8, and TP10 extra-cranial electrodes during a previous study [] can be seen in Figure 5. An example of the raw data retrieved from the headband can be seen in Figure 6. Additionally, observations of the range of subjects for the two aforementioned datasets were made: the educational level was relatively high within the subjects; two were PhD students, one was a Master's student, and one held a BSc degree, all from STEM fields. All subjects were in fine health, both physical and mental. All subjects were from the United Kingdom; three were from the West Midlands, whereas one was from Essex. All of the subjects volunteered to take part in this study.
Figure 5: A subject having their EEG brainwave data recorded while being exposed to a stimulus with an emotional valence [].
The two mental state datasets are a constant work in progress, with the aim of becoming representative of a whole human population rather than only the subjects described in this section. The data as-is provides a preliminary point of testing and a proof of concept of the DEvo approach to bioinspired classifier optimisation. If subject diversity has a noticeable impact, extending the datasets would be an ongoing process, since the global demographic often changes.
For the third experiment, the “MindBigData” dataset was acquired and processed (http://www.mindbigdata.com/opendb/). This publicly available data is an extremely large dataset gathered over the course of two years from one subject, in which the subject was asked to think of a digit between and including 0 to 9 for two seconds. This gives a ten class problem. Due to the massive size of the dataset and the computational resources available,  experiments for each class were extracted randomly, giving a uniform extraction of  seconds per digit class and therefore  seconds of EEG brainwave data. It must be critically noted that a machine learning model would be classifying this single subject's brainwaves, and it is conjectured that transfer learning to other subjects is likely impossible. Future work should concern the gathering of similar data but from a range of subjects. The MindBigData dataset used a slightly older version of the Muse headband, corresponding to two slightly different yet still frontal lobe sensors, collecting data from the TP9, FP1, FP2, and TP10 electrode locations.
3.2. Full Set of Features (Preselection). As described previously, feature extraction is based on previous research into effective statistical attributes of EEG brainwave data []. This section describes the reasoning behind the necessity of performing statistical extraction, as well as the method to perform the process.
The EEG sensor used for the experiments, the Muse headband, communicates with the computer using Bluetooth Low Energy (BLE). The use of this protocol improves the autonomy of the sensor at the expense of a nonuniform sampling rate. The first step applied to normalise the dataset is using a Fourier-based method to resample the data to a fixed frequency of Hz.
Brainwave data is nonlinear and nonstationary in nature, and thus single values are not indicative of class. That is, mental classification is based on the temporal nature of the wave, and not the values specifically. For example, a simplified concentrative and relaxed wave can be visually recognised due to the fact that wavelengths of concentrative mental state class data are far shorter, and yet, a value measured at any one point might be equal for the two states (i.e., microVolts). Additionally, the detection of the characteristics that define alpha, beta, theta, delta, and gamma waves also requires analysis over time. It is for these reasons that temporal statistical extraction is performed. For temporal statistical extraction, sliding time windows of total length 1 s are considered, with an overlap of . seconds. That is, windows run from [0 1),[1.52.5),[23),[2.53), continuing until the experiment ends.
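The windowing step can be sketched as follows. The window length and overlap are exposed as parameters because the exact values are not fully legible in this copy of the text; the defaults of a 1 s window with a 0.5 s overlap, the 4 Hz toy sampling rate, and the function name `sliding_windows` are assumptions made for illustration.

```python
def sliding_windows(samples, rate_hz, length_s=1.0, overlap_s=0.5):
    """Split a uniformly sampled signal into overlapping half-open windows."""
    size = int(round(length_s * rate_hz))
    step = int(round((length_s - overlap_s) * rate_hz))
    if size <= 0 or step <= 0:
        raise ValueError("window length must exceed the overlap")
    # Only complete windows are kept; a trailing partial window is dropped.
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]


# Ten samples at a toy rate of 4 Hz: each window holds 4 samples and
# consecutive windows share 2 of them.
windows = sliding_windows(list(range(10)), rate_hz=4)
```

Each returned window can then be passed on to the statistical feature extraction described in the remainder of this subsection.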
Figure 6: An example of a raw EEG data stream from the Muse EEG headband. The Y-axis represents measured brainwave activity in microvolts (µV) and the X-axis is the time (s) at which the data was recorded. Channels shown: TP9, AF7, AF8, TP10, and Right AUX.
The remainder of this subsection describes the different statistical feature types which are included in the initial dataset:
(i) A set of values of signals within a sequence of temporal windows $x_1, x_2, x_3, \ldots, x_n$ are considered and mean values are computed:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i. \tag{10}$$
(ii) The standard deviation of values is recorded:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}. \tag{11}$$
(iii) Asymmetry and peakedness of waves are statistically represented by the skewness and kurtosis via the statistical moments of the third and fourth order. Skewness and kurtosis are the standardised moments
$$\gamma_k = \frac{\mu_k}{\sigma^k}, \tag{12}$$
with the central moment
$$\mu_k = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^k, \tag{13}$$
taken at $k = 3$ (skewness) and $k = 4$ (kurtosis), the third and fourth moments about the mean.
(iv) Maximum value within each particular time window $\{x_1, x_2, \ldots, x_n\}$.
(v) Minimum value within each particular time window $\{x_1, x_2, \ldots, x_n\}$.
(vi) Derivatives of the minimum and maximum values, obtained by dividing the time window in half and measuring the values from either half of the window.
(vii) Performing the min and max derivatives a second time on the presplit window, resulting in the derivatives of every 0.25 s time window.
(viii) For every min, max, and mean value of the four 0.25 s time windows, the Euclidean distance between them is measured. For example, the maximum value of time window one of four has its 1D Euclidean distance measured between it and max values of windows two, three, and four of four.
(ix) From the  features generated from quarter-second min, max, and mean derivatives, the last six features are ignored and thus a x() feature matrix can be generated. Using the Logarithmic Covariance matrix model [], a log-cov vector and thus statistical features can be generated for the data as such:
$$\mathit{lcM} = U(\log(\mathrm{cov}(M))). \tag{14}$$
$U$ returns the upper triangular features of the resulting matrix and the covariance matrix $\mathrm{cov}(M)$ is
$$\mathrm{cov}(M) = \mathrm{cov}_{ij} = \frac{1}{N}\sum_{k=1}^{N} (x_{ik} - \mu_i)(x_{kj} - \mu_j). \tag{15}$$
(x) For each full 1 s time window, the Shannon entropy is measured and considered as a statistical feature:
$$h = -\sum_{j} S_j \times \log(S_j). \tag{16}$$
The complexity of the data is summed up as such, where $h$ is the statistical feature and $S$ relates to each signal within the time window after normalisation of values.
(xi) For each 0.5 s time window, the log-energy entropy is measured as
$$\mathit{loge} = \sum_{i} \log(S_i^2) + \sum_{j} \log(S_j^2), \tag{17}$$
where $i$ is the first time window $n$ to $n+0.5$ and $j$ is the second time window $n+0.5$ to $n+1$.
(xii) Analysis of a spectrum is performed by an algorithm to perform the Fast Fourier Transform (FFT) [] of every recorded time window, derived as follows:
$$X_k = \sum_{n=0}^{N-1} S_n^t e^{-i 2\pi k (n/N)}, \quad k = 0, \ldots, N-1. \tag{18}$$
The above statistical features are used to represent the waves. With these features considered for each electrode and time window (including those formed by overlaps), this produces a total of  scalars per measure. The resulting number of features is too large to be used in real time (i.e., it would be computationally intensive) and would not yield good classification results because of the large dimensionality. Attribute selection is therefore performed to overcome these limitations and, additionally, make the training process significantly faster.
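A few of the per-window features described in this subsection (mean, standard deviation, skewness, kurtosis, minimum, maximum, and the discrete Fourier transform) can be sketched in plain Python as below. This is a simplified illustration and not the feature extraction code of the study: the function names are hypothetical, population (1/N) moments are assumed, and the transform is a direct O(N²) evaluation of the sum rather than a fast algorithm.

```python
import cmath
import math


def central_moment(xs, k):
    """k-th moment about the mean, using the 1/N normalisation."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)


def window_features(xs):
    """Basic statistics of one time window (requires a non-constant window)."""
    sigma = math.sqrt(central_moment(xs, 2))
    return {
        "mean": sum(xs) / len(xs),
        "std": sigma,
        "skewness": central_moment(xs, 3) / sigma ** 3,
        "kurtosis": central_moment(xs, 4) / sigma ** 4,
        "min": min(xs),
        "max": max(xs),
    }


def dft(xs):
    """Direct discrete Fourier transform of one window."""
    size = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / size)
                for n, x in enumerate(xs))
            for k in range(size)]


features = window_features([1.0, 2.0, 3.0, 4.0, 5.0])
spectrum = dft([1.0, 1.0, 1.0, 1.0])
```

For the symmetric ramp used here the skewness vanishes, and the constant signal passed to `dft` has all of its energy in the zero-frequency bin.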
3.3. Evolutionary Optimisation and Machine Learning. The evolutionary optimisation process as detailed previously was applied when selecting discriminative attributes from the full dataset for more optimised classification. An initial population of  attribute subsets was generated and simulated for  generations with tournament breeding selection []. Evolutionary optimisation was also applied to explore the n-dimensional MLP topological search space, where n is the number of hidden layers, with the goal of searching for the best accuracy (fitness metric). With the selected attributes forming the new dataset to be used in experiments, two models were generated: an LSTM and an MLP.
Before finalising the LSTM model, various hyperparameters are explored, specifically the topology of the network. This was performed manually since evolutionary optimisation of LSTM topology would have been extremely computationally expensive. More than one hidden layer often returned worse results during manual exploration and thus one hidden layer was decided upon. LSTM units within this layer would be tested from  to  at steps of  units. Using a vector of the time sequence statistical data as an input in batches of  data points, an LSTM was trained for  epochs to predict class for each number of units on a layer, and thus a manually optimised topology was derived.
A Multilayer Perceptron was first fine-tuned via an
evolutionary algorithm [] with the number of neurons and
layers as population solutions and classification accuracy
as the fitness. A maximum of three hidden layers and up to
 neurons per layer were implemented in the simulation.
Using 10-fold cross validation, the MLP had the following
parameters manually set:
(i) -epoch training time
(ii) Learning rate of .
(iii) Momentum of .
(iv) No decay
Finally, boosting of the two models was attempted using
the AdaBoost algorithm in an effort both to mitigate the ill
effects of manually optimising the LSTM topology and to
fine-tune the models overall.
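To make the boosting step concrete, the following sketch shows one round of the classic AdaBoost reweighting scheme of Freund and Schapire. The text does not state which AdaBoost variant or implementation was used, so this is an assumption; the function name and the four-instance example are hypothetical.

```python
import math


def adaboost_round(weights, correct):
    """One classic AdaBoost round: returns (alpha, new_weights).

    weights: current instance weights summing to 1.
    correct: booleans, True where the weak model classified the instance correctly.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        raise ValueError("this scheme needs a weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)  # vote of this weak model
    # Raise the weight of mistakes, lower the weight of correct instances.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Hypothetical round: four equally weighted instances, one misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified instances carry half of the total weight, so the next model in the ensemble is pushed towards the cases the previous one got wrong.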
4. Results and Discussion
4.1. Evolutionary Attribute Selection. An evolutionary search
within the 2550 dimensions of the datasets was executed for
 generations and a population of . For mental state, the
algorithm selected  attributes, whereas for the emotional
state, the algorithm selected a far greater  attributes for
the optimised dataset. This suggests that emotional state has
far more useful statistical attributes for classification, whereas
mental state requires approx. % fewer. The MindBigData
EEG problem set, which is not directly comparable to the previous two due to
its larger range of classes, had  attributes selected by the
algorithm. This can be seen in Table .
The evolutionary search considered the information gain
(Kullback-Leibler Divergence) of the attributes, and thus
their classification ability, as a fitness metric; i.e., a
higher information gain represents a more effective and less
entropic model when such attributes are considered as
input parameters. The search selected large datasets, ranging
from  attributes for the MBD dataset to the  selected for the
emotional state dataset. Though the whole process is too extensive
to detail (all datasets are freely available online for full
recreation of experiments), observations were as follows:
(i) For the mental state dataset,  attributes were
selected; the highest was the entropy of the TP
electrode within the first sliding window at an IG
of .. This was followed by the eigenvalue
of the same electrode, showing that the TP
placement is a good indicator for concentrative states.
It must be noted that these values may possibly
correlate with the sternocleidomastoid muscle's
contractional behaviours during stress, and ergo the stress
encountered during concentration or the lack thereof
during relaxation, and thus EMG behaviours may be
inadvertently classified rather than EEG.
(ii) Secondly, for the emotional state dataset, the most
important attribute was observed to be the mean
value of the AF electrode in the second overlapping
time window. This gave an information gain of
., closely followed by a measure of . for the
first covariance matrix of the first sliding window.
Minimum, mean, and covariance matrix values of
electrodes all followed with IG scores from . to
. until standard deviation of electrodes followed.
Maximum values did not appear until the lower half
of the ranked data, in which the highest max value of
the second time window of the AF electrode had an
IG of ..
(iii) Finally, for the MBD dataset, few attributes were
chosen. This was not due to their impressive ability,
but due to the lack thereof when other attributes were
observed. For example, the most effective attribute
was considered the covariance matrix of the second
sliding windows of the frontal lobe electrodes, FP1
and FP2, but these only have information gain values
of . and . each, far lower than those observed
in the other two experiments. To the lower end of
the selected values, IG scores of . appear, which
are considered very weak and yet still chosen by the
algorithm. The MBD dataset is thus an extremely
difficult dataset to classify.
Since the algorithm clearly showed a best attribute for
each dataset, a benchmark was performed using a simple One Rule
Classifier (OneR). OneR will focus on the values of the
best attribute and attempt to separate classes by numerical
rules. In Table , the observations above are shown more
concretely with statistical evidence. Classifying MindBigData
based on the . IG attribute detailed above gains only
.% accuracy, whereas the far higher attributes for the other
two datasets gain .% and .% accuracies.
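A simplified one-rule classifier on a single numeric attribute might look as follows. This is a sketch under assumptions: it uses equal-width bins with a majority class per bin, which is simpler than the discretisation used by standard OneR implementations, and the attribute values and class names are hypothetical.

```python
from collections import Counter


def fit_one_rule(values, labels, bins=3):
    """Build a rule on one numeric attribute: equal-width bins -> majority class."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values match

    def bin_of(v):
        # Clamp so that unseen values outside [lo, hi] fall in the edge bins.
        return min(max(int((v - lo) / width), 0), bins - 1)

    votes = {}
    for v, y in zip(values, labels):
        votes.setdefault(bin_of(v), Counter())[y] += 1
    default = Counter(labels).most_common(1)[0][0]  # used for empty bins
    rule = {b: c.most_common(1)[0][0] for b, c in votes.items()}
    return lambda v: rule.get(bin_of(v), default)


# Hypothetical attribute values for six instances of three classes.
values = [0.1, 0.2, 0.5, 0.6, 0.9, 1.0]
labels = ["relaxed", "relaxed", "neutral", "neutral", "focused", "focused"]
predict = fit_one_rule(values, labels)
accuracy = sum(predict(v) == y for v, y in zip(values, labels)) / len(labels)
```

Because the rule sees only one attribute, its accuracy is a direct measure of how separable the classes are along that attribute, which is why it serves as a benchmark for the top-ranked attribute of each dataset.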
The datasets generated by this algorithm are taken forward
in the DEvo process, and the original datasets are thus
discarded. Further experiments are performed with this data
only.
Table : Datasets generated by evolutionary attribute selection.

| Dataset | Population | Generations | No. Chosen Attributes |
|---|---|---|---|
| Mental State | | | |
| Emotional State | | | |
| MindBigData | | | |

Table : Accuracies when attempting to classify based on only one
attribute of the highest information gain.

| Dataset | MS | ES | MBD |
|---|---|---|---|
| Benchmark Accuracy (%) | | | |

4.2. Evolutionary Optimisation of MLP. During the algorithm's
process, an issue arose with stagnation, in which the
solutions would quickly converge on a local minimum and
an optimal solution was not found. On average, no further
improvement would be made after generation . It can be
noted that the relatively flat gradient in Figures  and 
suggests that the search space's fitness matrix possibly had a
much lower standard deviation, and thus the area was more
difficult to traverse due to the lack of noticeable peaks and
troughs.

Figure : Three evolutionary algorithm simulations to optimise an
MLP for the mental state dataset. [Line plot of global best
accuracy (%) against generation (1 to 10) for Simulations 1 to 3.]

The algorithm was altered to prevent genetic collapse
with the addition of speciation. The changes were as follows:
(i) A solution would belong to one of three species, A, B,
or C.
(ii) A solution’s species label would be randomly ini-
tialised along with the population members.
(iii) During selection of parent1’s breeding partner, only a
member of parent1’s species could be chosen.
(iv) If only one member of a species remains, it will not
produce offspring.
(v) An offspring will have a small random chance to
become another species (manually tuned to %).
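The speciation rules listed above can be sketched as a breeding function over MLP topologies. This is an illustrative reading of the five rules, not the authors' code: the crossover and mutation operators are assumptions, and the 5% species-switch chance and the 100-neuron cap are placeholder values rather than figures taken from the study.

```python
import random

SPECIES = ("A", "B", "C")


def breed(parent1, population, rng, switch_chance=0.05, max_neurons=100):
    """Return one offspring of parent1, or None if it has no same-species mate.

    Individuals are (species, layers) pairs, where layers is a tuple of
    hidden-layer sizes. switch_chance and max_neurons are placeholders.
    """
    species, layers1 = parent1
    mates = [p for p in population if p[0] == species and p is not parent1]
    if not mates:  # a lone member of a species produces no offspring
        return None
    _, layers2 = rng.choice(mates)
    # Take the depth of one parent and mix layer sizes from both.
    depth = len(rng.choice((layers1, layers2)))
    child = []
    for i in range(depth):
        options = [p[i] for p in (layers1, layers2) if i < len(p)]
        size = rng.choice(options) + rng.randint(-5, 5)  # small mutation
        child.append(min(max(size, 1), max_neurons))
    if rng.random() < switch_chance:  # rare migration to another species
        species = rng.choice([s for s in SPECIES if s != species])
    return species, tuple(child)


rng = random.Random(1)
# Species labels are randomly initialised along with the population members.
population = [(rng.choice(SPECIES),
               tuple(rng.randint(1, 100) for _ in range(rng.randint(1, 3))))
              for _ in range(12)]
offspring = [c for c in (breed(p, population, rng) for p in population) if c]
```

Restricting mates to the same species keeps three partly isolated sub-populations alive, so one strong topology cannot take over the whole gene pool, while the occasional species switch still lets good traits migrate.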
The implementation of separate species in the simulation
allowed for more complex, better solutions to be discovered.
The increasing gradients observed in Figures , , and 
show that constant improvement was achieved. The
evolutionary optimisation of MLP topology was set to run for
10 generations, each run taking approximately ten minutes to
execute. The search was repeated three times for scientific
accuracy, since a single random mutation could find a good
result by chance (random search). Tables , , and  detail the
accuracy values measured at each generation along with detail
of the network topology. Figures , , and  graphically represent
these experiments to detail the gradient of solution score increase.

Figure : Three evolutionary algorithm simulations to optimise an
MLP for the emotional state dataset. [Line plot of global best
accuracy (%) against generation (1 to 10) for Simulations 1 to 3.]
Table : Global best MLP solutions for mental state classification.
Columns 1 to 10 are generations.

| Experiment | Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Layers | | | | | | | | | | |
| 1 | Neurons | | | | | | | | | | |
| 1 | Accuracy (%) | | | | | | | | | | 79.8061 |
| 2 | Layers | | | | | | | | | | |
| 2 | Neurons | | | | | | | | | | |
| 2 | Accuracy (%) | | | | | | | | | | |
| 3 | Layers | | | | | | | | | | |
| 3 | Neurons | | | | | | | | | | |
| 3 | Accuracy (%) | | | | | | | | | | |

Table : Global best MLP solutions for emotional state classification.
Columns 1 to 10 are generations.

| Experiment | Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Layers | | | | | | | | | | |
| 1 | Neurons | | | | | | | | | | |
| 1 | Accuracy (%) | | | | | | | | | | |
| 2 | Layers | | | | | | | | | | |
| 2 | Neurons | | | | | | | | | | |
| 2 | Accuracy (%) | | | | | | | | | | 96.1069 |
| 3 | Layers | | | | | | | | | | |
| 3 | Neurons | | | | | | | | | | |
| 3 | Accuracy (%) | | | | | | | | | | |

Table : Global best MLP solutions for MindBigData classification.
Columns 1 to 10 are generations.

| Experiment | Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Layers | | | | | | | | | | |
| 1 | Neurons | | | | | | | | | | |
| 1 | Accuracy (%) | | | | | | | | | | |
| 2 | Layers | | | | | | | | | | |
| 2 | Neurons | | | | | | | | | | |
| 2 | Accuracy (%) | | | | | | | | | | |
| 3 | Layers | | | | | | | | | | |
| 3 | Neurons | | | | | | | | | | |
| 3 | Accuracy (%) | | | | | | | | | | 27.0718 |

Figure : Three evolutionary algorithm simulations to optimise an
MLP for the MindBigData dataset. [Line plot of global best
accuracy (%) against generation (1 to 10) for Simulations 1 to 3.]

4.3. Manual LSTM Tuning. Manual tuning was performed to
explore the options for LSTM topology for both mental state
and emotional state classification. Evolutionary optimisation
was not applied due to the high resource usage of LSTM
training, with many single networks taking multiple hours
to train on the  CUDA cores of an NVidia GTX .
Results in Table  show that, for mental state, 100 LSTM units
are somewhat most optimal, whereas 25 LSTM units were
discovered to be most optimal for emotional state classification.
100 LSTM units are best for the MindBigData digit set, but
this result is extremely low for a uniform 10-class problem,
with very little information gain. Comparison of the LSTM
units to accuracy for both states can be seen in Figure .
For each of the experiments, these arrangements of LSTM
architecture will be taken forward as the selected model.
Additionally, empirical testing found that  epochs for
training of units seemed best, but further exploration is
required to fine-tune this parameter. A batch size of 
formed the input vectors of sequential statistical brainwave
data for the LSTM. Gradient descent was handled by the
Adaptive Moment Estimation (Adam) algorithm, with a
decay value of .. Weights were initialised by the commonly
used XAVIER algorithm. Optimisation was performed by
Stochastic Gradient Descent. Manual experiments found that
a network with a depth of 1 persistently outperformed deeper
networks of two or more hidden layers for this specific
context; interestingly, this too is mirrored in the evolutionary
optimisation algorithms for the MLP, which always converged
to a single layer to achieve higher fitness.

Table : Manual tuning of LSTM topology for mental state (MS),
emotional state (ES), and EEG MindBigData classification.

| LSTM Units | MS (%) | ES (%) | MBD (%) |
|---|---|---|---|
| 25 | | 96.86 | |
| 50 | | | |
| 75 | | | |
| 100 | 83.84 | | 10.77 |
| 125 | | | |

Figure : Manual tuning of LSTM topology for mental state (MS),
emotional state (ES), and MindBigData (MBD) classification.
[Line plot of accuracy (%) against hidden LSTM units (25 to 125).]

Figure : Graph to show the time taken to build the final models
after search. [Bar chart of approximate time to train (s) for the
DEvo MLP, LSTM, AB(DEvo MLP), and AB(LSTM) models on the Mental
State, Emotional State, and MindBigData Digits datasets. Bar labels
as printed: 6.38, 63.63, 63.96, 638.32; 16.66, 65.11, 32.88, 594.55;
3.97, 52.32, 41.05, 810.33.]
4.4. Single and Boost Accuracy. Figure  shows a comparison
of approximate time taken to train the various models. Note
that 10-fold cross validation was performed to prevent
overfitting, and thus the actual time taken with this in mind is
around ten times more than the displayed value. Additionally,
this time was measured when training on the  CUDA
cores of an NVidia GTX  ( GB); training would take considerably
longer on a CPU. Although the mental state dataset had
approximately five times the number of attributes, the time
taken to learn on this dataset was only slightly longer than
the emotional state by an average of % (. s).
Since the LSTM topology was linearly tuned in a manual
process whereas the MLP was searched via an evolutionary
algorithm, the processes are not scientifically comparable,
since the former depends on human experience and the latter
upon resources available. Thus, times for these processes are
not given, since only one is a measure of computational
resource usage; it is suggested that a future study should make
use of the evolutionary algorithm within the search space of
LSTM topologies too, in which case they can be compared.
It can, though, be inferred from Figure  that the search for
an LSTM would take considerably longer due to the increased
resources required in every experiment performed compared
to the MLP. Additionally, with this in mind, a Multiobjective
Optimisation (MOO) implementation of DEvo that considers
both accuracy and resource usage as fitness metrics could
further find more optimal models in terms of both their
classification ability and optimal execution.
The overall results of the experiments can be seen firstly
in Table  and as a graphical comparison in Figure . For
the two three-state datasets, the most accurate model was an
AdaBoosted LSTM with accuracies of 84.44% and 97.06%
for the mental state and mental emotional state datasets,
respectively. The single LSTM and evolutionary-optimised
MLP models come relatively close to the best result, though
they take far less time to train when the measured approximate
values in Figure  are observed. On the other hand, for the
MindBigData digits dataset, the best solution by far was the
Adaptive Boosted DEvo MLP, whereas the same boosting method
that had improved the LSTM on the other two datasets actually
caused a loss in accuracy here.
Table : Classification accuracy on the two optimised datasets by the DEvo MLP, LSTM, and selected boost method.

| Dataset | DEvo MLP accuracy (%) | LSTM accuracy (%) | AB(DEvo MLP) boost accuracy (%) | AB(LSTM) boost accuracy (%) |
|---|---|---|---|---|
| Mental State | 79.81 | 83.84 | 79.7 | 84.44 |
| Emotional State | 96.11 | 96.86 | 96.23 | 97.06 |
| MindBigData Digits | 27.07 | 10.77 | 31.35 | 9.94 |

Figure : Final results for the experiment. [Bar chart of accuracy (%)
for the DEvo MLP, LSTM, AB(DEvo MLP), and AB(LSTM) models on the
Mental State, Emotional State, and MindBigData Digits datasets.]

Manual tuning of LSTM network topology was performed
due to the limited computational resources available;
the success in optimisation of the MLP suggests that further
improvements to the DEvo system could be made through an
automated process of evolutionary optimisation of the LSTM
topology. In addition, more bioinspired
classification techniques should be experimented with,
for example, a convolutional neural network to better imitate
and improve on the classification ability of natural vision [].
The three experiments were performed within the
limitations of the Muse headband's TP9, AF7, AF8, and TP10
electrodes. Higher resolution EEG setups would allow for
further exploration of the system in terms of mental data
classification, e.g., for physical movement originating from
the motor cortex.
5. Conclusion
This study suggested DEvo, a Deep Evolutionary approach
to optimise and classify complex signals using bioinspired
computing methods in the whole pipeline, from feature
selection to classification. For mental state and mental emotional
state classification of EEG brainwaves and their mathematical
features, two best models were produced:
(1) A more accurate AdaBoosted LSTM which, although it
took more time and resources to train in comparison
to other methods, managed to attain accuracies
of 84.44% and 97.06% for the first two datasets
(attentional and emotional state classification).
(2) An AdaBoosted Multilayer Perceptron that
was optimised using a hyperheuristic evolutionary
algorithm. Though its classification accuracy was
slightly lower than that of the AdaBoosted LSTM
(79.7% and 96.23% for the same two experiments), it
took less time to train.
For the MindBigData digits dataset, the most accurate
model was an Adaptive Boosted version of the DEvo-optimised
MLP, which achieved an accuracy of 31.35%. For this
problem, none of the LSTMs were able to achieve any
meaningful or useful results, but the DEvo MLP approach
saved time and also produced results that were useful. Results
were impressive for application due to the high classification
ability along with the reduction of resource usage; real-time
training from individuals would be possible and thus provide
a more accurate EEG-based product to the consumer, for
example, in real-time monitoring of mental state for the
grading of meditation or yoga session quality. Real-time
communication would also be possible in human-computer
interaction where the brain activity acts as a degree of input.
The goal of the experiment was successfully achieved: the
DEvo approach has led to an optimised, resource-light model
that closely matches an extremely resource-heavy
deep learning model, losing a small amount of accuracy but
computing in approximately % of the time, except in one
case in which it far outperformed its competitor models.
The aforementioned models were trained on a set of
attributes that were selected with a bioinspired evolutionary
algorithm.
The success of these processes led to future work suggestions,
which follow the pattern of further bioinspired
optimisation applications within the field of machine learning.
Future work should also consider, for better application of
the process within the field of electroencephalography, a
much larger collection of data from a considerably more
diverse range of subjects in order to better model the classifier
optimisation for the thought pattern of a global population
rather than the subjects encompassed within this study.
Data Availability
All data used in this study is freely available online; links to
all datasets can be found within the data acquisition section.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[] H. H. Jasper, “The ten-twenty electrode system of the international federation,” Clinical Neurophysiology, vol. , pp. , .
[] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: a search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. , no. , pp. –, .
[] J. J. Bird, A. Ekart, C. D. Buckingham, and D. R. Faria, “Mental emotional sentiment classification with an eeg-based brain-machine interface,” in Proceedings of the International Conference on Digital Image and Signal Processing (DISP19), Springer, .
[] E. Niedermeyer and F. L. da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, Lippincott Williams & Wilkins, .
[] A. Coenen, E. Fine, and O. Zayachkivska, “Adolf beck: a forgotten pioneer in electroencephalography,” Journal of the History of the Neurosciences, vol. , no. , pp. , .
[] B. E. Swartz, “The advantages of digital over analog recording techniques,” Electroencephalography and Clinical Neurophysiology, vol. , no. , pp. , .
[] A. K. Shah and S. Mittal, “Invasive electroencephalography monitoring: Indications and presurgical planning,” Annals of Indian Academy of Neurology, vol. , no. , pp. S–S, .
[] B. A. Taheri, R. T. Knight, and R. L. Smith, “A dry electrode for EEG recording,” Electroencephalography and Clinical Neurophysiology, vol. , no. , pp. , .
[] H. Lövheim, “A new three-dimensional model for emotions and monoamine neurotransmitters,” Medical Hypotheses, vol. , no. , pp. –, .
[] K. Oatley and J. M. Jenkins, Understanding Emotions, Blackwell Publishing, .
[] J. Gruzelier, “A theory of alpha/theta neurofeedback, creative performance enhancement, long distance functional connectivity and psychological integration,” Cognitive Processing, vol. , no. , pp. –, .
[] E.-R. Symeonidou, A. D. Nordin, W. D. Hairston, and D. P. Ferris, “Effects of cable sway, electrode surface area, and electrode mass on electroencephalography signal quality during motion,” Sensors, vol. , no. , p. , .
[] A. S. Oliveira, B. R. Schlink, W. D. Hairston, P. König, and D. P. Ferris, “Induction and separation of motion artifacts in EEG data using a mobile phantom head device,” Journal of Neural Engineering, vol. , no. , p. , .
[] M. Abujelala, A. Sharma, C. Abellanoza, and F. Makedon, “Brain-EE: Brain enjoyment evaluation using commercial EEG headband,” in Proceedings of the 9th ACM International Conference on Pervasive Technologies Related to Assistive Environments, PETRA 2016, p. , ACM, Greece, July .
[] A. Plotnikov, N. Stakheika, A. De Gloria et al., “Exploiting real-time EEG analysis for assessing flow in games,” in Proceedings of the 12th IEEE International Conference on Advanced Learning Technologies, ICALT 2012, pp. -, Italy, July .
[] O. E. Krigolson, C. C. Williams, A. Norton, C. D. Hassall, and F. L. Colino, “Choosing MUSE: Validation of a low-cost, portable EEG system for ERP research,” Frontiers in Neuroscience, vol. , p. , .
[] J. J. Bird, L. J. Manso, E. P. Ribiero, A. Ekart, and D. R. Faria, “A study on mental state classification using eeg-based brain-machine interface,” in Proceedings of the 9th International Conference on Intelligent Systems, IEEE, .
[] C. Premebida, D. R. Faria, F. A. Souza, and U. Nunes, “Applying probabilistic Mixture Models to semantic place classification in mobile robotics,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2015, pp. –, Germany, October .
[] C. Premebida, D. R. Faria, and U. Nunes, “Dynamic Bayesian network for semantic place classification in mobile robotics,” Autonomous Robots, vol. , no. , pp. , .
[] T. Y. Chai, S. S. Woo, M. Rizon, and C. S. Tan, “Classification of human emotions from eeg signals using statistical features and neural network,” International Journal of Integrated Engineering (Penerbit UTHM), vol. , pp. , .
[] H. Tanaka, M. Hayashi, and T. Hori, “Statistical features of hypnagogic EEG measured by a new scoring system,” SLEEP, vol. , no. , pp. –, .
[] W. Zheng, J. Zhu, Y. Peng, and B. Lu, “EEG-based emotion classification using deep belief networks,” in Proceedings of the 2014 IEEE International Conference on Multimedia and Expo (ICME), pp. –, Chengdu, China, July .
[] K. G. Jordan, “Emergency EEG and continuous EEG monitoring in acute ischemic stroke,” Journal of Clinical Neurophysiology, vol. , no. , pp. , .
[] K. K. Ang, C. Guan, K. S. G. Chua et al., “Clinical study of neurorehabilitation in stroke using EEG-based motor imagery brain-computer interface with robotic feedback,” in Proceedings of the 2010 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC’10, pp. –, Argentina, September .
[] A. T. Tzallas, M. G. Tsipouras, and D. I. Fotiadis, “Epileptic seizure detection in EEGs using time-frequency analysis,” IEEE Transactions on Information Technology in Biomedicine, vol. , no. , pp. , .
[] A. Aarabi, R. Grebe, and F. Wallois, “A multistage knowledge-based system for EEG seizure detection in newborn infants,” Clinical Neurophysiology, vol. , no. , pp. , .
[] K.-M. Rytkönen, J. Zitting, and T. Porkka-Heiskanen, “Automated sleep scoring in rats and mice using the naive Bayes classifier,” Journal of Neuroscience Methods, vol. , no. , pp. –, .
[] L. Fraiwan, K. Lweesy, N. Khasawneh, H. Wenz, and H. Dickhaus, “Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier,” Computer Methods & Programs in Biomedicine, vol. , no. , pp. , .
[] Y. Peng and B.-L. Lu, “Immune clonal algorithm based feature selection for epileptic EEG signal classification,” in Proceedings of the 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012, pp. , Canada, July .
[] D. R. Faria, M. Vieira, F. C. C. Faria, and C. Premebida, “Affective facial expressions recognition for human-robot interaction,” in Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2017, pp. –, Portugal, September .
[] J. J. Bird, A. Ekart, and D. R. Faria, “High resolution sentiment analysis by ensemble classification,” in Proceedings of the SAI Computing Conference, SAI, 2019, .
[] C. Coppola, D. R. Faria, U. Nunes, and N. Bellotto, “Social activity recognition based on probabilistic merging of skeleton features with proximity priors from RGB-D data,” in Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016, pp. , Republic of Korea, October .
[] D. A. Adama, A. Lotfi, C. Langensiepen, and K. Lee, “Human activities transfer learning for assistive robotics,” in Proceedings of the UK Workshop on Computational Intelligence, pp. –, Springer, .
[] S. W. Yahaya, C. Langensiepen, and A. Lotfi, “Anomaly detection in activities of daily living using one-class support vector machine,” in Proceedings of the UK Workshop on Computational Intelligence, pp. , Springer, .
[] D. Ortega-Anderez, A. Lotfi, C. Langensiepen, and K. Appiah, “A multi-level refinement approach towards the classification of quotidian activities using accelerometer data,” Journal of Ambient Intelligence and Humanized Computing, pp. , .
[] J. J. Bird, A. Ekárt, and D. R. Faria, “Learning from interaction: An intelligent networked-based human-bot and bot-bot chatbot system,” in Proceedings of the UK Workshop on Computational Intelligence, pp. –, Springer, .
[] P. A. Vikhar, “Evolutionary algorithms: a critical review and its future prospects,” in Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication, ICGTSPICC 2016, pp. –, India, December .
[] C. Darwin, On the origin of species, 1859, Routledge, .
[] J. J. Bird, A. Ekart, and D. R. Faria, “Evolutionary optimisation of fully connected artificial neural network topology,” in Proceedings of the SAI Computing Conference, SAI, 2019, .
[] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. , no. , pp. –, .
[] D. E. Knuth, “Postscript about NP-hard problems,” ACM SIGACT News, vol. , no. , pp. -, .
[] Y. Bengio, I. J. Goodfellow, and A. Courville, “Deep learning,” Nature, vol. , no. , pp. , .
[] S. Kullback and R. A. Leibler, “On information and sufficiency,” Annals of Mathematical Statistics, vol. , no. , pp. , .
[] F. Assunção, N. Lourenço, P. Machado, and B. Ribeiro, “DENSER: deep evolutionary network structured representation,” Genetic Programming and Evolvable Machines, pp. , .
[] A. Martín, R. Lara-Cabrera, F. Fuentes-Hurtado, V. Naranjo, and D. Camacho, “EvoDeep: a new evolutionary approach for automatic deep neural networks parametrisation,” Journal of Parallel and Distributed Computing, vol. , pp. , .
[] Y. Peng and B.-L. Lu, “Discriminative extreme learning machine with supervised sparsity preserving for image classification,” Neurocomputing, vol. , pp. , .
[] A. Graves, N. Jaitly, and A.-R. Mohamed, “Hybrid speech recognition with deep bidirectional LSTM,” in Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013, pp. –, Czech Republic, December .
[] P. R. Davidson, R. D. Jones, and M. T. R. Peiris, “Detecting behavioral microsleeps using EEG and LSTM recurrent neural networks,” in Proceedings of the 2005 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS 2005, pp. –, China, September .
[] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. , no. , part , pp. , .
[] R. Rojas, “Adaboost and the super bowl of classifiers a tutorial introduction to adaptive boosting,” Tech. Rep., Freie University, Berlin, Germany, .
[] T. Y. M. Chiu, T. Leonard, and K.-W. Tsui, “The matrix-logarithmic covariance model,” Journal of the American Statistical Association, vol. , no. , pp. –, .
[] C. van Loan, Computational Frameworks for the Fast Fourier Transform, vol.  of Frontiers in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pa, USA, .
[] T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, Oxford University Press, .
[] M. Hussain, J. J. Bird, and D. R. Faria, “A study on cnn transfer learning for image classification,” in Proceedings of the UK Workshop on Computational Intelligence, pp. , Springer, .
... Most of the approaches focused on physiologic signals utilize well-designed classifiers with hand-crafted features to identify human emotions. Electroencephalograph (EEG), the measurement of electrical stimulation in the brain, is a commonly used measure of brain function during cognitive activities such as working memory (Bird et al [1], Bird et al [2], Nie et al [3], Jatupaiboon et al [4], Shah et al [5], Bird et al [6]). Identify invariant representations to inter-and intrasubject variations and also intrinsic noise associated with these data represent challenges for EEG data modeling for cognitive activities. ...
... They used classic classifiers such as Bayesian Networks, Help Vector Machines, and Random Forests to obtain an average accuracy of more than 87%. To do this analysis, we have conducted in-depth research on two papers; Bird et al, [2] and Bird et al, [6] from which we use their benchmark to add some hangings in the preprocessing and processing processes. To estimate the network's optimal hyperparameters before classification, Bird et al, [2] used an evolutionary approach to evaluate Multilayer Perceptron (MLP). ...
... To do this analysis, we have conducted in-depth research on two papers; Bird et al, [2] and Bird et al, [6] from which we use their benchmark to add some hangings in the preprocessing and processing processes. To estimate the network's optimal hyperparameters before classification, Bird et al, [2] used an evolutionary approach to evaluate Multilayer Perceptron (MLP). In their paper, they explored DL and tuning with Long Short-Term Memory (LSTM), and measures Adaptive Boosting of the two types of models for each issue. ...
... Independent non-invasive monitoring of emotional states has the potential to be beneficial in a variety of fields, User and device engagement may be enhanced by incorporating human-robot interaction and mental healthcare, Information may be gathered that is not dependent on spoken communication by using augmented reality [12]. Electroencephalography (EEG) technology has become increasingly affordable; brainwave data is becoming more affordable for consumers and researchers alike, selfcategorization without the requirement for an expert. ...
... To find important statistics and decrease the complexity of the model development process, feature selection must be done, saving both time and www.ijacsa.thesai.org computing resources throughout the training and classification procedures [12]. ...
... Both the features were entered into the second and third-level classifiers. More comprehensive work observed in [17]. Significant information was extracted from around 2550 EEG signals using an evolutionary approach. ...
... In Eqs. (17) and (18) the information of the global best is shared with the local best through the second part. In the third part of the equation, random values are generated, which improves the local best and helps the local best to recover from the local minimum. ...
Article
The inability of a patient to talk, hear, or both for any specific reason can create a worrisome scenario because any form of reaction can deem the brain activity. In such a scenario, Electroencephalography (EEG) is used to measure a patient’s responsiveness by observing recorded electrical signals on the scalp. However, reading and interpreting EEG signals remotely is a difficult task due to the lack of an intelligent framework in the healthcare domain. Thus, to monitor and analyze EEG signals remotely within an efficient Internet of Medical Things (IoMT) framework, various machine learning (ML) and deep learning (DL) models have been incorporated. However, the existing ML/DL models do not allow end-users to understand the entire logic behind the analysis of those signals, which makes the decisions of the IoMT framework non-transparent. Motivated by the above-mentioned problems, in this paper, an empirical Intelligent Agent-based Bag-of-Neural Networks (IBoNN) model is incorporated in the IoMT framework to make real-time decisions with high accuracy. The IBoNN model is intended to categorize the incoming brain signals based on a collection of neural networks and determine the correct response from the EEG signals of the patients. The performance of the proposed IBoNN model relative to standard ML models is evaluated with a set of performance metrics on a benchmark dataset. From the experimental analysis, it has been observed that the IBoNN model yields an average accuracy of 91%–99% during the categorization of brain responses from EEG signals.
Conference Paper
A great number of individuals suffer from spinal cord injuries, which lead to drastic limitations in their daily life functions. Specifically, paraplegia and quadriplegia render the patient unable to move two limbs or all four limbs, respectively. However, even if the spine is damaged, this does not stop the brain from functioning, and it still has the ability to distinguish objects. Further, the implementation of a Brain-Machine Interface (BMI)-based IoT system suffers from several challenges, such as accurately translating the user's intention. This paper presents a method for brain wave recognition using deep learning (DL) based on shapes and colors, for use in merging the concepts of the internet of things (IoT) and the brain computer interface (BCI), which we call the "Internet of Brain Controlled Things", or IoBCT for short. The results showed an acceptable accuracy of 0.93 in brain wave pattern recognition, which opens the way towards designing a reliable IoBCT.
Article
In this study, five Multiple Linear Regression, three Multilayer Perceptron Regressor, seven Decision Tree and four Support Vector Machine models were constructed to predict outlet temperature and humidity ratio of silica gel desiccant wheels using eight input parameters for unbalanced flow condition. The effect of different kernel functions of Support Vector Machine algorithms, on the modeling of desiccant wheel was investigated for the first time in the open literature. Detailed validation of the developed models showed that the Response Surface model outperformed other Multiple Linear Regression models, and the Support Vector Machine model with Pearson VII Universal kernel was the best among all models. The determination coefficient and root mean square error for temperature were found to be 0.9791 and 1.2832 °C for the Response Surface model and, 0.9984 and 0.3511 °C for the Support Vector Machine model with Pearson VII Universal kernel, respectively. In the case of humidity ratio, the corresponding statistical parameters were 0.9763 and 0.5672 g/kg for the former and, 0.9976 and 0.1810 g/kg for the latter. The proposed models can be used reliably in the analysis of solid desiccant-based air conditioning systems for design and energy analysis.
Preprint
We present the Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants. The objective was to collect cognitive signals as responses of participants engaged in a number of language intensive tasks involving different text-image stimuli settings when translating from English to Czech. Each participant was exposed to 32 text-image stimuli pairs and asked to (1) read the English sentence, (2) translate it into Czech, (3) consult the image, (4) translate again, either updating or repeating the previous translation. The text stimuli consisted of 200 unique sentences with 616 unique words coupled with 200 unique images as the visual stimuli. The recordings were collected over a two week period and all the participants included in the study were Czech natives with strong English skills. Due to the nature of the tasks involved in the study and the relatively large number of participants involved, the corpus is well suited for research in Translation Process Studies, Cognitive Sciences among other disciplines.
Article
Wind energy is a special type of renewable low-carbon energy with no greenhouse gas emissions. It can be used for power generation and grid stability improvement in the engineering field to provide highly accurate forecasts. However, most existing forecasting models ignore the role of data preprocessing and optimization methods, resulting in low forecasting accuracy. To fill this gap, a novel combined forecasting system that performs deterministic and probabilistic forecasts is proposed. The system has three parts: an advanced data denoising algorithm, deep learning forecasting models, and a self-improved multi-objective optimization algorithm for improving the performance of wind speed forecasting. The self-improved optimization algorithm can converge to global optimum based on theoretical proof. Using three datasets from 10-min wind speed data, several controlled experiments are designed and implemented to demonstrate the forecasting performance of the combined system in terms of deterministic and probabilistic forecasting. After the experiment, the forecasting effectiveness, improvement ratio of metrics, sensitivity, and computational complexity of this system are presented, further demonstrating the advantages of the combined system. Through the foregoing techniques, the proposed system resolves the problem of inaccurate wind speed forecasting, thus supplementing the existing field.
Thesis
In modern Human-Robot Interaction, much thought has been given to accessibility regarding robotic locomotion, specifically the enhancement of awareness and lowering of cognitive load. On the other hand, with social Human-Robot Interaction considered, published research is far sparser given that the problem is less explored than pathfinding and locomotion. This thesis studies how one can endow a robot with affective perception for social awareness in verbal and non-verbal communication. This is possible by the creation of a Human-Robot Interaction framework which abstracts machine learning and artificial intelligence technologies which allow for further accessibility to non-technical users compared to the current State-of-the-Art in the field. These studies thus initially focus on individual robotic abilities in the verbal, non-verbal and multimodality domains. Multimodality studies show that late data fusion of image and sound can improve environment recognition, and similarly that late fusion of Leap Motion Controller and image data can improve sign language recognition ability. To alleviate several of the open issues currently faced by researchers in the field, guidelines are reviewed from the relevant literature and met by the design and structure of the framework that this thesis ultimately presents. The framework recognises a user's request for a task through a chatbot-like architecture. Through research in this thesis that recognises human data augmentation (paraphrasing) and subsequent classification via language transformers, the robot's more advanced Natural Language Processing abilities allow for a wider range of recognised inputs. That is, as examples show, phrases that could be expected to be uttered during a natural human-human interaction are easily recognised by the robot. 
This allows for accessibility to robotics without the need to physically interact with a computer or write any code, with only the ability of natural interaction (an ability which most humans have) required for access to all the modular machine learning and artificial intelligence technologies embedded within the architecture. Following the research on individual abilities, this thesis then unifies all of the technologies into a deliberative interaction framework, wherein abilities are accessed from long-term memory modules and short-term memory information such as the user's tasks, sensor data, retrieved models, and finally output information. In addition, algorithms for model improvement are also explored, such as through transfer learning and synthetic data augmentation and so the framework performs autonomous learning to these extents to constantly improve its learning abilities. It is found that transfer learning between electroencephalographic and electromyographic biological signals improves the classification of one another given their slight physical similarities. Transfer learning also aids in environment recognition, when transferring knowledge from virtual environments to the real world. In another example of non-verbal communication, it is found that learning from a scarce dataset of American Sign Language for recognition can be improved by multi-modality transfer learning from hand features and images taken from a larger British Sign Language dataset. Data augmentation is shown to aid in electroencephalographic signal classification by learning from synthetic signals generated by a GPT-2 transformer model, and, in addition, augmenting training with synthetic data also shows improvements when performing speaker recognition from human speech. 
Given the importance of platform independence due to the growing range of available consumer robots, four use cases are detailed, and examples of behaviour are given by the Pepper, Nao, and Romeo robots as well as a computer terminal. The use cases involve a user requesting their electroencephalographic brainwave data to be classified by simply asking the robot whether or not they are concentrating. In a subsequent use case, the user asks if a given text is positive or negative, to which the robot correctly recognises the task of natural language processing at hand and then classifies the text, this is output and the physical robots react accordingly by showing emotion. The third use case has a request for sign language recognition, to which the robot recognises and thus switches from listening to watching the user communicate with them. The final use case focuses on a request for environment recognition, which has the robot perform multimodality recognition of its surroundings and note them accordingly. The results presented by this thesis show that several of the open issues in the field are alleviated through the technologies within, structuring of, and examples of interaction with the framework. The results also show the achievement of the three main goals set out by the research questions; the endowment of a robot with affective perception and social awareness for verbal and non-verbal communication, whether we can create a Human-Robot Interaction framework to abstract machine learning and artificial intelligence technologies which allow for the accessibility of non-technical users, and, as previously noted, which current issues in the field can be alleviated by the framework presented and to what extent.
Article
Real-time emotion recognition with electroencephalography (EEG) has been an active field of research in recent years. In particular, deep learning has been shown to be effective in emotion classification tasks. However, because the monitoring of EEG signals is a continuous process, there is a need for energy-efficient emotion classification methods. Compared with artificial neural networks (ANNs), spiking neural networks (SNNs), in which weight multiplications are replaced by additions, are more energy efficient. In this paper, we propose a near-lossless transfer learning method for SNNs, specially designed for EEG signals. Data is preprocessed, and its power spectral density (PSD) is extracted to represent the frequency domain of the raw EEG signal. Using a 3-layer pretrained SNN, running on the DEAP dataset, we achieved an accuracy of 78.87% and 76.5% for valence and arousal dimensions, respectively. By training a model on one dimension and fine-tuning on the other, we achieve even higher accuracy: 82.75% for valence and 84.22% for arousal. As far as we know, our results yield the smallest SNN with the highest accuracy for this task to date. The energy consumption of our SNNs for valence and arousal dimensions is 13.8% that of our CNN-based solutions. The framework was developed with PyTorch and is available under an open-source license.
Conference Paper
This study proposes an approach to ensemble sentiment classification of a text to a score in the range of 1-5 of negative-positive scoring. A high-performing model is produced from TripAdvisor restaurant reviews via a generated dataset of 684 word-stems selected by their information gain ranking. Analysis shows that the few misclassified instances are almost entirely close to their real class. The best performing classifier was an ensemble of RandomForest, Naive Bayes Multinomial and Multilayer Perceptron (Neural Network) methods combined via a Vote on Average Probabilities approach. The best ensemble produced a classification accuracy of 91.02%, which scored higher than the best single classifier, a Random Tree model with an accuracy of 78.6%. Ensemble through Adaptive Boosting, Random Forests and Voting is explored. All ensemble methods far outperformed the best single classifier methods.
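The "Vote on Average Probabilities" combination mentioned in this abstract can be illustrated with a minimal sketch. The function name and the probability values below are hypothetical and are not taken from the paper; the sketch only assumes that each base classifier outputs one probability per class, and that the ensemble predicts the class with the highest mean probability.

```python
def vote_on_average_probabilities(prob_lists):
    """Average per-class probabilities across classifiers and pick the top class.

    prob_lists: one list of class probabilities per classifier (hypothetical input).
    Returns (index of the winning class, list of averaged probabilities).
    """
    n_classifiers = len(prob_lists)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(probs[c] for probs in prob_lists) / n_classifiers
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Illustrative outputs of three classifiers over five sentiment scores (1-5).
outputs = [
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
]
winner, averaged = vote_on_average_probabilities(outputs)
```

Here the second classifier alone would choose index 1, but averaging lets the two classifiers that favour index 2 (score 3) outweigh it.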
Conference Paper
This paper explores single and ensemble methods to classify emotional experiences based on EEG brainwave data. A commercial MUSE EEG headband is used with a resolution of four (TP9, AF7, AF8, TP10) electrodes. Positive and negative emotional states are invoked using film clips with an obvious valence, and neutral resting data is also recorded with no stimuli involved, all for one minute per session. Statistical extraction of the alpha, beta, theta, delta and gamma brainwaves is performed to generate a large dataset that is then reduced to smaller datasets by feature selection using scores from OneR, Bayes Network, Information Gain, and Symmetrical Uncertainty. Of the set of 2548 features, a subset of 63 selected by their Information Gain values were found to be best when used with ensemble classifiers such as Random Forest. They attained an overall accuracy of around 97.89%, outperforming the current state of the art by 2.99 percentage points. The best single classifier was a deep neural network with an accuracy of 94.89%.
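As a rough illustration of ranking features by Information Gain, the sketch below computes the gain of a single discrete feature with respect to class labels. It assumes the feature has already been discretised into bins; the paper's EEG features are continuous statistics, and the helper names and toy data here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Toy data: one binned feature separates the classes perfectly, the other not at all.
labels = ["positive", "positive", "negative", "negative"]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]
gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Ranking all features by this score and keeping the top-scoring subset is one way to obtain a reduced feature set of the kind the abstract describes.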
Conference Paper
This paper proposes an approach to selecting the number of layers and neurons contained within Multilayer Perceptron hidden layers through a single-objective evolutionary approach with the goal of model accuracy. At each generation, a population of Neural Network architectures is created and ranked by accuracy. The generated solutions are combined in a breeding process to create a larger population, and at each generation the weakest solutions are removed to retain the population size, inspired by a Darwinian 'survival of the fittest'. Multiple datasets are tested, and results show that architectures can be successfully improved and derived through a hyper-heuristic evolutionary approach, in less than 10% of the exhaustive search time. The evolutionary approach was further optimised through population density increase as well as gradual solution max complexity increase throughout the simulation.
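The generate, breed, rank and cull loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `fitness` function is a hypothetical stand-in for training an MLP and measuring its accuracy, and the breeding and mutation rules, population size and limits are invented for the example.

```python
import random


def fitness(layers):
    """Hypothetical stand-in for model accuracy; peaks at two layers of 32 neurons."""
    target = (32, 32)
    penalty = abs(len(layers) - len(target)) * 50
    penalty += sum(abs(a - b) for a, b in zip(layers, target))
    return 1.0 / (1.0 + penalty)


def random_topology(rng, max_layers=3, max_neurons=64):
    """A topology is a tuple of hidden-layer sizes."""
    depth = rng.randint(1, max_layers)
    return tuple(rng.randint(1, max_neurons) for _ in range(depth))


def breed(rng, a, b, max_neurons=64):
    """Child takes its depth from one parent and each layer size from either parent."""
    depth = rng.choice((len(a), len(b)))
    child = []
    for i in range(depth):
        size = rng.choice([p[i] for p in (a, b) if i < len(p)])
        if rng.random() < 0.3:  # occasional small mutation of the layer size
            size = min(max_neurons, max(1, size + rng.randint(-4, 4)))
        child.append(size)
    return tuple(child)


def evolve(generations=30, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_topology(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Breeding enlarges the population with one child per existing member.
        children = [breed(rng, *rng.sample(population, 2))
                    for _ in range(population_size)]
        # Rank by fitness and remove the weakest to restore the population size.
        ranked = sorted(population + children, key=fitness, reverse=True)
        population = ranked[:population_size]
    return population[0]


best_topology = evolve()
```

Because parents compete with their children for survival, the best fitness in the population never decreases between generations. In a real setting the cost is dominated by the fitness evaluation, since each candidate topology would have to be trained and validated.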
Conference Paper
This work aims to find discriminative EEG-based features and appropriate classification methods that can categorise brainwave patterns based on their level of activity or frequency for mental state recognition useful for human-machine interaction. By using the Muse headband with four EEG sensors (TP9, AF7, AF8, TP10), we categorised three possible states (relaxing, neutral and concentrating) based on a few states of mind defined by cognitive behavioural studies. We have created a dataset with five individuals and sessions lasting one minute for each class of mental state in order to train and test different methods. Given the proposed set of features extracted from the five signals of the EEG headband (alpha, beta, theta, delta, gamma), we have tested a combination of different feature selection algorithms and classifier models to compare their performance in terms of recognition accuracy and number of features needed. Different tests such as 10-fold cross validation were performed. Results show that only 44 features from a set of over 2100 features are necessary when used with classical classifiers such as Bayesian Networks, Support Vector Machines and Random Forests, attaining an overall accuracy over 87%.
Article
Wearable inertial measurement units incorporating accelerometers and gyroscopes are increasingly used for activity analysis and recognition. In this paper an activity classification algorithm is presented which includes a novel multi-step refinement with the aim of improving the classification accuracy of traditional approaches. To do so, after the classification takes place, information is extracted from the confusion matrix to focus the computational efforts on those activities with worse classification performance. It is argued that activities differ diversely from each other, therefore a specific set of features may be informative to classify a specific set of activities, but such informativeness should not necessarily be extended to a different activity set. This approach has shown promising results, achieving important classification accuracy improvements.
Conference Paper
In this paper we propose an approach to a chatbot software that is able to learn from interaction via text messaging between human-bot and bot-bot. The bot listens to a user and decides whether or not it knows how to reply to the message accurately based on current knowledge; otherwise it sets about learning a meaningful response to the message through pattern matching based on its previous experience. Similar methods are used to detect offensive messages, and prove effective at overcoming the issues that other chatbots have experienced in the open domain. A philosophy of giving preference to too much censorship rather than too little is employed given the failure of Microsoft Tay. In this work, a layered approach is devised to conduct each process and leave the architecture open to improvement with more advanced methods in the future. Preliminary results show an improvement over time in which the bot learns more responses. A novel approach of message simplification is added to the bot’s architecture; the results suggest that the algorithm substantially improves the bot’s conversational performance, by a factor of three.
Conference Paper
Many image classification models have been introduced to help tackle the foremost issue of recognition accuracy. Image classification is one of the core problems in the Computer Vision field, with a large variety of practical applications. Examples include object recognition for robotic manipulation and pedestrian or obstacle detection for autonomous vehicles, among others. A lot of attention has been given to Machine Learning, specifically neural networks such as the Convolutional Neural Network (CNN) winning image classification competitions. This work proposes the study and investigation of such a CNN architecture model (i.e. Inception-v3) to establish whether it works best in terms of accuracy and efficiency with new image datasets via Transfer Learning. The retrained model is evaluated, and the results are compared to some state-of-the-art approaches.
Article
More neuroscience researchers are using scalp electroencephalography (EEG) to measure electrocortical dynamics during human locomotion and other types of movement. Motion artifacts corrupt the EEG and mask underlying neural signals of interest. The cause of motion artifacts in EEG is often attributed to electrode motion relative to the skin, but few studies have examined EEG signals under head motion. In the current study, we tested how motion artifacts are affected by the overall mass and surface area of commercially available electrodes, as well as how cable sway contributes to motion artifacts. To provide a ground-truth signal, we used a gelatin head phantom with embedded antennas broadcasting electrical signals, and recorded EEG with a commercially available electrode system. A robotic platform moved the phantom head through sinusoidal displacements at different frequencies (0-2 Hz). Results showed that a larger electrode surface area can have a small but significant effect on improving EEG signal quality during motion and that cable sway is a major contributor to motion artifacts. These results have implications in the development of future hardware for mobile brain imaging with EEG.
Article
Deep Evolutionary Network Structured Representation (DENSER) is a novel approach to automatically design Artificial Neural Networks (ANNs) using Evolutionary Computation (EC). The algorithm not only searches for the best network topology (e.g., number of layers, type of layers), but also tunes hyper-parameters, such as learning parameters or data augmentation parameters. The automatic design is achieved using a representation with two distinct levels, where the outer level encodes the general structure of the network, i.e., the sequence of layers, and the inner level encodes the parameters associated with each layer. The allowed layers and hyper-parameter value ranges are defined by means of a human-readable Context-Free Grammar. DENSER was used to evolve ANNs for two widely used image classification benchmarks, obtaining an average accuracy result of up to 94.27% on the CIFAR-10 dataset, and of 78.75% on the CIFAR-100. To the best of our knowledge, our CIFAR-100 results are the highest performing models generated by methods that aim at the automatic design of Convolutional Neural Networks (CNNs), and are amongst the best for manually designed and fine-tuned CNNs.