A Deep Evolutionary Approach to Bioinspired Classifier
Optimisation for Brain-Machine Interaction
Jordan J. Bird, Diego R. Faria, Luis J. Manso,
Anikó Ekárt, and Christopher D. Buckingham
School of Engineering and Applied Science, Aston University, Birmingham B4 7ET, UK
Correspondence should be addressed to Jordan J. Bird; email@example.com
Received 14 December 2018; Accepted 21 February 2019; Published 13 March 2019
Academic Editor: Danilo Comminiello
Copyright © Jordan J. Bird et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select
useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied
to select the most informative features from an initial set of EEG statistical features. Optimisation of a Multilayer Perceptron
(MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network.
Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of
models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention
state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a
subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of .%, .%, and
.% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to
the Adaptive Boosted LSTM for the first two experiments and significantly higher for the number-guessing experiment with an
Adaptive Boosted DEvo MLP reaching .%, while being significantly quicker to train and classify. In particular, the accuracy
of the nonboosted DEvo MLP was .%, .%, and .% in the same benchmarks. Two datasets for the experiments were
gathered using a Muse EEG headband with four electrodes corresponding to the TP9, AF7, AF8, and TP10 locations of the international
EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
1. Introduction
Bioinspired algorithms have been extensively used as robust
and efficient optimisation methods. Although they have been
criticised for being computationally expensive, they have also
proven useful for solving complex optimisation problems. With
the increasing availability of computing resources, bioinspired
algorithms are growing in popularity due to their effectiveness
at optimising complex problem solutions. Scientific studies of
natural optimisation from many generations past, such as
Darwinian evolution, are now becoming viable inspiration for
solving real-world problems.
This increasing resource availability is also allowing for
more complex computing in applications such as Internet of
Things (IoT), Human-Robot Interaction (HRI), and
human-computer interaction (HCI), providing more degrees of both
control and interaction to the user. One of these degrees of
control is the source of all others, the human brain, and it can
be observed using electroencephalography. At its beginning
EEG was an invasive and uncomfortable method, but with
the introduction of dry, commercial electrodes, EEG is now
fully accessible even outside of laboratory setups.
Volume 2019, Article ID 4316548, 14 pages
It has been noted that a large challenge in brain-machine
interaction is inferring attentional and emotional states
from particular patterns and behaviours of electrical brain
activity. Large amounts of data need to be acquired
from EEG, since the signals are complex, nonlinear, and
nonstationary. Generating discriminative features to describe
a wave requires the statistical analysis of time window intervals.
This study focuses on bringing together previous related
research and improving the state-of-the-art with a Deep
Evolutionary (DEvo) approach when optimising bioinspired
classifiers. The application of this study allows for a wholly
bioinspired and optimised approach for mental attention
classification and emotional state classification and for guessing
the number that a subject is thinking of. These states can then
be taken forward as states of control in, for example, human-
In addition to the experimental results, the contributions
of the work presented in this paper are as follows:
(i) An effective framework for classification of complex
signals (brainwave data) through processes of evolutionary
optimisation and bioinspired classification.
(ii) A new evolutionary approach to hyperheuristic bioinspired
classifiers to prevent convergence on local
minima found in the EEG feature space.
(iii) Accuracies close to, and in one case exceeding, those
of resource-intensive deep learning, gained through the
optimised processes found in nature.
The remainder of this article proceeds as follows: Section 2
provides an exploration of the state-of-the-art works
related to this study, briefly introducing the most relevant
concepts applied in the DEvo approach to machine learning
with electroencephalographic data. Section 3 describes the
methods used to perform the experiments. The
results of the experiments, including graphical representations,
are presented in Section 4. Section 5 details the conclusions extracted from
the experiments and the suggested future work.
2.1. Electroencephalography and Machine Learning with EEG.
Electroencephalography, or EEG, is the measurement and
recording of electrical activity produced by the brain.
The collection of EEG data is carried out through the use
of applied electrodes, which read minute electrophysiological
currents produced by the brain due to nervous oscillation.
The most invasive form of EEG is subdural, in which
electrodes are placed directly on the brain. Less
invasive techniques require electrodes to be placed around
the cranium, of which the disadvantage is that signals are
read through the skull. Raw
electrical data is measured in microVolts (uV), which over
time produce wave patterns.
Lövheim's study produced a new three-dimensional
way of graphically representing human emotion in terms of
categories and hormone levels. The emotional categories of
this graphical representation are found in Table . Each vertex of the cube represents
a centroid of an emotional category. It is worth noting that
categories are not completely concrete, and that emotions
are experienced in gradient, as well as overlapping between
categories. It is this chemical composition that causes
certain nervous oscillation and thus electrical brainwave
activity. Thus, the brainwave activity can be used as data
to estimate human emotions.
The Muse headband is a commercially available EEG
recording device with four electrodes placed on the TP9,
AF7, AF8, and TP10 positions of the international standard EEG
placement system. These can be seen in Figure . Because
the signals are quite weak in nature, signal noise is a major
issue due to it effectively masking the useful information.
The EEG headband employs various artefact separation
techniques to best retain the brainwave data and discard
noise. Using the headband
along with machine learning techniques, and treating enjoyment
as a gradient much like in
sentiment analysis projects, researchers successfully managed
to measure different levels of a user's enjoyment
while playing mobile phone games. Muse headbands are also
often used in neuroscience research projects due to their
low cost and ease of deployment (since they are a consumer
product), as well as their effectiveness in terms of classification
accuracy. In one such experiment, binary classification
of two physical tasks achieved % accuracy using Bayesian
methods.
Figure: Lövheim's cube: mapping levels of noradrenaline,
dopamine, and serotonin to human emotion.
Figure: EEG sensors TP9, AF7, AF8, and TP10 of the Muse
headband on the international standard EEG placement system.
Previous work with the Muse headband used classical
and ensemble machine learning techniques to accurately
classify both mental and emotional states based on
datasets generated by statistical extraction. The application of
statistical extraction as a form of data preprocessing is useful
across many platforms, e.g., for semantic place recognition
in human-robot interaction. Machine learning techniques
that take statistical features of the wave as inputs
are commonly used to classify mental states for brain-machine
interaction, where states are used as dimensions
of user input. Probabilistic methods such as Deep Belief
Networks, Support Vector Machines, and various types of
neural network have been found to experience varying levels
of success in emotional state classification, particularly in
binary classification.
EEG brainwave data classification is a contemporary
focus in the medical fields; abnormalities in brainwave
activity have been successfully classified as those leading to
a stroke using a Random Forest classification method.
In addition to the detection of a stroke, researchers also
found that monitoring classified brain activity successfully aided
the rehabilitation of motor functions after stroke
when coupled with human-robot interaction. Brainwave
classification has also been very successful in the preemptive
detection of epileptic seizures in both adults and newborn
infants. The classification of minute parts of the sleep-wake
cycle is also a focus of medical researchers in terms
of EEG data mining. Low resolution, three-state (awake,
sleep, and REM sleep) EEG data was classified with Bayesian
methods to a high accuracy of -% in both humans
and rats, supporting the feasibility
of classification of these states as well as the cross-domain
application between human and rat brains. Random Forest
classification of an extracted set of statistical EEG attributes
could classify sleeping patterns with higher resolution than
that of the previous study at around % accuracy.
It is worth noting that for a real-time averaging technique
(prediction of a time series of, for example, every second),
only majority classification accuracies at >% would be
required, though the time series could be trusted at shorter
lengths with better results from the model.
Immune Clonal Algorithm, or ICA, has been suggested
as a promising method for EEG brainwave feature extraction
through the generation of mathematical temporal wave
descriptors. This approach found success in classification
of epileptic brain activity through generated features as
inputs to Naive Bayes, Support Vector Machine, K-Nearest
Neighbours, and Linear Discriminant Analysis classifiers.
Autonomous classification through affective computing
in human-machine interaction is a very contemporary area
of research due to the increasing amounts of computational
resources available, including, but not limited to,
facial expression recognition, Sentiment Analysis,
human activity recognition, and human behaviour
recognition. In terms of social human-machine
interaction, a Long Short-Term Memory network was found
to be extremely useful in user text analysis to derive an
affective sentiment based on negative and positive polarities
and was used in the application of a chatbot.
Table: Exposition of emotional categories of Figure .
Emotion Category    Emotion
2.2. Evolutionary Algorithms. An evolutionary algorithm will
search a problem space inspired by the natural process of
Darwinian evolution. Solutions are treated as living
organisms that, as a population, will produce more offspring
than can survive. Where each solution has a measurable
fitness, a survival of the fittest will occur, causing the weaker
solutions to be killed off and allowing for the stronger to
survive. The evolutionary search in its simplest form will
follow this process:
(1) Create an initial random population of solutions
(2) Simulate the following until termination occurs:
(a) Using a chosen method, select parent(s) for use
in generating offspring
(b) Evaluate the offspring's fitness
(c) Consider the whole population, and kill off the
weakest members
The aforementioned algorithm is often used to decide on
network parameters since there is “no free lunch”
when it comes to certain types of optimisation problems.
In particular, it has been demonstrated that the problem of
searching for the optimal parameters for a neural network
cannot be solved in polynomial time.
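The simplest form of the search listed above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this study: the function names (`evolve`, `random_bits`, `one_point_crossover`, `flip_one_bit`), the uniformly random parent selection, and the toy bit-string problem are all assumptions made for the example.

```python
import random


def evolve(fitness, random_solution, crossover, mutate,
           pop_size=20, generations=30, seed=0):
    """Minimal evolutionary search: breed, evaluate, cull the weakest."""
    rng = random.Random(seed)
    # (1) Create an initial random population of solutions.
    population = [random_solution(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # (2a) Select two parents (uniformly at random in this sketch).
        mum, dad = rng.sample(population, 2)
        child = mutate(crossover(mum, dad, rng), rng)
        # (2b, 2c) Rank everyone by fitness and keep the fittest pop_size.
        population.append(child)
        population.sort(key=fitness, reverse=True)
        population = population[:pop_size]
    return population[0]


# Toy problem (hypothetical): maximise the number of 1s in a 16-bit string.
def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]


def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one_bit(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


best = evolve(sum, random_bits, one_point_crossover, flip_one_bit,
              pop_size=20, generations=200)
print(sum(best), best)
```

Because the weakest member is removed only after the offspring has been ranked with the rest of the population, the best fitness in this sketch can never decrease between generations.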
2.3. Multilayer Perceptron. A Multilayer Perceptron is a type
of Artificial Neural Network (ANN) that can be used as a
universal function approximator and classifier. It passes
a number of inputs through a series of layers of neurons,
finally outputting a prediction of class or real value. More than
one hidden layer forms a deep neural network. Output nodes
represent either a class
choice or, if there is just one, a regression output (e.g., stock
price prediction in GBP).
Learning is performed for a defined time measured in
epochs.
Backpropagation is a case of automatic differentiation in
which errors in classification or regression (when comparing
outputs of a network to ground truths) are passed backwards
from the final layer, to derive a gradient which is then used
to calculate neuron weights within the network, dictating
their activation. That is, a gradient descent optimisation
algorithm is employed for the calculation of neuron weights
by computing the gradient of the loss function (error rate).
After learning, a more optimal neural network is generated,
mapping inputs, or attributes, to class.
The process of weight refinement for the set training time
is given as follows:
(1) Generate the structure of the network based on input
nodes, defined hidden layers, and required outputs.
(2) Pass the inputs through the network and generate
predictions as well as cost (errors).
(3) Compute gradients.
(4) Backpropagate errors and adjust neuron weights.
For regression, the cost can be a distance in
Euclidean or non-Euclidean space. In classification
problems, entropy is often used, that is, the level of
randomness or predictability for the classification of a set.
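For reference, the entropy of a set is commonly written in the standard Shannon form below. The symbols are assumed notation for this sketch: $S$ is the set being classified and $p_j$ is the proportion of its members belonging to class $j$.

```latex
E(S) = -\sum_{j} p_j \log_2 p_j
```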
Comparing the difference of two measurements of entropy
(two models) gives the information gain (relative entropy).
This is the value of the Kullback-Leibler (KL) divergence
when a univariate probability distribution of a given attribute
is compared to another. The information gain is calculated from these entropies.
A positive information gain denotes a lower error rate and
thus a better model, i.e., an improved matrix of network
weights.
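As an illustration of how entropy and information gain relate, the following sketch computes both for a discrete attribute. It uses the textbook definition (entropy of the labels minus the weighted entropy remaining after splitting on the attribute); the function names and the toy labels are assumptions made for the example and are not taken from the original experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of the labels minus the entropy left after splitting
    the data on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["relaxed", "relaxed", "focused", "focused"]
perfect = ["low", "low", "high", "high"]   # separates the classes exactly
useless = ["low", "high", "low", "high"]   # says nothing about the class

print(information_gain(labels, perfect))
print(information_gain(labels, useless))
```

An attribute that separates the classes exactly recovers all of the label entropy, whereas an attribute unrelated to the class recovers none of it.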
Denser is a related novel method of evolutionary optimisation
of an MLP. Whereas this study focuses on
the search space of layer structure within fully connected
neural networks, Denser also considers the type of layer.
This increase in parameters to optimise grows the search
space massively, making for a very computationally intensive
algorithm, which achieves very high results. Benchmarked is an
impressive result of . on the CIFAR- image recognition
deep neural networks with varying layers; researchers found
Roulette Selection (random) to be the best for selecting two
parents for offspring, and thus such selection was chosen
for this study's evolutionary search. A method of “Extreme
Learning Machines” was proposed for the optimisation of
deep learning processes and was extended to also perform
feature extraction within the topological layers of the model.
Figure: Diagram of a standard block within a Long Short-Term
Memory network.
2.4. Long Short-Term Memory. Long Short-Term Memory
(LSTM) is a form of Artificial Neural Network in which
multiple Recurrent Neural Networks (RNN) will predict
based on state and previous states. As seen in Figure , the
data structure of a neuron within a layer is an “LSTM Block”.
The general idea is as follows.
2.4.1. Forget Gate. The forget gate will decide on which
information to store and which to delete or “forget”.
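In the standard LSTM formulation, the forget gate activation can be written as below. The sigmoid function $\sigma$ and the concatenation $[h_{t-1}, x_t]$ are assumptions taken from that standard form.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```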
Here, t is the current timestep, W_f is the matrix of weights, h
is the previous output (t-1), x_t is the batch of inputs as a single
vector, and finally b_f is an applied bias.
2.4.2. Data Storage and Hidden State. After deciding which
information to forget, the unit must also decide which
information to remember. In terms of a cell input i, C_t is a
vector of new values generated.
Using the calculated variables in the previous operations,
the unit will follow a convolutional operation to update
the cell state.
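Under the same standard formulation, the input gate, the vector of candidate values, and the cell state update are usually written as follows. The weight and bias names $W_i$, $b_i$, $W_C$, and $b_C$, and the candidate symbol $\tilde{C}_t$, are assumed notation.

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
```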
Figure: A graphical representation of the Deep Evolutionary (DEvo) approach to complex signal classification. An evolutionary algorithm
simulation selects a set of natural features before a similar approach is used, then this feature set becomes the input to optimise a bioinspired
classifier.
2.4.3. Output. In the final step, the unit will produce an
output at output gate O_t after the other operations are
complete, and the hidden state of the node is updated.
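In the standard formulation, the output gate and the hidden state update take the form below, where $W_o$ and $b_o$ are assumed names for the output gate weights and bias.

```latex
O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)
h_t = O_t * \tanh(C_t)
```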
Due to the observed consideration of time sequences, i.e.,
previously seen data, it is often found that time dependent
data (waves; logical sequences) are very effectively classified
thanks to the addition of unit memory. LSTMs are particularly
powerful when dealing with speech recognition and
brainwave classification due to their temporal nature.
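To make the role of the gates concrete, one forward step of a standard LSTM block can be sketched in plain Python. This is an illustration under assumptions rather than the network used in the experiments: the parameter dictionary layout, the small random weights, the layer sizes, and the toy input sequence are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, z):
    """Return W z + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, z)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of a standard LSTM block."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]    # input gate
    g = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidates
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]    # output gate
    # Keep part of the old cell state and add part of the candidates.
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t


rng = random.Random(0)
n_in, n_hidden = 4, 3  # hypothetical sizes
p = {}
for gate in "fico":
    p["W_" + gate] = [[rng.gauss(0, 0.1) for _ in range(n_hidden + n_in)]
                      for _ in range(n_hidden)]
    p["b_" + gate] = [0.0] * n_hidden

h, c = [0.0] * n_hidden, [0.0] * n_hidden
for _ in range(5):  # a toy sequence of five time steps
    x_t = [rng.gauss(0, 1) for _ in range(n_in)]
    h, c = lstm_step(x_t, h, c, p)
print(h)
```

The hidden state and cell state are carried from one step to the next, which is what gives the block its memory of previously seen data.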
2.5. Adaptive Boosting. Adaptive Boosting (AdaBoost) is an
algorithm which will create multiple unique instances of a
certain model to attempt to mitigate situations in which
selected parameters are less effective than others at a certain
time. The models will combine their weighted predictions
after training on a random data subset to improve the
previous iterations. The fusion of models is given as a
combination of the individual classifiers' predictions,
where F is the set of classifiers and x is the data object being
classified.
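The idea of combining weighted predictions can be sketched with a small discrete AdaBoost over one-dimensional threshold rules ("stumps"). This is a generic textbook variant given for illustration: it re-weights the training points rather than drawing random data subsets, and the stump learner, function names, and toy data are assumptions, not the boosted LSTM or MLP models of this study.

```python
import math


def best_stump(xs, ys, weights):
    """Pick the threshold/polarity rule with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Train `rounds` stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # this model's vote weight
        ensemble.append((alpha, threshold, polarity))
        # Re-weight: misclassified points matter more in the next round.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * (pol if x >= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, 1, 1, -1, -1]  # no single threshold separates this
model = adaboost(xs, ys, rounds=10)
print([predict(model, x) for x in xs])
```

Each stump on its own is a weak classifier; the final prediction is the sign of the weighted sum of their votes.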
Building on top of previous works which have succeeded
using bioinspired classifiers for prediction of biological processes,
this work suggests a completely bioinspired process. It
includes biological inspiration in every step of the process
rather than just the classification stage. The system as a whole
therefore has the following stages:
(1) Generation of an initial dataset of biological data,
EEG signals in particular (collection).
(2) Selection of attributes via biologically inspired computing
(attribute selection).
(3) Optimisation of a neural network via biologically
inspired computing (hyperheuristics).
(4) Use of an optimised neural network for the classification
of the data (classification).
The steps allow for evolutionary optimisation of data
preprocessing as well as using a similar approach for deep
neural networks which also evolve. This leads to the Deep
Evolutionary or DEvo approach. A graphical representation of
the above steps can be seen in Figure . Nature is observed to
be close to optimal in both procedure and resources; the goal
of this process therefore is to best retain the high accuracies of
complex models while reducing the resources required
to execute them.
The rest of this section gives details of the steps above.
3.1. Data Acquisition. As previously mentioned, the
paper at hand provides three experiments dealing with
the classification of the attentional, emotional,
and “thinking of” states of subjects. For the first two
sets of experiments, two datasets were acquired from
previous studies. The first dataset (mental state)
distinguishes three different states related to how focused the
subject is: relaxed, concentrative, or neutral (https://www
This data was recorded for three minutes, per state, per
person of the subject group. The subject group was made
up of two adult males and two adult females aged 22±2.
The second dataset (emotional state) was based on whether
a person was feeling positive, neutral, or negative emotions
feeling-emotions). Six minutes for each state were recorded, giving
a total of minutes of brainwave activity data. The
experimental setup of the Muse headset being used to
gather data from the TP9, AF7, AF8, and TP10 extra-cranial
electrodes during a previous study can be seen in Figure .
An example of the raw data retrieved from the headband can
be seen in Figure . Additionally, observations of the range
of subjects for the two aforementioned datasets were made;
educational level was relatively high within the subjects: two
were PhD students, one a Master's student, and one held a BSc
degree, all from STEM fields. All subjects were in fine health,
both physical and mental. All subjects were from the United
Kingdom; three were from the West Midlands whereas one
was from Essex. All of the subjects volunteered to take part
in this study.
Figure: A subject having their EEG brainwave data recorded while
being exposed to a stimulus with an emotional valence.
The two mental state datasets are a constant work in
progress in order to become representative of a whole human
population rather than those described in this section. The
data as-is provides a preliminary point of testing and a proof
of concept of the DEvo approach to bioinspired classifier
optimisation, and this would be an ongoing process if
subject diversity has a noticeable impact, since the global
demographic often changes.
For the third experiment, the “MindBigData” dataset
was acquired and processed (http://www.mindbigdata.com/opendb/).
This publicly available data is an extremely large
dataset gathered over the course of two years from one
subject thinking of a digit
between and including 0 and 9 for two seconds. This gives a
ten-class problem. Due to the massive size of the dataset and
the computational resources available, experiments for each
class were extracted randomly, giving a uniform extraction
of seconds per digit class and therefore seconds
of EEG brainwave data. It must be critically noted that a
machine learning model would be classifying this single
subject's brainwaves, and, conjecturally, transfer learning is
likely impossible. Future work should concern the gathering
of similar data but from a range of subjects. The MindBigData
dataset used a slightly older version of the Muse headband,
corresponding to two slightly different yet still frontal lobe
sensors, collecting data from the TP9, FP1, FP2, and TP10
locations.
3.2. Full Set of Features (Preselection). As described previously,
feature extraction is based on previous research into
effective statistical attributes of EEG brainwave data.
This section describes the reasoning behind the necessity of
performing statistical extraction, as well as the method to
perform the process.
The EEG sensor used for the experiments, the Muse
headband, communicates with the computer using Bluetooth
Low Energy (BLE). The use of this protocol improves the
autonomy of the sensor at the expense of a nonuniform
sampling rate. The first step applied to normalise the dataset is
using a Fourier-based method to resample the data to a fixed
frequency of Hz.
Brainwave data is nonlinear and nonstationary in nature;
mental classification is based on the temporal nature of the
wave, and not the values specifically. For example, a simplified
concentrative and relaxed wave can be visually recognised
due to the fact that wavelengths of concentrative mental state
class data are far shorter, and yet, a value measured at any one
point might be equal for the two states (i.e., microVolts).
Additionally, the detection of the natures that dictate alpha,
beta, theta, delta, and gamma waves also requires analysis
over time. It is for these reasons that temporal statistical
extraction is performed. For temporal statistical extraction,
sliding time windows of total length 1 s are considered, with
an overlap of . seconds. That is, windows run from [0−
1),[1.5−2.5),[2−3),[2.5−3), continuing until the
end of the recording.
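The windowing scheme can be sketched as a small generator. The values used here are assumptions for illustration: a window length of 1 s, a step of 0.5 s between window starts (i.e., half-window overlap), and a 10 Hz sampling rate for the fake signal; the function name `sliding_windows` is likewise hypothetical.

```python
def sliding_windows(samples, rate_hz, length_s=1.0, step_s=0.5):
    """Yield (start_time_s, window) pairs of fixed length, stepped by step_s."""
    size = int(round(length_s * rate_hz))
    step = int(round(step_s * rate_hz))
    # Stop once a full window no longer fits in the recording.
    for start in range(0, len(samples) - size + 1, step):
        yield start / rate_hz, samples[start:start + size]


signal = list(range(30))  # 3 s of fake data at an assumed 10 Hz
windows = list(sliding_windows(signal, rate_hz=10))
starts = [t for t, _ in windows]
print(starts)
```

Each window produced this way is then passed to the statistical feature extraction described in the remainder of this subsection.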
The remainder of this subsection describes the different
statistical feature types which are included in the initial
dataset:
(i) A sequence of temporal windows 1, 2, 3, ..., n is considered and
mean values are computed.
(ii) The standard deviation of values is recorded.
(iii) Asymmetry and peakedness of waves are statistically
represented by the skewness and kurtosis via the
statistical moments of the third and fourth order.
These are taken where k = 3rd and k = 4th moment about the
mean.
(iv) Maximum value within each particular time window.
(v) Minimum value within each particular time window.
(vi) Derivatives of the min and max values, obtained by
dividing the time window in half and measuring the
values from either half of the window.
(vii) Performing the min and max derivatives a second
time on the presplit window, resulting in the derivatives
of every 0.25 s time window.
(viii) For every min, max, and mean value of the four 0.25 s
time windows, the Euclidean distance between them
is measured. For example, the maximum value of time
window one of four has its Euclidean distance
measured between it and the max values of windows two,
three, and four of four.
(ix) From the features generated from quarter-second
min, max, and mean derivatives, the last six features
can be generated. Using the Logarithmic Covariance
matrix model, a log-cov vector and thus statistical
features can be generated for the data as such:
$$ lcM = U(\log(\mathrm{cov}(M))). $$
U returns the upper triangular features of the resultant
vector and the covariance matrix (cov(M)) is
$$ \mathrm{cov}(M) = \mathrm{cov}_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \mu_i)(x_{kj} - \mu_j). $$
(x) For each full 1 s time window, the Shannon Entropy is
measured and considered as a statistical feature:
$$ h = -\sum_{j} S_j \times \log(S_j). $$
The complexity of the data is summed up as such,
where h is the statistical feature and S relates to each
signal within the time window after normalisation of
values.
(xi) For each 0.5 s time window, the log-energy entropy is
measured as
$$ \log e = \sum_{i} \log(S_i^2) + \sum_{j} \log(S_j^2), $$
where i is the first time window n to n+0.5 and j is the
second time window n+0.5 to n+1.
(xii) Analysis of a spectrum is performed by an algorithm
computing the discrete Fourier transform of every
recorded time window, derived as follows:
$$ X_k = \sum_{n=0}^{N-1} S_n e^{-i 2\pi k (n/N)}, \quad k = 0, \ldots, N-1. $$
Figure: An example of a raw EEG data stream from the Muse EEG headband
(channels TP9, AF7, AF8, TP10, and Right AUX). The Y-axis represents measured brainwave activity in
microVolts (uV) and the X-axis is the time at which the data was recorded.
With these features considered for each electrode
and time window (including those formed by overlaps), this
produces a total of scalars per measure. The resulting
number of features is too large to be used in real time (i.e.,
it would be computationally intensive) and would not yield
good classification results because of the large dimensionality.
Attribute selection is therefore performed to overcome these
limitations and, additionally, make the training process significantly
faster.
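A few of the per-window statistics listed in this subsection can be sketched for a single channel as follows. This is a simplified illustration, not the extraction pipeline of the original study: the population form of the standard deviation, the normalisation used before the Shannon entropy, the skipping of zero-valued samples, and the naive discrete Fourier transform are all assumptions, as is the function name `window_features`.

```python
import cmath
import math
import statistics


def window_features(window):
    """Compute a handful of statistical features for one time window."""
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population standard deviation

    def central_moment(k):
        return sum((v - mean) ** k for v in window) / n

    # Standardised third and fourth moments about the mean.
    skewness = central_moment(3) / std ** 3 if std else 0.0
    kurtosis = central_moment(4) / std ** 4 if std else 0.0

    # Shannon entropy after normalising absolute values to sum to 1.
    total = sum(abs(v) for v in window)
    probs = [abs(v) / total for v in window if v != 0] if total else []
    shannon = -sum(p * math.log(p) for p in probs)

    # Log-energy entropy; zero samples are skipped to avoid log(0).
    log_energy = sum(math.log(v * v) for v in window if v != 0)

    # Magnitudes of the discrete Fourier transform (naive O(n^2) form).
    spectrum = [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(window)))
                for k in range(n)]

    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "min": min(window), "max": max(window),
            "shannon_entropy": shannon, "log_energy_entropy": log_energy,
            "spectrum": spectrum}


features = window_features([1.0, 2.0, 3.0, 4.0])
print(features["mean"], features["std"], features["skewness"])
```

Applying such a function to every window of every electrode yields one long feature vector per time step, which is why attribute selection is needed afterwards.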
3.3. Evolutionary Optimisation and Machine Learning. The
evolutionary optimisation process as detailed previously was
applied when selecting discriminative attributes from the full
dataset for more optimised classification. An initial population
of attribute subsets was generated and simulated
for generations with tournament breeding selection.
Evolutionary optimisation was also applied to explore the
MLP topology, i.e., the number of neurons and hidden layers, with the goal of searching for the
best accuracy (fitness metric). With the selected attributes
forming the new dataset to be used in experiments, two
models were generated: an LSTM and an MLP.
Before finalising the LSTM model, various hyperparameters
are explored, specifically the topology of the network.
This was performed manually since evolutionary optimisation
of LSTM topology would have been extremely computationally
expensive. More than one hidden layer often returned
worse results during manual exploration and thus one hidden
layer was decided upon. LSTM units within this layer would
be tested from to at steps of units. Using a vector
of the time sequence statistical data as an input in batches of
data points, an LSTM was trained for epochs to predict
class for each number of units on a layer, and thus a manually
optimised topology was derived.
A Multilayer Perceptron was first fine-tuned via an
evolutionary algorithm with the number of neurons and
layers as population solutions, with classification accuracy
as the fitness. A maximum of three hidden layers and up to
neurons per layer were implemented into the simulation.
Using -fold cross validation, the MLP had the following
parameters manually set:
(i) -epoch training time
(ii) Learning rate of .
(iii) Momentum of .
(iv) No decay
Finally, boosting of the two models was attempted using
the AdaBoost algorithm in an effort both to mitigate the ill
effects of manually optimising the LSTM topology and to
fine-tune the models overall.
4. Results and Discussion
4.1. Evolutionary Attribute Selection. An evolutionary search
within the dimensions of the datasets was executed for
generations and a population of . For mental state, the
algorithm selected attributes, whereas for the emotional
state, the algorithm selected a far greater number of attributes for
the optimised dataset. This suggests that emotional state has
far more useful statistical attributes for classification whereas
mental state requires approx. % fewer. The MindBigData
EEG problem set, incomparable to the previous two due to
its larger range of classes, had attributes selected by the
algorithm. This can be seen in Table .
The evolutionary search considered the information gain
(Kullback-Leibler Divergence) of the attributes and thus
their classification ability as a fitness metric, i.e., where a
higher information gain represents a more effective and less
entropic model when such attributes are considered as
input parameters. The search selected large datasets, ranging
in size from that of the MBD dataset to that selected for the
emotional state dataset. Though the whole process is too extensive to detail
(all datasets are available freely online for full
recreation of experiments), observations were as follows:
(i) For the mental state dataset, attributes were
selected; the highest was the entropy of the TP
electrode within the first sliding window at an IG
of .. This was followed secondly with the eigen-
placement is a good indicator for concentrative states.
It must be noted that these values may possibly
correlate with the Sternocleidomastoid Muscle's contractional
behaviours during stress and ergo the stress
encountered during concentration or the lack thereof
during relaxation, and thus EMG behaviours may be
inadvertently classified rather than EEG.
(ii) Secondly, for the emotional state dataset, the most
important attribute was observed to be the mean
ping time window. This gave an information gain of
., closely followed by a measure of . for the
first covariance matrix of the first sliding window.
Minimum, mean, and covariance matrix values of
. until standard deviation of electrodes followed.
Maximum values did not appear until the lower half
of the ranked data, in which the highest max value of
IG of ..
(iii) Finally, for the MBD dataset, few attributes were
chosen. This was not due to their impressive ability,
but due to the lack thereof when other attributes were
observed. For example, the most effective attribute
was considered the covariance matrix of the second
sliding windows of the frontal lobe electrodes, FP
of . and . each, far lower than those observed
in the other two experiments. To the lower end of
are considered very weak and yet still chosen by the
algorithm. The MBD dataset is thus an extremely
difficult dataset to classify.
Since the algorithm clearly showed a best attribute for
each dataset, a benchmark was performed using a simple One Rule
Classifier (OneR). OneR will focus on the values of the
best attribute and attempt to separate classes by numerical
rules. In Table , the observations above are shown more
concretely with statistical evidence. Classifying MindBigData gains
.% accuracy, whereas the far stronger attributes for the other
two datasets gain .% and .% accuracies.
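The principle behind a one-rule benchmark can be sketched as below: a single numeric attribute is cut into bins and each bin predicts its majority class. This is a simplified illustration with equal-width bins, which is an assumption of the sketch and not necessarily how the OneR implementation used in this study discretises values; the function name `one_r` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(values, labels, bins=4):
    """Learn rules on one numeric attribute: cut its range into
    equal-width bins and predict the majority class of each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant attribute

    def bin_of(v):
        # Clamp so that unseen values fall into the first or last bin.
        return min(max(int((v - lo) / width), 0), bins - 1)

    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[bin_of(v)][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    rules = {b: c.most_common(1)[0][0] for b, c in counts.items()}
    return lambda v: rules.get(bin_of(v), default)


values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["relaxed"] * 3 + ["concentrating"] * 3
classify = one_r(values, labels, bins=2)
accuracy = sum(classify(v) == y for v, y in zip(values, labels)) / len(values)
print(accuracy)
```

When one attribute carries most of the information, as in this toy example, such a rule already classifies well; when no attribute is strongly informative, the benchmark accuracy stays low.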
The datasets generated by this algorithm are taken forward
in the DEvo process, and the original datasets are thus
discarded. Further experiments are performed with this data.
Table: Datasets generated by evolutionary attribute selection.
Dataset    Population    Generations    No. Chosen Attributes
Table: Accuracies when attempting to classify based on only one
attribute of the highest information gain.
Dataset                   MS    ES    MBD
Benchmark Accuracy (%)    .     .     .
Figure: Three evolutionary algorithm simulations to optimise an
MLP for the mental state dataset. (Y-axis: Global Best Accuracy (%).)
4.2. Evolutionary Optimisation of MLP. During the algorithm's
process, an issue arose with stagnation, in which the
solutions would quickly converge on a local minimum and
an optimal solution was not found. On average, no further
improvement would be made after generation . It can be
noted that the relatively flat gradient in Figures and
suggests that the search space's fitness matrix possibly had a
much lower standard deviation and thus the area was more
difficult to traverse due to the lack of noticeable peaks and
troughs. The algorithm was altered to prevent genetic collapse
with the addition of speciation. The changes were as follows:
(i) A solution would belong to one of three species, A, B,
or C.
(ii) A solution’s species label would be randomly
initialised along with the population members.
(iii) During selection of parent1’s breeding partner, only a
member of parent1’s species could be chosen.
(iv) If only one member of a species remains, it will not
(v) An offspring will have a small random chance to
become another species (manually tuned to %)
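The speciation rules listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the third species label `"C"`, the placeholder float genome, the averaging crossover, and the value of `SPECIES_MUTATION_RATE` are all hypothetical, since the paper's tuned rate and genome encoding are not reproduced here.

```python
import random

SPECIES = ("A", "B", "C")      # third label is an assumption
SPECIES_MUTATION_RATE = 0.05   # hypothetical value, not the paper's tuned rate

def init_population(size, rng):
    # Each member is a (genome, species) pair; the genome is a placeholder float.
    return [(rng.random(), rng.choice(SPECIES)) for _ in range(size)]

def pick_partner(parent1, population, rng):
    """Return a partner of the same species as parent1, or None if it is alone."""
    candidates = [m for m in population
                  if m is not parent1 and m[1] == parent1[1]]
    return rng.choice(candidates) if candidates else None

def breed(parent1, parent2, rng):
    genome = (parent1[0] + parent2[0]) / 2  # placeholder crossover: average
    species = parent1[1]
    # Small random chance for the offspring to switch to another species.
    if rng.random() < SPECIES_MUTATION_RATE:
        species = rng.choice([s for s in SPECIES if s != species])
    return (genome, species)
```

Restricting partners to the same species keeps several sub-populations exploring different regions of the search space, while the occasional species switch lets genetic material migrate between them.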
Figure : Three evolutionary algorithm simulations to optimise an
MLP for the emotional state dataset (y-axis: Global Best Accuracy (%)).

The implementation of separate species in the simulation
allowed for more complex, better solutions to be discovered.
The increasing gradients as observed in Figures , , and
show that constant improvement was achieved. The evolutionary
optimisation of MLP topology was set to run for a
set number of generations. Each simulation was run three
times for scientific accuracy, since a single random mutation
could find a good result by chance (random search); each run
took approximately ten minutes to execute. Tables , , and
detail the accuracy values measured at each generation along
with the network topology. Figures , , and graphically
represent these experiments to show the gradient of solution
score increase.
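The practice of repeating the topology search and recording the global best at every generation can be sketched as below. Everything here is an assumption made for illustration: `toy_fitness` is a hypothetical stand-in for the cross-validated accuracy of an MLP with the given hidden layers, and the mutation operators, the ten generations, and the population size are not taken from the paper.

```python
import random

def toy_fitness(layers):
    # Hypothetical stand-in for accuracy: peaks at one layer of 50 neurons.
    return 100.0 - abs(layers[0] - 50) - 10.0 * (len(layers) - 1)

def mutate(layers, rng):
    child = list(layers)
    choice = rng.random()
    if choice < 0.2 and len(child) < 5:
        child.append(rng.randint(1, 100))      # add a hidden layer
    elif choice < 0.4 and len(child) > 1:
        child.pop()                            # remove a hidden layer
    else:
        i = rng.randrange(len(child))
        child[i] = max(1, child[i] + rng.randint(-10, 10))  # resize a layer
    return child

def run(seed, generations=10, population_size=10):
    rng = random.Random(seed)
    population = [[rng.randint(1, 100)] for _ in range(population_size)]
    best = max(population, key=toy_fitness)
    history = []
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng)
                     for _ in range(population_size)]
        # Keep the fittest members of parents plus offspring.
        population = sorted(population + offspring,
                            key=toy_fitness, reverse=True)[:population_size]
        best = max([best, population[0]], key=toy_fitness)
        history.append(toy_fitness(best))      # global best so far
    return best, history

# Three independent runs, so one lucky mutation cannot dominate the picture.
results = [run(seed) for seed in (1, 2, 3)]
```

Comparing the three `history` lists side by side gives the same kind of view as the global-best curves in the figures: if all runs climb similarly, the improvement is unlikely to be a chance result.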
Table : Global best MLP solutions for mental state classification.
Accuracy (%) . . . . . . . . . 79.8061
Neurons , , , , ,
Accuracy (%) . . . . . . . . . .
Accuracy (%) . . . . . . . . . .

Table : Global best MLP solutions for emotional state classification.
1 2 3 4 5 6 7 8 9 10
Accuracy (%) . . . . . . . . . .
Neurons , ,
Accuracy (%) . . . . . . . . . 96.1069
Neurons ,, , , , , , , , , ,
Accuracy (%) . . . . . . . . . .

Table : Global best MLP solutions for MindBigData classification.
Accuracy (%) . . . . . . . . . .
Neurons , ,
Accuracy (%) . . . . . . . . . .
Neurons , , ,
Accuracy (%) . . . . . . . . . 27.0718

Figure : Three evolutionary algorithm simulations to optimise an
MLP for the MindBigData dataset (y-axis: Global Best Accuracy (%)).

4.3. Manual LSTM Tuning. Manual tuning was performed to
explore the options for LSTM topology for both mental state
and emotional state classification. Evolutionary optimisation
was not applied due to the high resource usage of LSTM
training, with many single networks taking multiple hours
Results in Table show that, for mental state, LSTM units
covered to be most optimal for emotional state classification
and LSTM units are best for the MindBigData digit set but
this result is extremely low for a uniform -class problem,
with very little information gain. Comparison of the LSTM
units to accuracy for both states can be seen in Figure .
For each of the experiments, these arrangements of LSTM
architecture will be taken forward as the selected model.
Additionally, empirical testing found that epochs for
training of units seemed best but further exploration is
required to fine-tune this parameter. A batch size of
formed the input vectors of sequential statistical brainwave
data for the LSTM. Gradient descent was handled by the
Adaptive Moment Estimation (Adam) algorithm, with a
decay value of .. Weights were initialised by the commonly
used XAVIER algorithm. Optimisation was performed by
Stochastic Gradient Descent. Manual experiments found that
a network with a single hidden layer persistently outperformed
deeper networks of two or more hidden layers for this specific
context; interestingly, this is mirrored in the evolutionary
optimisation algorithms for the MLP, which always converged
to a single layer to achieve higher fitness.

Figure : Manual tuning of LSTM topology for mental state (MS),
emotional state (ES), and MindBigData (MBD) classification
(x-axis: Hidden LSTM Units, ticks at 25, 50, 75, 100, 125).

Table : Manual tuning of LSTM topology for mental state (MS),
emotional state (ES), and EEG MindBigData classification.
LSTM Units MS (%) ES (%) MBD (%)
. 96.86 .
. . .
. . .
83.84 . 10.77
. . .

Figure : Graph to show the time taken to build the final models
(axis: Approx. time to train (s)).
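The Xavier weight initialisation mentioned in Section 4.3 can be illustrated with a short standard-library sketch. Two variants are in common use, a uniform one with bound sqrt(6 / (fan_in + fan_out)) and a Gaussian one with variance 2 / (fan_in + fan_out); the paper does not say which was used, so both are shown, and the function names and layer sizes below are illustrative assumptions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_in x fan_out matrix drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def xavier_normal(fan_in, fan_out, rng):
    """Return a fan_in x fan_out matrix drawn from N(0, 2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
# Hypothetical layer: 8 inputs feeding 4 units.
weights = xavier_uniform(fan_in=8, fan_out=4, rng=rng)
limit = math.sqrt(6.0 / (8 + 4))
```

Both variants scale the initial weights by the layer's fan-in and fan-out so that activation variance stays roughly constant from layer to layer at the start of training.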
4.4. Single and Boost Accuracy. Figure shows a comparison
that 10-fold cross validation was performed to prevent
overfitting and thus the actual time taken with this in mind is
around ten times more than the displayed value. Additionally,
this time was measured when training on the CUDA
cores of an NVidia GTX (GB); training would take considerably
longer on a CPU. Although the mental state dataset had
approximately five times the number of attributes, the time
taken to learn on this dataset was only slightly longer than
Since the LSTM topology was linearly tuned in a manual
algorithm, the processes are not scientifically comparable
since the former depends on human experience and the latter
upon resources available. Thus, times for these processes are
not given since only one is a measure of computational
resource usage; it is suggested that a future study should make
use of the evolutionary algorithm within the search space of
LSTM topologies too, in which case they can be compared.
an LSTM would take considerably longer due to the increased
resources required in every experiment performed compared
to the MLP. Additionally, with this in mind, a Multiobjective
Optimisation (MOO) implementation of DEvo that considers
both accuracy and resource usage as fitness metrics could
find better models in terms of both their
classification ability and efficient execution.
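One way to treat accuracy and resource usage jointly, as the multiobjective suggestion above proposes, is to keep only the Pareto-optimal candidates. The sketch below is an assumption about how such a selection step might look, not part of DEvo as published; the `Candidate` type, the model names, and all numbers are made up for illustration.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    accuracy: float       # higher is better
    train_seconds: float  # lower is better

def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.accuracy >= b.accuracy and a.train_seconds <= b.train_seconds
    better = a.accuracy > b.accuracy or a.train_seconds < b.train_seconds
    return no_worse and better

def pareto_front(candidates):
    # Keep every candidate that no other candidate dominates.
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

# Made-up numbers for illustration only; these are not results from the paper.
pool = [
    Candidate("small-mlp", 0.78, 10.0),
    Candidate("large-mlp", 0.80, 60.0),
    Candidate("slow-weak", 0.75, 90.0),
    Candidate("lstm", 0.84, 600.0),
]
front = pareto_front(pool)
```

A Pareto front avoids choosing a fixed weighting between accuracy and training time up front; a weighted sum of the two objectives would be a simpler alternative when such a trade-off is known in advance.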
The overall results of the experiments can be seen firstly
in Table and as a graphical comparison in Figure . For
the two three-state datasets, the most accurate model was an
AdaBoosted LSTM with accuracies of 84.44% and 97.06%
for the mental state and mental emotional state datasets,
respectively. The single LSTM and evolutionary-optimised
MLP models come relatively close to the best result, yet
take far less time to train when the measured approximate
values in Figure are observed. On the other hand, for the
MindBigData digits dataset, the best solution by far was the
applied to the LSTM that previously improved them actually
caused a loss in accuracy.
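To make the boosting step concrete, the sketch below shows one reweighting round and the final weighted vote in the style of AdaBoost.M1. The choice of the M1 variant is an assumption, since this section does not name the variant used; the function names and the toy labels are illustrative, and the base classifiers (MLP or LSTM in the paper) are represented only by their predictions.

```python
import math

def adaboost_m1_round(weights, predictions, labels):
    """One AdaBoost.M1 reweighting step.

    Returns (new_weights, vote_weight), or None when the weighted error is
    above one half (the algorithm's stopping condition) or exactly zero
    (beta would be zero and the vote weight undefined).
    """
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    if error == 0 or error > 0.5:
        return None
    beta = error / (1.0 - error)
    # Shrink the weight of correctly classified samples, then renormalise.
    scaled = [w * (beta if p == y else 1.0)
              for w, p, y in zip(weights, predictions, labels)]
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1.0 / beta)

def weighted_vote(member_predictions, vote_weights):
    """Combine one prediction per ensemble member into a single label."""
    scores = {}
    for label, weight in zip(member_predictions, vote_weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

labels = [0, 1, 2, 1]
predictions = [0, 1, 2, 2]   # the last sample is misclassified
weights = [0.25] * 4
new_weights, alpha = adaboost_m1_round(weights, predictions, labels)
```

After the round, the misclassified sample carries half of the total weight, so the next base model is pushed to focus on it. When a base model is already weak, as with the LSTM on MindBigData, this emphasis on hard samples does not necessarily help.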
Table : Classification accuracy on the two optimised datasets by the DEvo MLP, LSTM, and selected boost method.
Dataset             DEvo MLP (%)  LSTM (%)  AB(DEvo MLP) (%)  AB(LSTM) (%)
Mental State        79.81         83.84     79.7              84.44
Emotional State     96.11         96.86     96.23             97.06
MindBigData Digits  .             .         31.35             .

Figure : Final results for the experiment.

Manual tuning of LSTM network topology was performed
due to the limited computational resources available;
the success in optimisation of the MLP suggests that further
improvements could be made to the DEvo system through an
automated process of evolutionary optimisation of the LSTM
topology. In addition, more bioinspired classification
techniques should be experimented with, for example, a
convolutional neural network to better imitate and improve
on the classification ability of natural vision.
The three experiments were performed within the lim-
electrodes. Higher resolution EEG setups would allow for
further exploration of the system in terms of mental data
classification, e.g., for physical movement originating from
the motor cortex.
This study suggested DEvo, a Deep Evolutionary approach
to optimise and classify complex signals using bioinspired
computing methods in the whole pipeline, from feature
selection to classification. For mental state and mental emotional
state classification of EEG brainwaves and their mathematical
features, two best models were produced:
(1) A more accurate AdaBoosted LSTM which, although it
took more time and resources to train in comparison
to other methods, attained accuracies
of 84.44% and 97.06% for the first two datasets
(attentional and emotional state classification).
(2) An AdaBoosted Multilayer Perceptron that
was optimised using a hyperheuristic evolutionary
algorithm. Though its classification accuracy was
slightly lower than that of the AdaBoosted LSTM
(79.7% and 96.23% for the same two experiments), it
took less time to train.
For the MindBigData digits dataset the most accurate
model was an Adaptive Boosted version of the DEvo-optimised
MLP, which achieved an accuracy of 31.35%. For this
problem, none of the LSTMs were able to achieve any
meaningful or useful results, but the DEvo MLP approach
saved time and also produced results that were useful. Results
were impressive for application due to the high classification
ability along with the reduction of resource usage; real-time
training from individuals would be possible and thus provide
a more accurate EEG-based product to the consumer, for
example, in real-time monitoring of mental state for the
grading of meditation or yoga session quality. Real-time
communication would also be possible in human-computer
interaction where the brain activity acts as a degree of input.
The goal of the experiment was successfully achieved: the
DEvo approach has led to an optimised, resource-light model
that closely matches an extremely resource-heavy
deep learning model, losing a small amount of accuracy but
computing in approximately % of the time, except in one
case in which it far outperformed its competitor models.
The aforementioned models were trained on a set of
attributes that were selected with a bioinspired evolutionary
The success of these processes led to future work suggestions,
which follow the pattern of further bioinspired optimisation
applications within the field of machine learning.
Future work should also consider, for better application of
the process within the field of electroencephalography, a
much larger collection of data from a considerably more
diverse range of subjects in order to better model the classifier
optimisation for the thought pattern of a global population
rather than the subjects encompassed within this study.
all datasets can be found within the data acquisition section.
Conflicts of Interest
The authors declare that they have no conflicts of interest.