A Study on CNN Image Classification of EEG
Signals represented in 2D and 3D
Jordan J. Bird1, Diego R. Faria2, Luis J. Manso3, Pedro P.S.
Ayrosa4, and Anikó Ekárt5
1,2,3Aston Robotics Vision and Intelligent Systems Lab (ARVIS Lab), Aston
University, United Kingdom
E-mail: {birdj11, d.faria2, l.manso3}@aston.ac.uk
4Universidade Estadual de Londrina, Londrina, Brazil
E-mail: ayrosa@uel.br
5School of Engineering and Applied Science, Aston University, United Kingdom
E-mail: a.ekart@aston.ac.uk
August 2020
Abstract.
Objective: The novelty of this study consists of the exploration of multiple new
approaches of data pre-processing of brainwave signals, wherein statistical features
are extracted and then formatted as visual images based on the order in which
dimensionality reduction algorithms select them. This data is then treated as visual
input for 2D and 3D CNNs which then further extract ’features of features’.
Approach: Statistical features derived from three electroencephalography datasets
are presented in visual space and processed in 2D and 3D space as pixels and voxels
respectively. Three datasets are benchmarked: mental attention states and emotional
valences recorded from the four TP9, AF7, AF8 and TP10 electrodes of the 10-20 system,
and an eye state dataset recorded from 64 electrodes. For each dataset, 729 features are
selected through three methods of selection in order to form 27x27 images and 9x9x9
cubes from the same data. CNNs engineered
for the 2D and 3D preprocessing representations learn to convolve useful graphical
features from the data.
Main results: A 70/30 split method shows that the strongest methods for
classification accuracy of feature selection are One Rule for attention state and Relative
Entropy for emotional state both in 2D. In the eye state dataset 3D space is best,
selected by Symmetrical Uncertainty. Finally, 10-fold cross validation is used to train
best topologies. Final best 10-fold results are 97.03% for attention state (2D CNN),
98.4% for Emotional State (3D CNN), and 97.96% for Eye State (3D CNN).
Significance: The findings of the framework presented by this work show that CNNs
can successfully convolve useful features from a set of pre-computed statistical temporal
features from raw EEG waves. The high performance of K-fold validated algorithms
argues that the features learnt by the CNNs hold useful knowledge for classification in
addition to the pre-computed features.
Classification of EEG Signals represented in 2D and 3D 2
1. Introduction
Recent advances in consumer facing technologies have enabled machines to have non-
human skills. Inputs which once mirrored one’s natural senses such as vision and sound
have been expanded beyond the natural realms [1]. An important example of this is
the growing consumerist availability of the field of electroencephalography (EEG) [2,3];
the detection of thoughts, actions, and feelings from the human brain. To engineer
such technologies, researchers must consider the actual format of the data itself as
input to the machine or deep learning models, which subsequently develop the ability to
distinguish between these nominal thought patterns. Usually, this is either statistically
1-Dimensional or temporally 2-Dimensional since there is an extra consideration of time
and sequence. Due to the availability of resources in the modern day, a more enabled area
of research into a new formatting technique is graphical representation, i.e., presenting
the 1-Dimensional mathematical descriptors of waves in multiple spatial dimensions
in order to form an image or model in 3D space. This format of data can then be
further represented by feature maps from convolutional operations. With preliminary
success of the approach, a deeper understanding must be sought in order to distinguish
in which spatial dimension brainwave signals are most apt for interpretation. With the
classical method of raw wave data being used as input to a CNN in mind, dimensionality
reduction is especially difficult given the often black-box nature of a CNN's internal
feature extraction processes [4]. In this work, we extract statistical temporal features
from the waves to serve as input to the CNN, which allows for direct control of input
complexity, since dimensionality reduction can be used to choose the best n features
within the set with the task in mind. Reduction of a CNN topology, whether that be
network depth or layer width, gives less control over which features are and are not
computed. Given the technique of feature extraction as input to the CNN, and thus the
aforementioned direct control of input complexity, reduction of CNN complexity reduces
the number of ’features of features’ computed; that is, all the chosen input attributes
are retained.
The remainder of this report is structured as follows. Firstly, the remainder of
this section outlines the scientific contributions of this work. In Section 2, technical
background and related scientific works are presented and discussed. Following the
background and related works, Section 3 then provides details of the methodology of
the experiments performed during this study. Section 4 then reports the results of
the experiments, along with comparison to related state-of-the-art scientific knowledge.
Finally, Section 6 provides an outline for suggestions of future work and presents the
final notes and conclusions from the study.
1.1. Scientific Contributions
In this work, an experimental framework is presented in which evolutionary optimisation
of neural network hyperparameters is applied in conjunction with a visual data pre-
processing technique preliminarily explored in a previous work. During the previous
study [5], a 2D CNN was successfully applied to a 2D image representation of EEG
features with a dimensionality reduction algorithm on a 4-channel EEG dataset. In this
work, we explore visual data reshaping in 2 and 3 dimensions in order to form pixel
image and voxel cube representations of statistical features extracted from electrical
brain activity, through which 2D and 3D CNNs convolve 'features of features'. In
addition, we also explore multiple methods of dimensionality reduction and describe
their relationships to both the general classification ability of the model as well
as the reshaping technique. In comparison to previous works on both attention state
(concentrating/relaxed) and emotional state (positive/negative) classification, many of
the techniques explored in this study produce competitive results. Finally, the
applicability to other
EEG devices is shown by the application of the method to an open-source dataset. We
apply the three 2D and 3D approaches to classification to a 64-channel EEG dataset
acquired from an OpenBCI device, which achieves 97.96% 10-fold mean classification
accuracy on a difficult binary problem (Eyes open/closed), arguing that the approach is
dynamically applicable to BCI devices of higher resolution and for problems other than
the frontal lobe activity classification in the first two experiments. This both suggests
some future work with other devices, as well as collaboration between research fields in
order to build on and improve the framework further.
2. Background and Related Works
In this section, the technical philosophies of the related scientific fields are outlined, as
well as important works that are related to the experiments carried out throughout this
paper.
2.1. Electroencephalography
Electroencephalography is the process of using electrodes applied to the cranium in order
to measure electrical signals produced by the brain [6,7] due to the nervous oscillations
caused by certain hormonal balances such as serotonin, dopamine and noradrenaline.
Electrodes can be placed invasively, subdurally under the skull and directly onto the
brain itself [8]. Other electrodes read bioelectrical signals from the surface of the
head and are thus less invasive, using either Electro-Gel wet electrodes or simply
placed dry electrodes [9]. The signal strength of the raw electrical data is recorded
sequentially, producing what is known as a 'brainwave'.
The Muse EEG headband is comprised of four dry electrodes placed on the TP9,
AF7, AF8 and TP10 placements. Muse operates an on-board artefact separation
algorithm in order to remove the noise from the recorded data [10]. The Muse streams
over Bluetooth Low Energy (BLE) at around 220 Hz, which we reduce to 150 Hz in
order to make sure that all data collected is uniform. The Muse has been used in various
brain-computer interface projects since its introduction in May 2014. It has been
particularly effective for use in neuroscientific research projects, since the data is of
relatively high quality and yet the device is both low-cost and easy to use since it
operates dry electrodes. This was shown through an exploration into Bayesian binary
classification [11]. Sentiment analysis via brainwave patterns has been performed in
a process of regression in order to predict a user’s level of enjoyment of a performed
task [12,13]. The works were shown to be effective for the classification of enjoyment of a
mobile phone application. The Muse produces bipolar readings from the four electrodes
with the AFz placement as a reference. According to the technical specifications, the
signals are oversampled and then downsampled to yield the output, and the sampling
has 2 µV (RMS) noise. The noise is suppressed via the Driven-Right-Leg/Reference
feedback configuration using the AFz sensor. A notch filter at 50 Hz is applied to the
raw waves to suppress mains interference, since the experiment was performed in the
United Kingdom.
Attention state classification is a widely explored problem for statistical, machine
and deep learning classification [14,15]. Common Spatial Patterns (CSP) benchmarked
at 93.5% accuracy in attention state classification experiments, suggesting it is pos-
sibly one of the strongest state-of-the-art methods [16]. Researchers have found that
binary classification is often the easiest problem for EEG classification, with Deep Belief
Networks (DBN) and Multilayer Perceptron (MLP) neural networks being particularly
effective [17–19]. The best current state-of-the-art benchmark for classification of emo-
tive EEG data achieves scores of around 95% classification accuracy of three states, via
the Fisher’s Discriminant Analysis approach [20]. The study noted the importance of
the prevention of noise through introducing non-physical tasks as stimuli rather than
those that may produce strong electromyographic signals. Stimuli to evoke emotions for
EEG-based studies are often found to be best with music [21] and film [22, 23].
OpenBCI, used in the 64-channel extension of this study, is an open-source
Brain-computer interface device, which has the ability to interface with standard
Electroencephalographic [24], Electromyographic [25], and Electrocardiographic [26]
electrodes. OpenBCI with selected electrodes has seen 95% classification accuracy of
sleep states when discriminative features are considered by a Random Forest model
in the end-to-end system Light-weight In-ear BioSensing (LIBS) [27]. In this study,
OpenBCI data is used to detect eye state, that is, whether or not the subject has
opened or closed their eyes. In addition to the obvious nature of muscular activity
around the eyes, according to Brodmann’s Areas, the visual cortex is also an indicator
of visual stimuli [28, 29], and thus a higher resolution EEG is recommended for full
detection. In [30], researchers achieved an accuracy of 81.2% of the aforementioned
states through a Gaussian Support Vector Machine trained on data acquired from 14
EEG electrodes. It was suggested that with this high accuracy, the system could be
potentially used in the automatic switching of autonomous vehicle states from manual
driving to autonomous, in order to prevent a fatigue-related accident. A related work
(Footnote: Additional technical detail on the Muse can be found at http://developer.choosemuse.com/hardware-firmware/hardware-specification)
found that K-Star clustering enabled much higher classification accuracies of these states
at around 97% [31], but it must be noted that only one subject was considered, and thus
generalisation and further use beyond that subject is likely to be difficult in light of
studies on cross-subject generalisation [32,33]; in this study, ten subjects are considered.
On a dataset similar to the one used in this work, researchers found that K-Nearest
Neighbour classification (where k = 3) could produce a classification accuracy of 84.05% [34]. In
the classification problem of the states of eyes open and closed (a binary classification
problem), a recent work found that statistical classification via 7-nearest neighbours of
the data following temporal feature extraction achieved a mean accuracy of 77.92% [35].
The study extracted thirteen temporal features and found that wave kurtosis was a
strong indicator for the autonomous inference of the two states.
2.1.1. Statistical Extraction of EEG for Deep Machine Learning
Due to the temporal nature of the EEG waves, single point measures rarely harbor any
useful classification accuracy and thus make weak datasets. In this work, statistical
features are extracted through a sliding time-window approach [5, 36, 37]
(https://github.com/jordan-bird/eeg-feature-generation). The EEG signal is divided into
a sequence of windows of length one second, with consecutive windows overlapping by
0.5 seconds, e.g., [0s, 1s), [0.5s, 1.5s), [1s, 2s), and so on. Each time window is further
halved and quartered, and the resulting sub-windows are used to extract additional features.
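The windowing scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the linked repository: the function names `sliding_windows` and `split_equal` are hypothetical, and the choice to drop trailing samples that do not fill a whole window is an assumption.

```python
def sliding_windows(signal, sample_rate_hz, window_s=1.0, step_s=0.5):
    """Split a 1-D sequence of samples into overlapping windows.

    Returns a list of (start_index, samples) pairs. Trailing samples that do
    not fill a whole window are dropped (an assumption of this sketch).
    """
    window_len = int(round(window_s * sample_rate_hz))
    step_len = int(round(step_s * sample_rate_hz))
    if window_len <= 0 or step_len <= 0:
        raise ValueError("window and step must span at least one sample")
    windows = []
    for start in range(0, len(signal) - window_len + 1, step_len):
        windows.append((start, signal[start:start + window_len]))
    return windows


def split_equal(window, parts):
    """Split one window into equal consecutive sub-windows.

    parts=2 gives half-windows, parts=4 gives quarter-windows. Samples left
    over when the length is not divisible by `parts` are ignored.
    """
    size = len(window) // parts
    return [window[i * size:(i + 1) * size] for i in range(parts)]


# Three seconds of a dummy signal at 150 Hz yields five one-second windows
# starting at 0 s, 0.5 s, 1 s, 1.5 s and 2 s.
windows = sliding_windows(list(range(450)), sample_rate_hz=150)
halves = split_equal(windows[0][1], 2)
quarters = split_equal(windows[0][1], 4)
```

With a step of half the window length, every sample other than those in the first and last half second falls into exactly two windows.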
In this work the following statistical features were generated for each time
window via the process that can be observed in Algorithm 1, as in the previous
aforementioned works, where $y_k = [y_{k1}, \ldots, y_{kN}]$, for $k = 1, \ldots, K$, are the
$K$ vectors of paired observations [5, 36, 37]:
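Considering the full time window:
The sample mean and sample standard deviation of each signal [38]:
$$\bar{y}_k = \frac{1}{N} \sum_{i=1}^{N} y_{ki} \tag{1}$$
$$s_k = \sqrt{\frac{\sum_{i=1}^{N} (y_{ki} - \bar{y}_k)^2}{N - 1}} \tag{2}$$
The sample skewness and sample kurtosis of each signal [39]:
$$g_{1,k} = \frac{\sum_{i=1}^{N} (y_{ki} - \bar{y}_k)^3}{N s_k^3}, \tag{3}$$
$$g_{2,k} = \frac{\sum_{i=1}^{N} (y_{ki} - \bar{y}_k)^4}{N s_k^4} - 3. \tag{4}$$
The maximum and minimum value of each signal.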
Result: Features extracted from raw data for every $w_t$
User defines the size of the sliding window $w_t = 1s$;
Input: raw wave data;
Initialisation of variables init = 1, $w_t = 0$;
while getting sequence of raw data from sensor (> 1 min) do
    if init then
        prev_lag = 0;
        post_lag = 1;
    end
    init = 0;
    for each slide window ($w_t$ - prev_lag) to ($w_t$ + post_lag) do
        Compute mean of all $w_t$ values $y_1, y_2, y_3, \ldots, y_n$: $\bar{y}_k = \frac{1}{N} \sum_{i=1}^{N} y_{ki}$;
        Compute asymmetry and peakedness by 3rd and 4th order moments, skewness and
            kurtosis: $g_{1,k} = \frac{\sum_{i=1}^{N} (y_{ki} - \bar{y}_k)^3}{N s_k^3}$ and $g_{2,k} = \frac{\sum_{i=1}^{N} (y_{ki} - \bar{y}_k)^4}{N s_k^4} - 3$;
        Compute the max and min value of each signal: $w_t^{max} = \max(w_t)$ and $w_t^{min} = \min(w_t)$;
        Compute the $K \times K$ matrix $S$ of sample variances of each signal and sample
            covariances of all signal pairs, $s_{k\ell} = \frac{1}{N-1} \sum_{i=1}^{N} (y_{ki} - \bar{y}_k)(y_{\ell i} - \bar{y}_\ell)$; $\forall k, \ell \in [1, K]$;
        Compute eigenvalues of the covariance matrix $S$, the $\lambda$ solutions to
            $\det(S - \lambda I_K) = 0$, where $I_K$ is the $K \times K$ identity matrix, and $\det(\cdot)$ is the
            determinant of a matrix;
        Compute the upper triangular elements of the matrix logarithm of the covariance
            matrix $S$, where the matrix exponential is defined via the Taylor expansion
            $e^{B} = I_K + \sum_{n=1}^{\infty} \frac{B^n}{n!}$, and $B \in \mathbb{C}^{K \times K}$ is a matrix logarithm of $S$ if $e^{B} = S$;
        Compute magnitude of frequency components of each signal via Fast Fourier
            Transform (FFT), magFFT($w_t$);
        Get the frequency values of the ten most energetic components of the FFT, for each
            signal, getFFT($w_t$, 10);
    end
    $w_t = w_t + 1s$;
    prev_lag = 0.5s; post_lag = 1.5s;
    Output: features $F_{w_t}$ extracted within the current $w_t$
end
Algorithm 1: Algorithm to extract features from raw biological signals.
The sample variances of each signal, plus the sample covariances of all signal
pairs, which together form the $K \times K$ matrix $S$ [38]:
$$s_{k\ell} = \frac{1}{N-1} \sum_{i=1}^{N} (y_{ki} - \bar{y}_k)(y_{\ell i} - \bar{y}_\ell); \quad \forall k, \ell \in [1, K] \tag{5}$$
The eigenvalues of the covariance matrix $S$ [40], which are the $\lambda$ solutions to:
$$\det(S - \lambda I_K) = 0 \tag{6}$$
where $I_K$ is the $K \times K$ identity matrix, and $\det(\cdot)$ is the determinant of a
matrix.
The upper triangular elements of the matrix logarithm of the covariance
matrix $S$ [41,42]. With the matrix exponential defined via the Taylor expansion
$$e^{B} = I_K + \sum_{n=1}^{\infty} \frac{B^n}{n!}, \tag{7}$$
a matrix $B \in \mathbb{C}^{K \times K}$ is a matrix logarithm of $S$ if $e^{B} = S$.
The magnitude of the frequency components of each signal, obtained using a
Fast Fourier Transform (FFT).
The frequency values of the ten most energetic components of the FFT, for
each signal.
With the above in mind, the following are calculated with regard to the 0.5s windows:
The change in the sample means and in the sample standard deviations between
the first and second half-windows, for all signals.
The change in the maximum and minimum values between the first and second
half-windows, for all signals.
And finally, for the 0.25s windows:
The sample mean of each quarter-window, plus all paired differences of
sample means between the quarter-windows, for all signals.
The maximum (minimum) values of each quarter-window, plus all paired
differences of maximum (minimum) values between the quarter-windows, for
all signals.
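As an illustration of the per-signal scalar features listed above, the following Python sketch computes the full-window moments of Eqs. (1) to (4), the half-window changes, and the quarter-window values with their paired differences. It is a simplified sketch, not the authors' implementation: the function and key names are hypothetical, the sign convention for a 'change' (second half minus first half) is an assumption, and the covariance, eigenvalue, matrix logarithm, and FFT features are omitted.

```python
import math
from itertools import combinations


def sample_mean(y):
    """Sample mean, Eq. (1)."""
    return sum(y) / len(y)


def sample_std(y):
    """Sample standard deviation with the N - 1 denominator, Eq. (2)."""
    m = sample_mean(y)
    return math.sqrt(sum((v - m) ** 2 for v in y) / (len(y) - 1))


def skewness(y):
    """Sample skewness as written in Eq. (3)."""
    m, s, n = sample_mean(y), sample_std(y), len(y)
    return sum((v - m) ** 3 for v in y) / (n * s ** 3)


def excess_kurtosis(y):
    """Sample kurtosis minus 3, as written in Eq. (4)."""
    m, s, n = sample_mean(y), sample_std(y), len(y)
    return sum((v - m) ** 4 for v in y) / (n * s ** 4) - 3.0


def window_features(y):
    """Scalar features for one window of one signal.

    Assumes len(y) is divisible by 4 and that no half-window is constant.
    """
    half, quarter = len(y) // 2, len(y) // 4
    first, second = y[:half], y[half:2 * half]
    quarters = [y[i * quarter:(i + 1) * quarter] for i in range(4)]
    feats = {
        "mean": sample_mean(y), "std": sample_std(y),
        "skewness": skewness(y), "kurtosis": excess_kurtosis(y),
        "max": max(y), "min": min(y),
        # Half-window changes: second half minus first half (assumed sign).
        "mean_change": sample_mean(second) - sample_mean(first),
        "std_change": sample_std(second) - sample_std(first),
        "max_change": max(second) - max(first),
        "min_change": min(second) - min(first),
    }
    # Quarter-window values and all six paired differences per statistic.
    for name, fn in (("mean", sample_mean), ("max", max), ("min", min)):
        values = [fn(q) for q in quarters]
        for i, v in enumerate(values):
            feats[f"q{i}_{name}"] = v
        for i, j in combinations(range(4), 2):
            feats[f"q{i}_q{j}_{name}_diff"] = values[i] - values[j]
    return feats


features = window_features([1, 2, 3, 4, 5, 6, 7, 8])
```

Under these assumptions each signal contributes 40 scalar features per window before the cross-signal and frequency-domain features are added.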
Additionally, each data object is also given the features calculated in the previous
window, bar those that would be identical. This allows for further temporal
consideration. This data then follows the below process of attribute selection in order to
reduce the number of attributes to one that can be reshaped into squares and cubes, in
order to form the objects for the CNN to process. Note that not all features are specific
to EEG, given that the algorithm is a general purpose feature extraction process for
temporal wave data. Due to this, it is thus important to perform feature selection in
order to isolate generated features that are useful for the specific problem in mind -
in this case, features from this large set that may be useful for an EEG classification
problem.
2.2. Attribute Selection
Attribute selection, or dimensionality reduction, is the process of reducing the number
of features in a dataset in order to simplify the learning process. Importantly, it
focuses on discarding weaker elements in order to simplify the process at the smallest
possible cost to classification ability [43–45]. In neural networks, for example, large
input datasets greatly increase the number of hyperparameters to be tuned by the
optimisation algorithms and thus the computational resources required [46]. The three
methods of feature selection, chosen following the findings of a literature review, are
One Rule, Kullback-Leibler Divergence, and Symmetrical Uncertainty.
One Rule feature selection is the scoring of an attribute based on how well it
can be branched to classify data based on the singular attribute [47]. Kullback-Leibler
Divergence, or Relative Entropy, is the measure of how a feature set’s probability
distribution differs from another [48,49]. Finally, Symmetrical Uncertainty is the rating
of attribute classification ability based on a mutual dependence, or lack thereof [50].
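To make the idea of score-based ranking concrete, the sketch below computes Symmetrical Uncertainty in its standard form, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), for discrete values and keeps the top-scoring attributes. It is a minimal sketch under assumptions: the function names are hypothetical, and continuous EEG features would first need to be discretised, a step that is not shown and whose details are not specified here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    if h_x + h_y == 0:
        return 0.0
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


def rank_features(columns, labels, keep):
    """Return the indices of the `keep` best columns, best first."""
    scores = [(symmetrical_uncertainty(col, labels), idx)
              for idx, col in enumerate(columns)]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:keep]]


labels = [0, 0, 1, 1]
columns = [[0, 1, 0, 1],   # independent of the labels
           [0, 0, 1, 1]]   # identical to the labels
best = rank_features(columns, labels, keep=1)
```

The same ranking interface would apply to the other two scoring methods, with only the scoring function swapped.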
2.3. CNN and Visual Space Learning
Convolutional Neural Networks (CNN) are a form of Artificial Neural Network (ANN)
which perform autonomous feature extraction from attributes based on their spatial
positioning [51]. To perform this, data is convolved in order to form new maps from the
original data, of which the connections to an interpretation Multilayer Perceptron (MLP)
are considered parameters for loss-reducing optimisation [52]. The spatially-aware focus
of pooling is inspired by the operations of the biological photo-receptors [53, 54]. The
size of the window for this is known as the ’kernel’ and is a manual hyperparameter set
pre-training, as well as the layers of convolutional operations themselves.
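The convolution step can be illustrated with a small pure-Python sketch. It assumes a single-channel input, a stride of 1, and no padding, and, like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped). The function name `conv2d_valid` is hypothetical, and a real CNN layer would additionally learn many kernels, add a bias, and apply an activation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of rows) with stride 1 and no padding."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            acc = 0.0
            for dr in range(kernel_h):
                for dc in range(kernel_w):
                    acc += image[r + dr][c + dc] * kernel[dr][dc]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# A 27x27 input convolved with a 3x3 kernel gives a 25x25 feature map.
image = [[1.0] * 27 for _ in range(27)]
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_valid(image, kernel)
```

Each output value depends only on a local neighbourhood of the input, which is why the spatial arrangement of the attributes matters to the network.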
Visual space learning is the process of projecting data as a matrix and convolving it
with the above methods, applied to data that is not conventionally graphical but is
formatted as such. Visual space learning in EEG is a relatively new approach, with most
works simply considering
signal strengths interpolated where the centroid is relative to the electrode placement
location [55, 56]. Recently, the static statistical descriptions of brainwaves have been
found to be extremely effective when formed as an image and convolved to feature
maps [5]. The preliminary method of graphical 2D Euclidean Space representations of
brainwave signals is to be expanded further in these studies.
2.4. Evolutionary Topology Search
Result: Array of best solutions at final generation
initialise Random solutions;
for Random solutions : rs do
    test accuracy of rs;
    set accuracy of rs;
end
set Solutions = Random solutions;
while Simulating do
    for Solutions : s do
        parent2 = roulette selected Solution;
        child = breed(s, parent2);
        test accuracy of child;
        set accuracy of child;
    end
    Sort Solutions best to worst;
    for Solutions : s do
        if s index > population size then
            delete s;
        end
    end
    increase maxPopulation by growth factor;
    increase maxNeurons by growth factor;
end
Return Solutions;
Algorithm 2: Evolutionary Algorithm for ANN optimisation [57].
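The loop in Algorithm 2 could be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the implementation from [57]: `random_solution`, `fitness`, and `breed` are caller-supplied placeholders (in practice `fitness` would train a network and return its accuracy), roulette selection is taken to be fitness-proportional, the population growth is modelled as a fixed integer increment, and the growth of maxNeurons is omitted for brevity.

```python
import random


def roulette_select(population, rng):
    """Pick one (solution, fitness) pair with probability proportional to fitness."""
    total = sum(fit for _, fit in population)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for solution, fit in population:
        running += fit
        if running >= pick:
            return solution, fit
    return population[-1]


def evolve(random_solution, fitness, breed, generations, population_size,
           growth=0, seed=0):
    """Return the final population as (solution, fitness) pairs, best first."""
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        solution = random_solution(rng)
        population.append((solution, fitness(solution)))
    for _ in range(generations):
        children = []
        for solution, _ in population:
            parent2, _ = roulette_select(population, rng)
            child = breed(solution, parent2, rng)
            children.append((child, fitness(child)))
        population.extend(children)
        # Sort best to worst and delete everything beyond the population limit.
        population.sort(key=lambda pair: pair[1], reverse=True)
        del population[population_size:]
        population_size += growth  # assumed form of the growth factor
    return population


# Toy problem: solutions are integers and fitness is the integer itself.
result = evolve(
    random_solution=lambda rng: rng.randint(1, 10),
    fitness=lambda s: float(s),
    breed=lambda a, b, rng: max(a, b) + 1,
    generations=3,
    population_size=5,
)
```

Because parents and children are sorted together before truncation, the best fitness in the population never decreases from one generation to the next.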
Deep Evolutionary Multilayer Perceptron, or DEvoMLP, is an approach to
hyper-heuristically optimising a neural network topology through evolutionary computation
[57, 58]. Networks are treated as individual organisms in the process, where their
classification ability dictates their fitness metric; thus, it is a single-objective algorithm.
The pseudocode for the algorithm is given in Algorithm 2. The process to combine
two networks follows the aforementioned work, where the number of hidden layers
is decided by selecting the depth of one of the two parents at random, or by mutation at
a 5% chance. Then, for each layer, the number of neurons is decided by selecting the nth
layer of either parent at random (provided both parent networks have an nth layer); again,
a 5% mutation chance dictates a random mutation resulting in the number of neurons being
a random number between 1 and maxNeurons. To give an example of a process within
the algorithm, a neural network i, 64, 32, 16, o (where i are the input neurons, and o are
the output neurons) which has three hidden layers of neurons (64, 32, 16) and a second
[Figure 1: flow diagram. Feature Engineering Processes: EEG Signals, Extraction, Selection, Reshape, n-dimensional data. Deep Learning and Optimisation Processes: Benchmark, Optimisation, Final Model.]
Figure 1. Overview of the Methodology. EEG Signals are Processed into 2D or 3D
data Benchmarked by a 2D or 3D CNN. Three Different Attribute Selection Processes
are Explored. Finally, the Best Models have their Interpretation Topologies Optimised
Heuristically for a Final Best Result.
neural network i, 100, 10, o are chosen as the two candidates to breed and create a neural
net offspring. If, in this example, parent 2 is chosen to provide depth to the offspring,
then the offspring topology would be i, x, y, o, and neuron counts x and y now need
to be chosen. Layer x may be chosen from parent 1 and y from parent 2, creating an
offspring neural network topology i, 64, 10, o which has two hidden layers of 64 and 10
neurons respectively. Layers x and y could have both been chosen from parent 1, which
would result in the offspring i, 64, 32, o since it had the hidden depth of 2 from parent 2.
Indeed, the breeding process can, and does, produce an offspring that is identical to one
of the parents. Since we already know this fitness value, a random solution is generated
instead.
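The breeding step described above might be sketched as follows, with topologies represented as tuples of hidden-layer neuron counts. This is a sketch under stated assumptions rather than the DEvo implementation: the text does not say what a depth mutation produces, so a random depth between 1 and a hypothetical `max_depth` is assumed, and a random neuron count is assumed as the fallback when no parent has an nth layer.

```python
import random


def breed(parent1, parent2, max_neurons, rng, mutation_rate=0.05, max_depth=5):
    """Combine two hidden-layer topologies (tuples of neuron counts) into a child."""
    # Depth is inherited from a randomly chosen parent, or mutated at a 5% chance.
    if rng.random() < mutation_rate:
        depth = rng.randint(1, max_depth)  # assumed form of the depth mutation
    else:
        depth = len(rng.choice([parent1, parent2]))
    child = []
    for n in range(depth):
        if rng.random() < mutation_rate:
            # Mutation: a random number of neurons between 1 and max_neurons.
            child.append(rng.randint(1, max_neurons))
            continue
        # Only parents that actually have an nth hidden layer can donate it.
        donors = [p for p in (parent1, parent2) if n < len(p)]
        if donors:
            child.append(rng.choice(donors)[n])
        else:
            child.append(rng.randint(1, max_neurons))  # assumed fallback
    child = tuple(child)
    if child in (tuple(parent1), tuple(parent2)):
        # The fitness of a parent is already known, so draw a random topology.
        child = tuple(rng.randint(1, max_neurons)
                      for _ in range(rng.randint(1, max_depth)))
    return child


rng = random.Random(0)
offspring = breed((64, 32, 16), (100, 10), max_neurons=128, rng=rng)
```

With the two parents from the worked example, possible offspring include (64, 10) and (64, 32), as well as mutated or randomly regenerated topologies.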
Thus, after simulation, the goal of the DEvo algorithm is to derive a more effective
neural network topology for the given dataset. The algorithm is implemented due to
neural network hyperparameter tuning being a non-polynomial problem [59]. It is,
of course, extremely complex; a ten population roulette breeding simulation executed
for ten generations would produce 120 neural networks to be trained, since eleven are
produced every generation. Resource usage is extreme for the simulation, but the final
result gives a network topology apt for the given data, and this finding can thus be
used in other experiments.
3. Method
In this section, the method of these experiments is described. A diagram of
the process described in this study can be seen in Fig. 1. Two datasets for
the experiment are sourced from a previous study [36] which made use of the
aforementioned Muse headband (TP9, AF7, AF8, TP10); see Section 2 for technical
detail. Firstly, the 'attention state' dataset
(https://www.kaggle.com/birdy654/eeg-brainwave-dataset-mental-state) is collected from
four subjects: two
Table 1. Class labels for the data belonging to the three datasets
Dataset               No. Classes   Labels
Concentration State   3             Relaxed, Neutral, Concentrating
Emotional State       3             Negative, Neutral, Positive
Eye State             2             Closed, Open
male, two female, aged 20 to 24. The subjects under stimuli were either relaxed or
concentrating, or, in the absence of stimuli, neutral. Three minutes per state are
recorded for each subject, giving a total of thirty-six minutes of EEG brainwave data.
The concentrating class is stimulated by the ’shell game’ wherein the subjects must
concentrate to follow the movement of a ball hidden under one of three cups which
are switched around. The relaxed state is induced with classical music and is recorded
several moments after the exercise begins, and the neutral state is finally recorded free
of any stimuli.
In the second experiment, the 'Emotional State' dataset
(https://www.kaggle.com/birdy654/eeg-brainwave-dataset-feeling-emotions) is acquired.
To gather this data, six minutes of EEG data per class are recorded from each of two
subjects, aged 21 and 22. Negative or positive emotions are evoked via film-clip stimuli,
and finally a stimulus-free 'neutral' class of EEG data is also recorded. Similarly to
dataset 1, this gives a total of thirty-six minutes of EEG brainwave data, divided equally
among the three classes. Unlike the first and third datasets, this experiment focuses on
classification
of a more limited subject-set given that there are only two subjects involved. There
were three film clips that were intended to evoke a positive emotional response; La
La Land from Summit Entertainment, Slow Life from BioQuest Studios, and Funny
Dogs from MashupZone. Likewise, there were three clips that were intended to evoke
a negative emotional response; Marley and Me from Twentieth Century Fox, Up from
Walt Disney Pictures, and My Girl from Imagine Entertainment. Note that different
forms of positive and negative valence are collected - for the positive, an upbeat musical
and dance number, clips of marine life performing feats of nature, and clips of dogs
performing interesting and funny activities. For the negative emotion-evoking film clips,
these dealt with the final moments spent with a beloved pet, the loss of a loved one
after a long marriage, and finally a child attempting to grasp the concept of death. Also
note that subjects involved knew that the negative clips were from movies, and this may
have impacted the data.
With the subject-limited dataset (emotions) and relatively less limited dataset
(concentration), a third dataset is explored in order to benchmark the algorithms
when a large subject-set is considered. The dataset is sourced from a BCI2000 EEG
device [60–62]. This data describes a multitude of tasks performed by 109 subjects for
one to three minutes with 64 EEG electrodes. A random subset of 10 people is taken
due to the computational complexity requirements, thus the experiments are focused on
datasets of 2, 4, and 10 subjects in order to further compare performance. In this work,
each subject had their EEG data recorded for 2 minutes (two 1 minute sessions) for each
class. Thus, in total, a dataset was formed of 40 minutes in length - 20 minutes for each
class, made up from ten individuals. Classes are reduced from the large set to a binary
classification problem, due to the findings of literature review into the behaviours of
binary classification in brain-machine interaction. The classes chosen are "Eyes Open"
and "Eyes Closed", since these two tasks require no physical movement from the subjects
and thus noise from EMG interference is minimal. Table 1 gives detail on the number
of classes in each dataset as well as their class labels.
Mathematical temporal features are subsequently extracted via the aforementioned
method in Section 2.
As of the time of writing, the first two datasets (which were collected by the authors
for previous works) have not been used in experiments by other authors while the third,
from the ML repository, is popular in several recent publications. The aforementioned
concentrating and emotional EEG datasets have been explored on the Kaggle cloud
computing platform by other data scientists, but results remain unpublished as of yet
within academic works.
Firstly, a reduction of dimensionality of the datasets is performed. The chosen
number of attributes is 729, since 729 is both a square and a cube number and is
therefore directly comparable in 2D and 3D space. The 729 features are thus
reformatted into a square of 27x27 features for 2-dimensional space classification,
as well as a cube of 9x9x9 features for 3-dimensional space classification. The
attributes are ordered by descending rank of the values assigned by the feature ranking
algorithms (see Future Work for plans to improve on this as a combinatorial
optimisation problem), and each row of the image is filled in this order from left to
right, top to bottom. For the 3D case, the same process fills 9x9 squares, which are
stacked 9 times to produce the third axis. Alternatives of 64 and 1,000 are discarded:
firstly, 64 has been shown in previous work to be a relatively weak set of attributes, and
larger datasets outperformed such a number by far. Secondly, 1,000 in preliminary exploration
showed numerous weak attributes selected. Reduced data is then normalised to the
range 0 to 255 in order to correspond to a pixel’s brightness value in an image. Note
that the CNN will further normalise these values to the range of 0 to 1 by
dividing them by 255. The order of the visual data is dictated by the dimensionality
reduction algorithms, with the most useful feature selected by the
algorithm in the upper left and the least useful in the lower right (and front to back
for 3D). The CNN then extracts ‘features of features’ by convolving over this reshaped
data.
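The selection, scaling and reshaping steps described above can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function names, the synthetic selector scores and the synthetic feature vector are hypothetical, and scaling a single vector by its own minimum and maximum is an assumption, since the text does not state over which values the 0 to 255 normalisation is computed.

```python
def rank_and_select(scores, n_keep=729):
    """Indices of the n_keep highest-scoring attributes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_keep]


def scale_to_pixels(values):
    """Min-max scale numbers to integer brightness values in 0..255."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [round(255 * (v - lo) / span) for v in values]


def to_image(flat, side=27):
    """Fill a side x side grid row by row: left to right, top to bottom."""
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def to_cube(flat, side=9):
    """Stack side x side squares front to back to form a side^3 cube."""
    layer = side * side
    return [to_image(flat[z * layer:(z + 1) * layer], side) for z in range(side)]


# Hypothetical inputs: 1000 selector scores and one 1000-attribute sample.
scores = [((i * 37) % 1000) / 1000 for i in range(1000)]
sample = [float((i * 91) % 500) for i in range(1000)]

keep = rank_and_select(scores)                       # 729 attribute indices
pixels = scale_to_pixels([sample[i] for i in keep])  # brightness values
image = to_image(pixels)                             # 27 x 27 for the 2D CNN
cube = to_cube(pixels)                               # 9 x 9 x 9 for the 3D CNN
unit_image = [[p / 255 for p in row] for row in image]  # 0..1 range for the network
```

The best-ranked attribute lands at image[0][0] and cube[0][0][0], and the weakest of the 729 at image[26][26] and cube[8][8][8], matching the upper-left to lower-right (and front to back) ordering described in the text.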
Secondly, with the reduced data reshaped to both squares and cubes, classification
is performed by Convolutional Neural Networks operating in 2D and 3D space. In the
previous study [5], as in this work, the order of attributes represented visually is selected
by the feature selection algorithms. Scoring is applied by each algorithm and
attributes are sorted in descending order, and this is then reshaped into a 27×27 square
or 9×9×9 cube. Visual representation, thus, is performed in three different ways,
dependent on the scores applied by the three feature selection methods in this study.
This is discussed as a point for further exploration in the Future Work section of this
study.
Table 2. Pre-optimisation Network Architecture for Preliminary Experiments [5]
Layer Output Params
Conv2d (ReLu) (0, 14, 14, 32) 320
Conv2d (ReLu) (0, 12, 12, 64) 18496
Max Pooling (0, 6, 6, 64) 0
Dropout (0.25) (0, 6, 6, 64) 0
Flatten (0, 2304) 0
Dense (ReLu) (0, 512) 1180160
Dropout (0.5) (0, 512) 0
Dense (Softmax) (0, 3) 1539
In this stage, topology of networks is simply selected based on the findings of previous
experiments (see Section 2). Preliminary hyperparameters from previous work are
given as a layer of 32 filters from a kernel of length and width of 3, followed by a layer of
64 filters from a kernel of the same dimensions, a dropout of 0.25 before the outputs are
flattened and interpreted by a layer of 512 ReLu neurons. These kernels are to be extended
into a third dimension matching the length and width of the windows for the 3D
experiments. A generalised view of the network pre-optimisation can be seen in Table 2.
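The output sizes and parameter counts listed in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions without padding, 2×2 max pooling, and a 16×16 single-channel input; the input size is not stated in the table and is inferred here from the 14×14 output of the first layer. All function names are illustrative, and the dropout layers are omitted because they change neither shape nor parameter count.

```python
def conv2d(shape, filters, kernel=3):
    """Stride-1, unpadded 2D convolution: (output shape, trainable parameters)."""
    height, width, channels = shape
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    params = (kernel * kernel * channels + 1) * filters  # weights + one bias per filter
    return out_shape, params


def max_pool(shape, pool=2):
    """Non-overlapping max pooling has no trainable parameters."""
    height, width, channels = shape
    return (height // pool, width // pool, channels), 0


def dense(n_inputs, n_units):
    """Fully connected layer: one weight per input plus a bias for each unit."""
    return n_units, (n_inputs + 1) * n_units


def summarise(input_shape, n_classes=3):
    """List (layer name, output size, parameter count) for the described stack."""
    rows = []
    shape, params = conv2d(input_shape, 32)
    rows.append(("Conv2d", shape, params))
    shape, params = conv2d(shape, 64)
    rows.append(("Conv2d", shape, params))
    shape, params = max_pool(shape)
    rows.append(("Max Pooling", shape, params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", flat, 0))
    units, params = dense(flat, 512)
    rows.append(("Dense", units, params))
    units, params = dense(units, n_classes)
    rows.append(("Dense (Softmax)", units, params))
    return rows


# Assumed 16 x 16 single-channel input, inferred from the 14 x 14 first output.
table = summarise((16, 16, 1))
for name, size, params in table:
    print(f"{name:16} {str(size):14} {params}")
```

Under these assumptions the counts agree with Table 2 (320, 18496, 1180160 and 1539). The same function can be called with (27, 27, 1) to see how the sizes would change for the 27x27 images used in this work if the layer settings were kept identical.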
The selected methods of feature selection were those observed in previous
experiments as strong algorithms for EEG classification. These are Kullback-Leibler
Divergence (Information Gain), One Rule, and Symmetrical Uncertainty. Model
training takes place on an Nvidia GTX 980 Ti Graphics Processing Unit, with its
implementation in TensorFlow. All models are trained via a 70/30 training/test split
for 100 epochs, with a batch size of 64. The loss metric of the models is defined as
categorical cross-entropy:
$$CE = -\sum_{c=1}^{M} y_{o,c} \log(p_{o,c}), \tag{8}$$
where M is the number of class labels (3 or 2 in these cases), y is a binary indicator
(1 or 0) of whether class c is the correct label for observation o, and p is the predicted
probability of observation o of class c. The entropy of each class within the testing split
is calculated and added for a final, overall result. In this case, this is the entropy of the
three classes of attention state: relaxed, neutral, and concentrating. The complexity of
training, when considering epochs, examples, number of features, and number of neurons,
is O(n²). Computational cost varies with the hardware used (e.g. whether parallelisation
is possible) and the software (e.g. the version of the libraries in use). Times to execute
were noted on the hardware given above with a clean operating system: the evolutionary
topology search for the smaller datasets executed for approximately an hour, whereas the
larger dataset took one day for the search algorithm to complete. In terms of the final
CNN training process, the smaller datasets need only several minutes for the CNN to train
since convergence for this data was relatively fast, but the larger dataset was observed to
take 24 minutes to finish training. For unseen data prediction, a forward pass has the
complexity of O(n).
Figure 2. Thirty Samples of attention state EEG Data Displayed as 27x27 Images.
Row one shows Relaxed Data, Two shows Neutral Data, and the Third Row Shows
Concentrating Data.
Figure 3. Three attention state Samples Rendered as 9x9x9 Cubes of Voxels.
Leftmost Cube is Relaxed, Centre is Neutral, and Rightmost Cube represents
Concentrating Data.
Figure 4. Thirty Samples of Emotional State EEG Data Displayed as 27x27 Images.
Row one shows Negative Valence Data, Two shows Neutral Data, and the Third Row
Shows Positive Valence Data.
Figure 5. Three Emotional State Samples Rendered as 9x9x9 Cubes of Voxels.
Leftmost Cube is Negative Valence, Centre is Neutral, and Rightmost Cube represents
Positive Valence Data.
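The categorical cross-entropy loss of Equation 8 can be illustrated for a single observation as follows. This is a minimal sketch using the conventional negative sign; the function names, the clipping constant eps and the example probabilities are assumptions made for illustration, and averaging over observations follows common library practice rather than anything stated in the text.

```python
import math


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy of one observation: -sum_c y_c * log(p_c)."""
    # eps guards against log(0) when a class is assigned zero probability.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


def mean_cross_entropy(labels, predictions):
    """Average the per-observation loss over a batch (assumed convention)."""
    losses = [categorical_cross_entropy(y, p) for y, p in zip(labels, predictions)]
    return sum(losses) / len(losses)


# Hypothetical observation: true class is 'concentrating' (third of three classes).
y = [0, 0, 1]
p = [0.1, 0.2, 0.7]
loss = categorical_cross_entropy(y, p)  # equals -log(0.7)

batch_loss = mean_cross_entropy([[1, 0, 0], [0, 0, 1]],
                                [[0.9, 0.05, 0.05], [0.1, 0.2, 0.7]])
```

Because y is one-hot, only the probability assigned to the correct class contributes, so the loss falls to zero as that probability approaches 1.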
Samples of visually rendered attention states can be seen in Figures 2 and 3. The
examples in these figures show how the data looks when rendered as square images for
the 2D CNN and as cubes of voxels for input to the 3D CNN. Note that within the cubes,
a large difference between relaxed and the other two states can be observed, where the
relaxed state seemingly contains lower values (denoted by lighter shades of grey). In
comparison to the 2D representations, it is visually more difficult to discern between the
classes, which may also be the case for the CNN when encountering these two forms of
data as input. Figure 2 shows thirty samples of attention state data as 27x27 images
whereas Figure 3 shows the topmost layer of 9x9x9 cubes rendered for each state.
Likewise, examples of the emotions dataset reshaped within 2D and 3D space can be seen
in Figures 4 and 5. This process is followed for every data point in the set, for
either a 2D or 3D Convolutional Neural Network. Following this, the DEvo algorithm
as described in Section 2.4 is executed upon the best 2D and 3D combinations of models
in order to explore the possibility of a better architecture. A population of 10 is
simulated for 10 generations. Hyperparameter limits are introduced as a maximum
of 5 hidden layers of up to 4096 neurons each. Networks train for 100 epochs. The
target of optimisation is the set of interpretation layers that exist after the CNN operations.
Following this, the best sets of hyperparameters for each dataset are used in further
experiments. During these experiments, the networks are retrained, but rather than the
70/30 train/test split used previously, k-fold cross validation with k = 10 is used.
Hyperparameters for each 2D and 3D network are those that were observed to be best in
the previous heuristic search; the search is not repeated under cross validation because
the resource usage of a heuristic search of the problem space with k-fold cross validation
would make it infeasible. These experiments are performed due to the risk of overfitting
during hyperparameter optimisation when a train/test split is used, since hyperparameters
may be overfit to the 30% of testing data, even though a dropout rate of 0.5 is
implemented.
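The 10-fold procedure described above can be sketched as an index generator plus a summary of per-fold scores. This is an illustrative sketch only: the function names are hypothetical, fit_and_score stands in for retraining and evaluating a network on one fold, and the use of shuffling and of the population standard deviation are assumptions not stated in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n_samples, fit_and_score, k=10):
    """Mean and standard deviation of the scores returned for each fold."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n_samples, k)]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return mean, std


# Hypothetical scorer: a real one would train on `train` and return accuracy on `test`.
def dummy_scorer(train, test):
    return 0.9


splits = list(k_fold_indices(103))
mean_acc, std_acc = cross_validate(103, dummy_scorer)
```

Each model is trained ten times on roughly 90% of the data, so no single held-out portion can be overfit by the chosen hyperparameters in the way a fixed 30% test split can.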
Following the experiments on K-Fold Cross Validation, the trained models are then
applied to further unseen data through Leave One Subject Out Cross Validation. This is
performed by training the model on the data of all subjects except one (n − 1), and then
attempting to predict the class labels of the data collected from the remaining individual
in order to examine the extent of cross-subject generalisation. This is performed for all
subjects; individual results are considered as well as the overall mean and standard
deviation of the set of results attained via the validation process.
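Leave One Subject Out splitting differs from k-fold only in how the held-out set is chosen: all rows belonging to one subject are withheld together. A minimal sketch is given below, assuming each data row carries a subject identifier; the function name and the example identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) once per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical dataset of ten rows recorded from four subjects.
subject_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    print(f"subject {held_out}: train on {len(train)} rows, test on {len(test)} rows")
```

Since no row from the held-out subject is ever seen in training, the per-subject scores measure cross-subject generalisation rather than within-subject fit.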
The final step of the method of this experiment is to compare and contrast with
related studies that use these same datasets.
4. Results
4.1. Attention state Classification
Feature Selection Firstly, attribute selection for the attention state dataset is performed.
Overviews of these processes can be seen in Table 3. Selection via Information
gain selected the attribute with the highest KBD, with a value of 1.225; its minimum
KBD was also the highest at 0.278. Interestingly, the One Rule approach selected much
lower KBDs, with a maximum of 0.621 and a minimum of 0.206. The Symmetrical
Uncertainty dataset was relatively similar to KBD in terms of maximum and minimum
selected values.
Table 3. Datasets Produced by Three Attribute Selection Techniques for the attention
state Dataset, with their Minimum and Maximum Kullback-Leibler Divergence Values
of the 729 Attributes Selected
Selector Max KBD Min KBD
Kullback-Leibler Divergence 1.225 0.278
One Rule 0.621 0.206
Symmetrical Uncertainty 1.225 0.233
Classification The classification abilities of the 2D CNN can be seen in Table 4.
The strongest 2D CNN was that applied to the One Rule dataset, achieving 93.89%
classification accuracy.
Table 4. Benchmark Scores of the Pre-optimised 2D CNN on the attention state
Selected Attribute Datasets
Dataset Acc. (%) Prec. Rec. F1
Kullback-Leibler Divergence 91.29 0.91 0.91 0.91
One Rule 93.89 0.94 0.94 0.94
Symmetrical Uncertainty 85.06 0.85 0.85 0.85
The classification abilities of the 3D CNN can be seen in Table 5. The strongest 3D
CNN was that applied to the One Rule dataset, which achieved 93.62% classification
accuracy.
Table 5. Benchmark Scores of the Pre-optimised 3D CNN on the attention state
Selected Attribute Datasets
Dataset Acc. (%) Prec. Rec. F1
Kullback-Leibler Divergence 91.52 0.92 0.92 0.92
One Rule 93.62 0.94 0.94 0.94
Symmetrical Uncertainty 85.2 0.85 0.85 0.85
In comparison, results show that the 2D CNN was slightly superior with an overall
score of 93.89% as opposed to a similar score achieved by the 3D CNN benchmarking in
at 93.62%. Both superior results came from the dataset generated by One Rule selection,
even though its individual selections were much lower in terms of their relative entropy
when compared to the other two selections, which were much more difficult to classify.
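The three attribute scores compared in this subsection can be sketched for a single discretised attribute as follows. This is a textbook-style illustration, not the implementation used in the study: it assumes attributes have already been binned into discrete values (the binning step is not shown), uses base-2 logarithms, and scores One Rule by the training accuracy of predicting the majority class for each attribute value. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = defaultdict(list)
    for f, y in zip(feature, labels):
        groups[f].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """Information gain normalised to [0, 1] by the two marginal entropies."""
    denom = entropy(feature) + entropy(labels)
    return 2 * information_gain(feature, labels) / denom if denom else 0.0


def one_rule_accuracy(feature, labels):
    """Accuracy of predicting the majority class for each feature value."""
    groups = defaultdict(Counter)
    for f, y in zip(feature, labels):
        groups[f][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(labels)


# Toy data: one perfectly informative and one uninformative binary attribute.
labels = ["relaxed", "relaxed", "concentrating", "concentrating"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

scores = {
    name: (information_gain(f, labels), symmetrical_uncertainty(f, labels),
           one_rule_accuracy(f, labels))
    for name, f in [("informative", informative), ("uninformative", uninformative)]
}
```

Because One Rule ranks by rule accuracy rather than by entropy, it can prefer attributes whose information gain is comparatively low, which is one way to read the lower KBD range reported for One Rule in Table 3.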
Table 6. Datasets Produced by Three Attribute Selection Techniques for the
Emotional State Dataset, with their Minimum and Maximum Kullback-Leibler
Divergence Values of the 729 Attributes Selected
Dataset Max KBD Min KBD
Kullback-Leibler Divergence 1.058 0.56
One Rule 0.364 0.107
Symmetrical Uncertainty 0.364 0.168
Table 7. Benchmark Scores of the Pre-optimised 2D CNN on the Emotional State
Selected Attribute Datasets
Dataset Acc. (%) Prec. Rec. F1
Kullback-Leibler Divergence 98.22 0.98 0.98 0.98
One Rule 97.28 0.97 0.97 0.97
Symmetrical Uncertainty 97.12 0.97 0.97 0.97
Table 8. Benchmark Scores of the Pre-optimised 3D CNN on the Emotional State
Selected Attribute Datasets
Dataset Acc. (%) Prec. Rec. F1
Kullback-Leibler Divergence 97.28 0.97 0.97 0.97
One Rule 96.97 0.97 0.97 0.97
Symmetrical Uncertainty 97.12 0.97 0.97 0.97
4.2. Emotional State Classification
Feature Selection Table 6 shows the range of relative entropy for the results of the feature
selection algorithms on the emotional state dataset. Similarly to the attention state
dataset, the KBD selection technique had much higher values in its selection and, as
previously seen, the One Rule selector preferred smaller KBD attributes. Unlike the
previous attribute selection process, though, the Symmetrical Uncertainty selection this
time bears far more similarity to the One Rule process, whereas in the attention state
experiment it closely followed that of the KBD process.
Classification Table 7 shows the results for the 2D CNN on the datasets generated for
emotional state. The best model was that trained on the KBD dataset, achieving a
very high accuracy of 98.22%.
Table 8 shows the results for the 3D CNN when trained on datasets of
selected attributes for the emotional state dataset. The best model was trained on the
KBD dataset of features, which achieved 97.28% classification accuracy.
In comparison, the superior method of data formatting for the emotional state
EEG dataset is in two dimensions, but only narrowly, with a small difference of 0.94%.
Unlike the attention state experiment, the best data in both instances on this experiment
seemed to be those selected by their relative entropy. 2D One Rule and 3D relative
entropy achieved the same score; likewise, the 2D and 3D Symmetrical Uncertainty
experiments also achieved the same score.
Figure 6. Twenty Samples of Eye State EEG Data Displayed as 27x27 Images. Row
one shows Eyes Open, Row Two shows Eyes Closed.
Figure 7. Two Eye State EEG Samples Rendered as 9x9x9 Cubes of Voxels. Left
Cube is Eyes Open and Right is Eyes Closed.
Table 9. Attribute Selection and the Relative Entropy of the Set for the Eye State
Dataset
Selector Max KBD Min KBD
Kullback-Leibler Divergence 0.349 0.102
One Rule 0.349 0.025
Symmetrical Uncertainty 0.349 0.0597
4.3. Extension to 64 EEG Channels
For an extended final experiment, the processes successfully explored in this article are
applied to a dataset of a differing nature. The whole process is carried out in the given
order. Details of the dataset and experimental process can be found in Section 3.
Figures 6 and 7 show samples of eye state data in both 2D and 3D. Table 9 shows
the attribute selection processes and the relative entropy of the gathered sets. As could
be logically conjectured, all of the feature selectors found much worth (0.349) in the log
covariance matrix of the AFz electrode, located in the centre of the forehead. Closely
following this in second place for all feature selectors (0.3174) was the log covariance
matrix of the AF4 electrode, placed to the right of the AFz electrode. Interestingly, as well
as this data, which is arguably electromyographical in origin, many features generated
from the activities of the occipital electrodes O1, Oz and O2 were considered useful for
classification; these electrodes are placed around the visual cortex, the area of the brain
that receives and processes visual information from the retinae. With this in mind,
it is logical to conjecture that such a task will produce strong binary classification
accuracies since feature selection has favoured areas around the eyes themselves and the
cortex within which visual signals are processed.
Table 10. Benchmark Scores of the Pre-optimised 2D and 3D CNN on the Eye State
Selected Attribute Datasets
Dims Dataset Acc. (%) Prec. Rec. F1
2D Kullback-Leibler Divergence 97.03 0.97 0.97 0.97
2D One Rule 95.34 0.95 0.95 0.95
2D Symmetrical Uncertainty 96.89 0.97 0.97 0.97
3D Kullback-Leibler Divergence 96.05 0.96 0.96 0.96
3D One Rule 94.49 0.95 0.95 0.95
3D Symmetrical Uncertainty 97.46 0.97 0.97 0.97
[Line plot: x-axis Generation (1 to 10); y-axis Best Solution Classification Accuracy (%); series 2D CNN and 3D CNN.]
Figure 8. Evolutionary Improvement of DEvoCNN for the attention state
Classification Problem
Table 10 shows the comparison of results for the 2D and 3D CNNs on the Eye State
dataset. As would be expected, very high classification accuracies are observed since
the eyes and visual cortex both feature in the 64-channel OpenBCI EEG. Unlike the
prior experiments, the 3D CNN on a raster cube prevails over its 2D counterpart when
Symmetrical Uncertainty is used for feature selection at a score of 97.46% classification
accuracy. As observed previously, other than this one model, all 2D models outperform
the 3D alternative.
4.4. Hyperheuristic Optimisation of Interpretation Topology
In this section, the best networks for the three datasets are evolutionarily optimised in
an attempt to improve their abilities through augmentation of interpretation network
structure and topology, the dense layers following the CNN. Figures 8, 9, and 10 show
the evolutionary simulations for the improvement of the interpretation of networks for the
Attention, Emotional, and Eye State datasets respectively.
[Line plot: x-axis Generation (1 to 10); y-axis Best Solution Classification Accuracy (%); series 2D CNN and 3D CNN.]
Figure 9. Evolutionary Search of Network Topologies for the Emotional State
Classification Problem
[Line plot: x-axis Generation (1 to 10); y-axis Best Solution Classification Accuracy (%); series 2D CNN and 3D CNN.]
Figure 10. Evolutionary Search of Network Topologies for the Eye State Classification
Problem
For the deep hidden layers following the CNN structure detailed in 2, the main findings
were as follows:
Attention state: The best network was found to be a 2D CNN with three hidden
interpretation layers (2705, 3856, 547), which achieved 96.1% accuracy. The mean
accuracy scored by 2D CNNs was 96%. These outperformed the best 3D network
with 5 interpretation layers (3393, 935, 2517, 697, 3257), which scored 95.15%, with
a mean performance of 95.02%.
Emotional State: The best network was found to be a 2D CNN with two hidden
interpretation layers (165, 396), which achieved 98.59% accuracy. The mean
accuracy scored by 2D CNNs was 98.41%. Close to this was the best 3D network
with 1 interpretation layer (476), which scored 98.43%, with a mean performance of
98.07%.
Eye State: The best network was found to be a 3D CNN with three
hidden interpretation layers (400, 2038, 1773), which achieved 98.31% classification
accuracy. The mean accuracy scored by 3D CNNs was 98.16%. The best 2D
network scored 98.02%, with a mean performance of 97.88%.
Table 11. Benchmark Scores of the Pre and Post-optimised 2D and 3D CNN on all
Datasets (70/30 split Validation). Model gives Network and Best Observed Feature
Extraction Method. (Other ML metrics omitted and given in previous tables for
readability)
Experiment Model Accuracy (%)
Attention State 2D CNN, Rule Based 93.89
Attention State 3D CNN, Rule Based 93.62
Attention State 2D DEvoCNN, Rule Based 96.1
Attention State 3D DEvoCNN, Rule Based 95.15
Emotional State 2D CNN, KLD 98.22
Emotional State 3D CNN, KLD 97.28
Emotional State 2D DEvoCNN, KLD 98.59
Emotional State 3D DEvoCNN, KLD 98.43
Eye State 2D CNN, KLD 97.03
Eye State 3D CNN, Symm. Uncertainty 97.46
Eye State 2D DEvoCNN, KLD 98.02
Eye State 3D DEvoCNN, Symm. Uncertainty 98.3
Table 11 shows the overall results gained by the four methods applied to the three
datasets, from the findings of the two previous experiments. The best results for 2D and
3D CNNs are taken forward in the following section in order to perform cross validation.
It can be observed that the DEvoCNN approach slightly improved on all networks, but
the findings in the first experiment carry over in that the best dimensionality for each
dataset remains so even after evolutionary optimisation.
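The kind of search summarised above, a population of 10 evolved for 10 generations over at most 5 interpretation layers of up to 4096 neurons, can be sketched generically as follows. This is not the DEvo algorithm: it is a simple elitist mutation-only search written for illustration, every name in it is hypothetical, and the toy fitness function stands in for the validation accuracy of a trained network.

```python
import random

MAX_LAYERS, MAX_NEURONS = 5, 4096  # limits stated in the text


def random_topology(rng):
    """A list of layer widths, e.g. [2705, 3856, 547]."""
    return [rng.randint(1, MAX_NEURONS) for _ in range(rng.randint(1, MAX_LAYERS))]


def mutate(topology, rng):
    """Resize, add or remove one dense layer while respecting the limits."""
    child = list(topology)
    move = rng.choice(["resize", "add", "remove"])
    if move == "add" and len(child) < MAX_LAYERS:
        child.insert(rng.randrange(len(child) + 1), rng.randint(1, MAX_NEURONS))
    elif move == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = rng.randint(1, MAX_NEURONS)
    return child


def evolve(fitness, population=10, generations=10, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    pop = [random_topology(rng) for _ in range(population)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))  # best score in this generation
        parents = ranked[: population // 2]
        children = [mutate(rng.choice(parents), rng) for _ in range(population - len(parents))]
        pop = parents + children
    return max(pop, key=fitness), history


# Toy stand-in for accuracy: prefers topologies totalling about 3000 neurons.
def toy_fitness(topology):
    return -abs(sum(topology) - 3000)


best, history = evolve(toy_fitness)
```

Because the better half of each generation survives unchanged, the best score per generation never decreases. In a real search each fitness call would train a network for 100 epochs, so scores would need to be cached rather than recomputed as they are here.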
Figures 11, 12 and 13 show the confusion matrices for the concentration, emotions,
and eye state unseen data respectively. Most errors in the concentration dataset arise
from relaxed data being misclassified as neutral data, which was also observed to occur
vice versa, albeit to a lesser extent. The small number of mistakes from the emotions
dataset occurred when misclassifying negative as positive and vice versa; the neutral
class was classified perfectly. In the eye state dataset, eyes closed were the most
misclassified data at 0.97 to 0.03.
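A normalised confusion matrix of the kind discussed above divides each row of raw counts by its row total. The sketch below assumes rows correspond to the true class and columns to the predicted class; the function names are illustrative, and the label lists are hypothetical counts chosen only to reproduce the eye state proportions quoted in the text, since the underlying counts are not given.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Raw counts: rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def normalise_rows(counts):
    """Divide each row by its total so that every non-empty row sums to 1."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


# Hypothetical predictions: 100 'closed' and 100 'open' test samples.
classes = ["closed", "open"]
y_true = ["closed"] * 100 + ["open"] * 100
y_pred = ["closed"] * 97 + ["open"] * 3 + ["closed"] * 1 + ["open"] * 99

counts = confusion_matrix(y_true, y_pred, classes)
normalised = normalise_rows(counts)
```

Under this row convention each diagonal entry is the recall of its class, which is why the closed-eye row (0.97, 0.03) marks that class as the most often misclassified.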
[Confusion matrix; rows and columns ordered Relaxed, Neutral, Concentrating]
Relaxed 0.89 0.11 0
Neutral 0.01 0.99 0
Concentrating 0 0.01 0.99
Figure 11. Normalised confusion matrix for the unseen concentration data.
[Confusion matrix; rows and columns ordered Negative, Neutral, Positive]
Negative 0.99 0 0.01
Neutral 0 1 0
Positive 0.03 0 0.97
Figure 12. Normalised confusion matrix for the unseen emotions data.
[Confusion matrix; rows and columns ordered Closed, Open]
Closed 0.97 0.03
Open 0.01 0.99
Figure 13. Normalised confusion matrix for the unseen eye state data.
Table 12. Final Benchmark Scores of the Post-optimised Best 2D and 3D CNN on
all Datasets via K-fold cross validation.
Experiment Model Acc. (%) Std. Prec. Rec. F1
Attention State 2D CNN 97.03 1.09 0.97 0.97 0.97
Attention State 3D CNN 95.87 0.82 0.96 0.96 0.96
Emotional State 2D CNN 98.09 0.55 0.98 0.98 0.98
Emotional State 3D CNN 98.4 0.53 0.98 0.98 0.98
Eye State 2D CNN 97.33 0.79 0.97 0.97 0.97
Eye State 3D CNN 97.96 0.44 0.98 0.98 0.98
4.5. K-fold Cross Validation of Selected Hyper-parameters
In this section, the best sets of hyperparameters for each dataset are used in further
experiments where each model is benchmarked through 10-fold cross validation.
Table 12 shows the mean accuracy of networks when training via 10-fold cross
validation. As was alluded to through the simpler data split experiments, the best
model for the attention state dataset was found when the data was arranged as a
2-dimensional grid of pixels, whereas the best model for the eye state dataset was in
3D, with both a higher accuracy and lower standard deviation of per-fold accuracies.
For the emotional state dataset, the 3D CNN (98.4%) narrowly outperformed the 2D
CNN (98.09%) under cross validation, unlike in the data split experiments.
Standard deviation was relatively low between folds, all below 1% except for the 2D
CNN attention state model which has a standard deviation of 1.09%.
Table 13. Leave one Subject Out (Unseen Data) for the Concentration State Dataset
Subject left out 1 2 3 4 Mean Std.
Accuracy (%) 84.33 86.27 81.91 89.66 85.54 0.03
Table 14. Leave one Subject Out (Unseen Data) for the Emotions Dataset
Subject left out 1 2 Mean Std.
Accuracy (%) 91.18 84.71 87.95 0.03
Table 15. Leave one Subject Out (Unseen Data) for the Eye State Dataset (individual
109 subjects removed for readability purposes)
Subject left out Mean Std.
Accuracy (%) 83.8 3.44
Table 16. Comparison of the best concentration dataset model (2D CNN) to other
models
Model Acc. (%) Std. Prec. Rec. F1
2D CNN 97.03 1.09 0.97 0.97 0.97
Extreme Gradient Boosting 93.62 0.01 0.94 0.94 0.94
Random Forest 91.64 0.02 0.92 0.92 0.92
KNN(10) 86.03 0.03 0.87 0.86 0.86
Decision Tree 84.65 0.02 0.85 0.85 0.85
AdaBoost Long Short-Term Memory [58] 84.44 0.02 0.85 0.85 0.85
Long Short-Term Memory [58] 83.84 0.03 0.84 0.84 0.84
Deep Neural Network [58] 79.81 0.02 0.8 0.8 0.8
Linear Discriminant Analysis 79.44 0.02 0.81 0.79 0.8
Support Vector Classifier 77.46 0.02 0.78 0.78 0.77
Quadratic Discriminant Analysis 74.27 0.02 0.74 0.74 0.73
Naive Bayes 52.18 0.03 0.53 0.52 0.47
4.6. Leave One Subject Out Validation of Selected Hyperparameters
Tables 13, 14 and 15 show the leave one subject out results for each of the three datasets
with the best CNN model. The model is trained on all subjects except for one, and
classifies the data belonging to that left out subject.
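The Mean column of Tables 13 and 14 can be reproduced directly from the listed per-subject accuracies. The sketch below also computes their spread, using the population standard deviation as an assumption since the text does not say which estimator was used. The spread comes out at roughly 3 percentage points for both tables, which corresponds to 0.03 when expressed as a fraction; whether the Std. column uses that scale is an inference, not something the text states.

```python
from statistics import mean, pstdev

# Per-subject accuracies (%) as listed in Tables 13 and 14.
attention = [84.33, 86.27, 81.91, 89.66]
emotions = [91.18, 84.71]


def summarise(accuracies):
    """Mean and population standard deviation, both in percentage points."""
    return mean(accuracies), pstdev(accuracies)


attention_mean, attention_std = summarise(attention)
emotions_mean, emotions_std = summarise(emotions)

print(f"attention: mean {attention_mean:.2f}, std {attention_std:.2f} points")
print(f"emotions:  mean {emotions_mean:.2f}, std {emotions_std:.2f} points")
```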
5. Discussion
Tables 16, 17 and 18 show comparisons of the best models found in this study to
other machine learning models. Although the top mean scores were achieved by the
CNNs found in this study, their standard deviation is relatively high. In some cases, such
as the emotions and eye state datasets, the CNN only slightly outperforms a
Random Forest, which is far less computationally expensive to execute.
Table 17. Comparison of the best emotions dataset model (3D CNN) to other
statistical models
Model Acc. (%) Std. Prec. Rec. F1
3D CNN 98.4 0.53 0.98 0.98 0.98
Extreme Gradient Boosting 98.38 0.01 0.98 0.98 0.98
Random Forest 98.36 0.01 0.98 0.98 0.98
AdaBoost Long Short-Term Memory [58] 97.06 0.01 0.97 0.97 0.97
Long Short-Term Memory [58] 96.86 0.01 0.97 0.97 0.97
Deep Neural Network [58] 96.11 0.02 0.96 0.96 0.96
Decision Tree 94.98 0.02 0.95 0.95 0.95
Linear Discriminant Analysis 93.9 0.02 0.94 0.94 0.94
KNN(10) 92.64 0.01 0.93 0.93 0.93
Support Vector Classifier 92.03 0.01 0.93 0.92 0.92
Quadratic Discriminant Analysis 77.35 0.11 0.82 0.78 0.77
Naive Bayes 65.24 0.04 0.65 0.65 0.63
Table 18. Comparison of the best eye state dataset model (3D CNN) to other
statistical models
Model Acc. (%) Std. Prec. Rec. F1
3D CNN 97.96 0.44 0.98 0.98 0.98
AdaBoost Long Short-Term Memory 97.87 0.04 0.98 0.98 0.98
Long Short-Term Memory 97.87 0.04 0.98 0.98 0.98
Extreme Gradient Boosting 97.95 0.01 0.98 0.98 0.98
Deep Neural Network 97.91 0.01 0.98 0.98 0.98
Random Forest 97.9 0.01 0.98 0.98 0.98
KNN(10) 94.82 0.01 0.95 0.95 0.95
Linear Discriminant Analysis 94.32 0.01 0.94 0.94 0.94
Support Vector Classifier 92.75 0.02 0.93 0.93 0.93
Decision Tree 90.79 0.02 0.91 0.91 0.91
Quadratic Discriminant Analysis 83.12 0.02 0.84 0.83 0.83
Naive Bayes 66.61 0.03 0.7 0.67 0.65
It is also worth noting that the CNN, for these datasets, seemingly outperforms Long
Short Term Memory Networks and Multilayer Perceptrons.
6. Conclusion and Future Works
As discussed at the start of this paper, 729 features were selected in order to directly
compare 2D and 3D visual space for EEG classification, since 729 can be used to make
both a perfect square and cube. Experiments show the superiority of the 2-Dimensional
approach and there are of course many more numbers within the bounds of the attribute
set that make only a perfect square, 1273 to be exact. If cube comparison is discarded,
image size should be explored in order to determine whether there is a better set of results
totalling either more or fewer than the 729 chosen. The feature extraction for the
64-channel dataset produces 23,488 attributes and thus further studies into this can
attempt to compare different sized images and cubes due to the abundance of features.
Furthermore, the method of reshaping to 2D and 3D through order of their feature
selection scores was performed in a relatively simple fashion for purposes of preliminary
exploration. In future studies, due to the success found in this work, the method of
reshaping and ordering of the attributes within the shape will be studied, considering the
reshape method an additional network hyperparameter. This presents a combinatorial
optimisation problem that should be further explored and solved in order to present
more scientifically sound methods for reshaping. In addition, in future, it would be
useful to explore other methods of feature extraction using the CNN model. In this
work, we compare our approach to statistical models which also have the features as
input; it is well documented in the field that features must be extracted from the raw
signals when non-temporal learning methods are to be performed [63–65]. Otherwise,
low classification accuracies are often encountered, producing models of little use that
cannot classify unseen data. Although this comparison would not be possible in the raw
signal domain, the raw signals may be more useful for convolutional neural networks to
learn from in future benchmarking experiments. Another limitation of this study is that
unseen data was restricted to holdout test sets and unseen subjects; in future, a
further dataset should be collected in order to enable testing on a larger amount of
unseen data.
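The reason 729 permits a direct 2D and 3D comparison is that a number is both a perfect square and a perfect cube exactly when it is a sixth power. The sketch below lists every such attribute count up to a given limit together with the matching image and cube side lengths; the function name is illustrative, and the limit of 23,488 is the attribute count quoted above for the 64-channel dataset.

```python
def square_and_cube_sizes(limit):
    """Return (n, square side, cube side) for each n <= limit with n = k**6."""
    sizes = []
    k = 1
    while k ** 6 <= limit:
        # n = k^6 = (k^3)^2 = (k^2)^3, so the sides are k^3 and k^2.
        sizes.append((k ** 6, k ** 3, k ** 2))
        k += 1
    return sizes


candidates = square_and_cube_sizes(23488)
for n, square_side, cube_side in candidates:
    print(f"{n} attributes: {square_side}x{square_side} image, "
          f"{cube_side}x{cube_side}x{cube_side} cube")
```

The candidates within that limit are 1, 64, 729, 4096 and 15625, so 4096 (64x64 images, 16x16x16 cubes) and 15625 (125x125 images, 25x25x25 cubes) are the only larger sizes that would keep the 2D and 3D comparison like for like.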
In this work, models were explored with a train/test split and finally benchmarked
with k-fold cross validation. Ron Kohavi [66] argued that data splits are usually inferior
to k-fold cross-validation, which is further inferior to leave one out cross-validation where
a model is trained for each and every data point (k=datapoints). Applying this
throughout the experiments would have required an extensive amount of computational
resources; with the best models now identified, it is feasible to take them forward and
attempt leave-one-out cross validation. As previously described, the main limitation of
this study is the method of reshaping: three methods were explored, which were dictated
by the score metrics of three different dimensionality reduction techniques. In future,
a combinatorial optimisation algorithm could be used with CNN classification metrics
as a fitness function to optimise. Future work could specifically explore the effects of
reshaping on CNNs operating in different numbers of spatial dimensions and how
this may be useful for future tasks. The techniques were applied generally to four
and 64-channel EEG recordings, thus applied to datasets of much different width (given
that temporal features are extracted from each electrode), and future work could
explore whether differing successful techniques could be applied with either a task or
electrode count in mind. Datasets with larger numbers of subjects and leave-one-subject-out
testing could also be explored in future works in order to discern whether these models
improve the ability of unseen subject classification or whether calibration is required.
To finally conclude, nine preliminary deep learning experiments were initially
carried out twice for three EEG datasets: for each dataset, three in 2-dimensional space
and three in 3-dimensional space, which were then compared. In the cases of attention
and emotional state, the 2D CNN outperformed the 3D CNN when rule-based and
entropy-based feature selection was performed respectively. On the other hand, for eye
state with a 64-channel EEG, the 3D CNN produced the best accuracy when features
were selected via their Symmetrical Uncertainty. The best 2D and 3D models for each
were then taken forward for topology optimisation, and finally, to prevent overfitting,
said topologies were validated using 10-fold cross validation. Final results show that the
data preprocessing methods first shown retained their best overall score, but all were
improved upon after topology optimisation and subsequent k-fold cross validation.
7. References
[1] Lana EP, Adorno BV, Tierra-Criollo CJ. Detection of movement intention using EEG in a human-
robot interaction environment. Research on Biomedical Engineering. 2015;31(4):285–294.
[2] Cassani R, Banville H, Falk TH. MuLES: An open source EEG acquisition and streaming server
for quick and simple prototyping and recording. In: Proceedings of the 20th International
Conference on Intelligent User Interfaces Companion; 2015. p. 9–12.
[3] Maskeliunas R, Damasevicius R, Martisius I, Vasiljevas M. Consumer-grade EEG devices: are
they usable for control tasks? PeerJ. 2016;4:e1746.
[4] Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for
explaining black box models. ACM computing surveys (CSUR). 2018;51(5):1–42.
[5] Ashford J, Bird JJ, Campelo F, Faria DR. Classification of EEG Signals Based on Image
Representation. In: UK Workshop on Computational Intelligence. Springer; 2019.
[6] Swartz BE. The advantages of digital over analog recording techniques. Electroencephalography
and clinical neurophysiology. 1998;106(2):113–117.
[7] Coenen A, Fine E, Zayachkivska O. Adolf Beck: A forgotten pioneer in electroencephalography.
Journal of the History of the Neurosciences. 2014;23(3):276–286.
[8] Shah AK, Mittal S. Invasive electroencephalography monitoring: Indications and presurgical
planning. Annals of Indian Academy of Neurology. 2014;17(Suppl 1):S89.
[9] Taheri BA, Knight RT, Smith RL. A dry electrode for EEG recording. Electroencephalography
and clinical neurophysiology. 1994;90(5):376–383.
[10] Oliveira AS, Schlink BR, Hairston WD, König P, Ferris DP. Induction and separation of motion
artifacts in EEG data using a mobile phantom head device. Journal of neural engineering.
2016;13(3):036014.
[11] Krigolson OE, Williams CC, Norton A, Hassall CD, Colino FL. Choosing MUSE: Validation of a
low-cost, portable EEG system for ERP research. Frontiers in neuroscience. 2017;11:109.
[12] Abujelala M, Abellanoza C, Sharma A, Makedon F. Brain-ee: Brain enjoyment evaluation using
commercial eeg headband. In: Proceedings of the 9th acm international conference on pervasive
technologies related to assistive environments. ACM; 2016. p. 33.
[13] Plotnikov A, Stakheika N, De Gloria A, Schatten C, Bellotti F, Berta R, et al. Exploiting real-
time EEG analysis for assessing flow in games. In: 2012 IEEE 12th International Conference
on Advanced Learning Technologies. IEEE; 2012. p. 688–689.
[14] Chai TY, Woo SS, Rizon M, Tan CS. Classification of human emotions from EEG signals using
statistical features and neural network. In: International. vol. 1. Penerbit UTHM; 2010. p. 1–6.
[15] Tanaka H, Hayashi M, Hori T. Statistical features of hypnagogic EEG measured by a new scoring
system. Sleep. 1996;19(9):731–738.
[16] Li M, Lu BL. Emotion classification based on gamma-band EEG. In: Engineering in medicine
and biology society, 2009. EMBC 2009. Annual international conference of the IEEE. IEEE;
2009. p. 1223–1226.
[17] Zheng WL, Zhu JY, Peng Y, Lu BL. EEG-based emotion classification using deep belief networks.
In: Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE; 2014. p. 1–6.
[18] Ren Y, Wu Y. Convolutional deep belief networks for feature extraction of EEG signal. In: 2014
International Joint Conference on Neural Networks (IJCNN). IEEE; 2014. p. 2850–2853.
[19] Li K, Li X, Zhang Y, Zhang A. Affective state recognition from EEG with deep belief networks.
In: 2013 IEEE International Conference on Bioinformatics and Biomedicine. IEEE; 2013. p.
305–310.
[20] Bos DO, et al. EEG-based emotion recognition. The Influence of Visual and Auditory Stimuli.
2006;56(3):1–17.
[21] Lin YP, Wang CH, Jung TP, Wu TL, Jeng SK, Duann JR, et al. EEG-based emotion recognition
in music listening. IEEE Transactions on Biomedical Engineering. 2010;57(7):1798–1806.
[22] Wang XW, Nie D, Lu BL. Emotional state classification from EEG data using machine learning
approach. Neurocomputing. 2014;129:94–106.
[23] Koelstra S, Yazdani A, Soleymani M, Mühl C, Lee JS, Nijholt A, et al. Single trial classification of
EEG and peripheral physiological signals for recognition of emotions induced by music videos.
In: International Conference on Brain Informatics. Springer; 2010. p. 89–100.
[24] Suryotrisongko H, Samopa F. Evaluating OpenBCI spiderclaw V1 headwear’s electrodes
placements for brain-computer interface (BCI) motor imagery application. Procedia Computer
Science. 2015;72:398–405.
[25] Buchwald M, Jukiewicz M. Project and evaluation EMG/EOG human-computer interface.
Przeglad Elektrotechniczny. 2017;93.
[26] Apiwattanadej T, Zhang L, Li H. Electrospun polyurethane microfiber membrane on conductive
textile for water-supported textile electrode in continuous ECG monitoring application. In:
2018 Symposium on Design, Test, Integration & Packaging of MEMS and MOEMS (DTIP).
IEEE; 2018. p. 1–5.
[27] Nguyen A, Alqurashi R, Raghebi Z, Banaei-Kashani F, Halbower AC, Vu T. LIBS: a lightweight
and inexpensive in-ear sensing system for automatic whole-night sleep stage monitoring.
GetMobile: Mobile Computing and Communications. 2017;21(3):31–34.
[28] Jacobs KM. Brodmann’s areas of the cortex. Encyclopedia of Clinical Neuropsychology. 2011;p.
459–459.
[29] Finney EM, Fine I, Dobkins KR. Visual stimuli activate auditory cortex in the deaf. Nature
neuroscience. 2001;4(12):1171.
[30] Karuppusamy NS, Kang BY. Driver fatigue prediction using eeg for autonomous vehicle.
Advanced Science Letters. 2017;23(10):9561–9564.
[31] Rösler O, Suendermann D. A first step towards eye state prediction using eeg. Proc of the AIHLS.
2013.
[32] Tu W, Sun S. A subject transfer framework for EEG classification. Neurocomputing. 2012;82:109–
116.
[33] Zheng WL, Lu BL. Personalizing EEG-based affective models with transfer learning. In:
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI
Press; 2016. p. 2732–2738.
[34] Sabancı K, Koklu M. The classification of eye state by using kNN and MLP classification models
according to the EEG signals. International Journal of Intelligent Systems and Applications in
Engineering. 2015;3(4):127–130.
[35] Sinha N, Babu D, et al. Statistical feature analysis for EEG baseline classification: Eyes Open vs
Eyes Closed. In: 2016 IEEE region 10 conference (TENCON). IEEE; 2016. p. 2466–2469.
[36] Bird JJ, Manso LJ, Ribeiro EP, Ekart A, Faria DR. A Study on Mental State Classification using
EEG-based Brain-Machine Interface. In: 9th International Conference on Intelligent Systems.
IEEE; 2018.
[37] Bird JJ, Ekart A, Buckingham CD, Faria DR. Mental Emotional Sentiment Classification with an
EEG-based Brain-Machine Interface. In: The International Conference on Digital Image and
Signal Processing (DISP’19). Springer; 2019.
[38] Montgomery DC, Runger GC. Applied Statistics and Probability for Engineers. John Wiley &
Sons; 2010.
[39] Zwillinger D, Kokoska S. CRC Standard Probability and Statistics Tables and Formulae.
Chapman & Hall; 2000.
[40] Strang G. Linear algebra and its applications. Brooks Cole; 2006.
[41] Chiu TY, Leonard T, Tsui KW. The matrix-logarithmic covariance model. Journal of the
American Statistical Association. 1996;91(433):198–210.
[42] Haber HE. Notes on the Matrix Exponential and Logarithm; 2019. online. Available from:
http://scipp.ucsc.edu/~haber/webpage/MatrixExpLog.pdf.
[43] James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. vol. 112.
Springer; 2013.
[44] Kohavi R, John GH. Wrappers for feature subset selection. Artificial intelligence. 1997;97(1-
2):273–324.
[45] John GH, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem. In: Machine
Learning Proceedings 1994. Elsevier; 1994. p. 121–129.
[46] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436.
[47] Mejía-Lavalle M, Sucar E, Arroyo G. Feature selection with a perceptron neural net. In:
Proceedings of the international workshop on feature selection for data mining; 2006. p. 131–135.
[48] Kullback S, Leibler RA. On information and sufficiency. The annals of mathematical statistics.
1951;22(1):79–86.
[49] Kullback S. Information theory and statistics. Courier Corporation; 1997.
[50] Yu L, Liu H. Feature selection for high-dimensional data: A fast correlation-based filter solution.
In: Proceedings of the 20th international conference on machine learning (ICML-03); 2003. p.
856–863.
[51] Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J. Flexible, high performance
convolutional neural networks for image classification. In: Twenty-Second International Joint
Conference on Artificial Intelligence; 2011.
[52] Cireşan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification.
arXiv preprint arXiv:1202.2745. 2012.
[53] Nave R. HyperPhysics. Georgia State University, Department of Physics and Astronomy; 2000.
[54] Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. The
Journal of physiology. 1968;195(1):215–243.
[55] Abhang PA, Gawali BW. Correlation of EEG images and speech signals for emotion analysis.
British Journal of Applied Science & Technology. 2015;10(5):1–13.
[56] Gevins A, Smith ME, McEvoy L, Yu D. High-resolution EEG mapping of cortical activation
related to working memory: effects of task difficulty, type of processing, and practice. Cerebral
cortex (New York, NY: 1991). 1997;7(4):374–385.
[57] Bird JJ, Ekart A, Faria DR. Evolutionary Optimisation of Fully Connected Artificial Neural
Network Topology. In: SAI Computing Conference 2019. SAI; 2019.
[58] Bird JJ, Faria DR, Manso LJ, Ekart A, Buckingham CD. A Deep Evolutionary Approach
to Bioinspired Classifier Optimisation for Brain-Machine Interaction. Complexity. 2019;2019.
Available from: https://doi.org/10.1155/2019/4316548.
[59] Knuth DE. Postscript about NP-hard problems. ACM SIGACT News. 1974;6(2):15–16.
[60] Schalk G, McFarland DJ, Hinterberger T, Birbaumer N, Wolpaw JR. BCI2000: a general-
purpose brain-computer interface (BCI) system. IEEE Transactions on biomedical engineering.
2004;51(6):1034–1043.
[61] Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, et al. Components of a new
research resource for complex physiologic signals. PhysioBank, PhysioToolkit, and Physionet;.
[62] Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank,
PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic
signals. Circulation. 2000;101(23):e215–e220.
[63] Subasi A. EEG signal classification using wavelet feature extraction and a mixture of expert
model. Expert Systems with Applications. 2007;32(4):1084–1093.
[64] Azlan WAW, Low YF. Feature extraction of electroencephalogram (EEG) signal-A review. In:
2014 IEEE Conference on Biomedical Engineering and Sciences (IECBES). IEEE; 2014. p. 801–
806.
[65] Krishnan S, Athavale Y. Trends in biomedical signal feature extraction. Biomedical Signal
Processing and Control. 2018;43:41–63.
[66] Kohavi R, et al. A study of cross-validation and bootstrap for accuracy estimation and model
selection. In: Ijcai. vol. 14. Montreal, Canada; 1995. p. 1137–1145.
... In the concentration task, subjects followed a "shell game" where a ball was hidden under one of three cups, challenging them to track the cup concealing the ball. Moreover, this dataset has been utilized in previous studies for the development of neural networks aimed at mental state classification [15][16][17]. The Muse, a noninvasive EEG device worn like a headband, is equipped with four EEG sensors (AF7, AF8, TP9, and TP10) positioned to capture signals from the frontal and temporal regions of the brain. ...
... The one-second window is divided into two half-windows and four quarter-windows, which contain one half and one quarter of the window's samples, respectively [22]. Classical statistical features are extracted for each of these windows as shown in Algorithm 1, proposed in previous works [15] and [22]. After eliminating redundant features resulting from overlapping windows, a total of 989 features is retained from each file. ...
... Prior studies leveraging CNNs for classifying mental states through EEG data have demonstrated notable success. A notable example is presented by [15], where CNNs achieved a test accuracy of 89.38%. The CNNs developed for this study were crafted utilizing the Keras and TensorFlow Python libraries, and the training was executed on an NVIDIA GeForce GTX 1650 with 1024 CUDA Cores, 4GB, and 8Gbps GDDR5 VRAM. ...
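The windowing scheme quoted in the excerpts above (one window, its two halves, and its four quarters, each summarised by statistical features) can be illustrated with a minimal sketch. The feature set (mean, standard deviation, minimum, maximum), the 256-sample window, and the function names `split_windows` and `window_features` are assumptions made for illustration only; they are not the feature set or the Algorithm 1 referred to in the excerpt.

```python
from statistics import mean, pstdev


def split_windows(samples):
    """Return the full window, its two halves, and its four quarters."""
    n = len(samples)  # assumed to be divisible by 4
    half, quarter = n // 2, n // 4
    halves = [samples[i * half:(i + 1) * half] for i in range(2)]
    quarters = [samples[i * quarter:(i + 1) * quarter] for i in range(4)]
    return [samples] + halves + quarters


def window_features(samples):
    """Compute a small, illustrative feature set for every sub-window."""
    features = []
    for window in split_windows(samples):
        features.extend([mean(window), pstdev(window), min(window), max(window)])
    return features


# Hypothetical one-second window of 256 samples (synthetic values, not EEG data).
signal = [float(i % 8) for i in range(256)]
feats = window_features(signal)
```

With seven sub-windows and four statistics each, this toy version yields 28 values per channel window; the cited works extract a much larger set.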
Chapter
This study is divided into two main components. Firstly, it involves the design and training of multiple Convolutional Neural Networks (CNN) for the classification of brainwaves, predicting the mental state of an individual. Secondly, it encompasses the development of a Brain-Computer Interface (BCI) designed to record brainwaves, offering a user-friendly means to predict mental states using the recorded data and the trained neural network. The study utilizes a publicly available electroencephalographic (EEG) dataset collected with the Muse EEG headband. Various preprocessing techniques such as wavelet transform (WT), feature extraction, and feature selection are explored. The chosen temporal and statistical features are transformed into 2D grayscale images to facilitate the training of CNN models, classifying mental states into three categories: concentrated, neutral, and relaxed. The achieved highest accuracy is 91.72%, demonstrating competitiveness and improvement compared to previous works using the same dataset. The selected CNN model performs a fusion of the selected features and is integrated into the BCI, enabling users to predict mental states using EEG data. This BCI also holds the potential for enhancing model accuracy through continuous testing and incorporation of valuable data into the training dataset.
... With the rapid development of AI, deep learning has been used successfully in delivering a quantitative analysis of suspicious lesions in a relatively short time. Convolution Neural Network (CNN) is a type of artificial neural network that is commonly used for image recognition and classification [13], [14], where the model learns an internal representation of a two-dimensional (2D) input, a process referred to as feature learning. This process is also applied to a one-dimensional (1D) sequence of data, such as in the case of natural language processing [15] and gyroscopic data for human activity recognition [16]. ...
... This process is also applied to a one-dimensional (1D) sequence of data, such as in the case of natural language processing [15] and gyroscopic data for human activity recognition [16]. CNN is widely used in medical image classification to classify electroencephalography (EEG) [13] and Alzheimer's diseases [14], respectively. In 2019, [17] proposed using CNN with an LSTM model to detect QRS complexes in noisy electrocardiogram (ECG) raw signal images. ...
Article
Investigation of microcirculation is the key to diagnose circulatory dysfunction. Tissue circulation monitoring is a crucial part of the care of patients with severe chronic illnesses because it affects oxygen delivery to tissue. Recent technology, such as hyperspectral imaging, has allowed visualization of microcirculation at the price of high computation resources. Meanwhile, pulse oximeter performance varies with factors like the subject's skin colour. This study explores the feasibility of using an in-house assembled multispectral photoacoustic (PA) system to investigate microcirculation performance in human subjects. We used pretrained Alexnet, Long Short-Term Memory (LSTM), and a hybrid Alexnet-LSTM network for the prediction task. This research included thirty-seven healthy participants in this cross-sectional study. The ultrasonic waves collected from their posterior left arm under two experimental settings, namely at rest (i.e., control) and with arterial blood flow occlusions, were used to predict the microcirculation changes in tissue using the deep networks. Our findings showed the superiority of the hybrid model over the Alexnet and LSTM, with an average testing accuracy of 95.7 % and precision of 98.2 %, making it an ideal deep learning model for the task. This study concluded that the proposed deep learning incorporated photoacoustic system has a promising future for diagnosing and treating patients with compromised microcirculatory conditions.
... Therefore, in this study, CNN architecture was chosen to classify EEG data. The study evaluated the performance of the model on various experiments and different preprocessing methods applied to the data [55]. ...
Article
The primary aim of this study was to assess the classification performance of deep learning models in distinguishing between resting state and motor imagery swallowing, utilizing various preprocessing and data visualization techniques applied to electroencephalography (EEG) data. In this study, we performed experiments using four distinct paradigms (natural swallowing, induced saliva swallowing, induced water swallowing, and induced tongue protrusion) on 30 right-handed individuals (aged 18 to 56). We utilized a 16-channel wearable EEG headset. We thoroughly investigated the impact of different preprocessing methods (Independent Component Analysis, Empirical Mode Decomposition, bandpass filtering) and visualization techniques (spectrograms, scalograms) on the classification performance of multichannel EEG signals. Additionally, we explored the utilization and potential contributions of deep learning models, particularly Convolutional Neural Networks (CNNs), in EEG-based classification processes. The novelty of this study lies in its comprehensive examination of the potential of deep learning models, specifically in distinguishing between resting state and motor imagery swallowing processes, using a diverse combination of EEG signal preprocessing and visualization techniques. The results showed that it was possible to distinguish the resting state from the imagination of swallowing with 89.8% accuracy, especially using continuous wavelet transform (CWT) based scalograms. The findings of this study may provide significant contributions to the development of effective methods for the rehabilitation and treatment of swallowing difficulties based on motor imagery-based brain computer interfaces.
... This technique involves placing electrodes on the scalp to detect and measure voltage fluctuations resulting from ionic flow within brain neurons. EEG signals are categorized into different frequency bands: delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma, each associated with different brain states and functions. Compared to other tests like fMRI, EEG offers advantages such as low cost, portability, and noninvasive measurement, reducing the burden on subjects and minimizing side effects. ...
Article
Emotion classification is a challenge in affective computing, with applications ranging from human–computer interaction to mental health monitoring. In this study, the classification of emotional states using electroencephalography (EEG) data was investigated. Specifically, the efficacy of combining various feature selection methods with hyperparameter tuning of machine learning algorithms for accurate and robust emotion recognition was studied. The following feature selection methods were explored: filter (SelectKBest with analysis of variance (ANOVA) F-test), embedded (least absolute shrinkage and selection operator (LASSO) tuned using Bayesian optimization (BO)), and wrapper (genetic algorithm (GA)) methods. We also performed hyperparameter tuning of the machine learning algorithms using BO. The performance of each method was assessed. Two different EEG datasets, EEG Emotion and DEAP, containing 2548 and 160 features, respectively, were evaluated using random forest (RF), logistic regression, XGBoost, and support vector machine (SVM). For both datasets, the three feature selection methods consistently improved the accuracy of the models. For the EEG Emotion dataset, RF with LASSO achieved the best result among all the methods tested, increasing the accuracy from 98.78% to 99.39%. In the DEAP dataset experiment, XGBoost with GA showed the best result, increasing the accuracy by 1.59% and 2.84% for valence and arousal, respectively. We also show that these results are superior to those of previous methods in the literature.
... In signal processing for physiological data, various transformative techniques have emerged as powerful tools to convert time-domain signals into image representations, enabling effective feature engineering for CNNs [45]. Although 1D feature extraction methods like time and frequency domain analysis are essential for understanding signal characteristics, they have inherent limitations. ...
Article
Physiological signals such as electroencephalography (EEG), electromyography (EMG), and electrocardiography (ECG) provide valuable clinical information but pose challenges for analysis due to their high-dimensional nature. Traditional machine learning techniques, relying on hand-crafted features from fixed analysis windows, can lead to the loss of discriminative information. Recent studies have demonstrated the effectiveness of deep convolutional neural networks (CNNs) for robust automated feature learning from raw physiological signals. However, standard CNN architectures require two-dimensional image data as input. This has motivated research into innovative signal-to-image (STI) transformation techniques to convert one-dimensional time series into images preserving spectral, spatial, and temporal characteristics. This paper reviews recent advances in strategies for physiological signal-to-image conversion and their applications using CNNs for automated processing tasks. A systematic analysis of EEG, EMG, and ECG signal transformation and CNN-based analysis techniques spanning diverse applications, including brain-computer interfaces, seizure detection, motor control, sleep stage classification, arrhythmia detection, and more, are presented. Key insights are synthesised regarding the relative merits of different transformation approaches, CNN model architectures, training procedures, and benchmark performance. Current challenges and promising research directions at the intersection of deep learning and physiological signal processing are discussed. This review aims to catalyse continued innovations in effective end-to-end systems for clinically relevant information extraction from multidimensional physiological data using deep neural networks by providing a comprehensive overview of state-of-the-art techniques.
... DL can directly use raw EEG data without the need to preprocess the data or extract features manually (Altaheri et al. 2022; Luo et al. 2018; Schirrmeister et al. 2017). Furthermore, as a modern network learning structure, DL has shown stronger feature extraction capability and excellent performance in several applications, such as speech recognition (SR) (Tang et al. 2017), image recognition (Lee and Kwon 2017; Bird et al. 2021), EEG recognition (Lawhern et al. 2018), and others. Some researchers have also clarified that DL plays a key role in decoding brain activities accurately. ...
Article
Electroencephalogram (EEG)-based motor imagery (MI) signals have received extensive attention because they can help disabled subjects control wheelchairs, automatic driving and other activities. However, EEG signals are easily affected by factors such as muscle movements, wireless devices and power lines, resulting in low signal-to-noise ratios and worse EEG decoding results. It is therefore crucial to develop a stable model for decoding MI-EEG signals. To address this issue and further improve decoding performance for MI tasks, this study develops a hybrid structure combining convolutional neural networks and a bidirectional long short-term memory (BLSTM) model, namely CBLSTM, to handle various EEG-based MI tasks. In addition, an attention mechanism (AM) model is adopted to adaptively weight vital EEG features and enhance the feature expression that is beneficial to classification for MI tasks. First, the spatial features and the time series features are extracted by CBLSTM from preprocessed MI-EEG data, respectively. More effective feature information is then mined by the AM model, and the softmax function is used to recognize intention categories. The numerical results illustrate that the presented model achieves an average accuracy of 98.40% on the public PhysioNet dataset and a faster training process for decoding MI tasks, which is superior to some other advanced models. An ablation experiment also verifies the effectiveness and feasibility of the developed model. Moreover, the established network model provides a good basis for the application of brain-computer interfaces in rehabilitation medicine.
... Alternatively, some studies have applied Wavelet transform or CSP prior to using neural networks [20], or have used time-domain AM EEG features to train deep network architectures [8]. When it comes to the architecture of the deep learning model, studies that focused on MI as the mental strategy have preferred convolutional frameworks to capture spatial relationships among different brain areas [17, 21-23]. Another advantage of deep learning is the ability to perform transfer learning, where models can be trained with data from different domains and then fine-tuned for the desired one [22, 24, 25]. ...
Article
Background This research focused on the development of a motor imagery (MI) based brain–machine interface (BMI) using deep learning algorithms to control a lower-limb robotic exoskeleton. The study aimed to overcome the limitations of traditional BMI approaches by leveraging the advantages of deep learning, such as automated feature extraction and transfer learning. The experimental protocol to evaluate the BMI was designed as asynchronous, allowing subjects to perform mental tasks at their own will. Methods A total of five healthy able-bodied subjects were enrolled in this study to participate in a series of experimental sessions. The brain signals from two of these sessions were used to develop a generic deep learning model through transfer learning. Subsequently, this model was fine-tuned during the remaining sessions and subjected to evaluation. Three distinct deep learning approaches were compared: one that did not undergo fine-tuning, another that fine-tuned all layers of the model, and a third one that fine-tuned only the last three layers. The evaluation phase involved the exclusive closed-loop control of the exoskeleton device by the participants’ neural activity using the second deep learning approach for the decoding. Results The three deep learning approaches were assessed in comparison to an approach based on spatial features that was trained for each subject and experimental session, demonstrating their superior performance. Interestingly, the deep learning approach without fine-tuning achieved comparable performance to the features-based approach, indicating that a generic model trained on data from different individuals and previous sessions can yield similar efficacy. Among the three deep learning approaches compared, fine-tuning all layer weights demonstrated the highest performance. Conclusion This research represents an initial stride toward future calibration-free methods. 
Despite the efforts to diminish calibration time by leveraging data from other subjects, complete elimination proved unattainable. The study’s discoveries hold notable significance for advancing calibration-free approaches, offering the promise of minimizing the need for training trials. Furthermore, the experimental evaluation protocol employed in this study aimed to replicate real-life scenarios, granting participants a higher degree of autonomy in decision-making regarding actions such as walking or stopping gait.
Chapter
This work presents an image classification approach to EEG brainwave classification. The proposed method is based on the representation of temporal and statistical features as a 2D image, which is then classified using a deep Convolutional Neural Network. A three-class mental state problem is investigated, in which subjects experience either relaxation, concentration, or neutral states. Using publicly available EEG data from a Muse Electroencephalography headband, a large number of features describing the wave are extracted, and subsequently reduced to 256 based on the Information Gain measure. These 256 features are then normalised and reshaped into a 16 × 16 grid, which can be expressed as a grayscale image. A deep Convolutional Neural Network is then trained on this data in order to classify the mental state of subjects. The proposed method obtained an out-of-sample classification accuracy of 89.38%, which is competitive with the 87.16% of the current best method from a previous work.
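The normalise-and-reshape step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the features are normalised, so per-vector min-max scaling to 0-255 is an assumption, as are the function name `to_grayscale_grid` and the synthetic input values.

```python
def to_grayscale_grid(features, side):
    """Min-max normalise a flat feature vector and reshape it to side x side pixels."""
    if len(features) != side * side:
        raise ValueError("expected side * side features")
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    # Assumption: scale to 8-bit grayscale intensities in the range 0..255.
    pixels = [round(255 * (v - lo) / span) for v in features]
    # Fill the grid row by row, preserving the order of the selected features.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical vector of 256 selected feature values (synthetic, not EEG data).
selected = [float(i) for i in range(256)]
image = to_grayscale_grid(selected, 16)
```

The same helper would produce a 27 × 27 image from 729 features by passing `side=27`; because the grid is filled row by row, the feature ranking determines which values end up as neighbouring pixels.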
Chapter
This paper proposes an approach to selecting the number of layers and neurons contained within Multilayer Perceptron hidden layers through a single-objective evolutionary approach with the goal of model accuracy. At each generation, a population of Neural Network architectures is created and ranked by accuracy. The generated solutions are combined in a breeding process to create a larger population, and at each generation the weakest solutions are removed to retain the population size, inspired by a Darwinian ‘survival of the fittest’. Multiple datasets are tested, and results show that architectures can be successfully improved and derived through a hyper-heuristic evolutionary approach, in less than 10% of the exhaustive search time. The evolutionary approach was further optimised through population density increase as well as gradual solution max complexity increase throughout the simulation.
Article
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the first two experiments and significantly higher for the number-guessing experiment, with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the nonboosted DEvo MLP was 79.81%, 96.11%, and 27.07% in the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
Conference Paper
This paper explores single and ensemble methods to classify emotional experiences based on EEG brainwave data. A commercial MUSE EEG headband is used with a resolution of four (TP9, AF7, AF8, TP10) electrodes. Positive and negative emotional states are invoked using film clips with an obvious valence, and neutral resting data is also recorded with no stimuli involved, all for one minute per session. Statistical extraction of the alpha, beta, theta, delta and gamma brainwaves is performed to generate a large dataset that is then reduced to smaller datasets by feature selection using scores from OneR, Bayes Network, Information Gain, and Symmetrical Uncertainty. Of the set of 2548 features, a subset of 63 selected by their Information Gain values were found to be best when used with ensemble classifiers such as Random Forest. They attained an overall accuracy of around 97.89%, outperforming the current state of the art by 2.99 percentage points. The best single classifier was a deep neural network with an accuracy of 94.89%.
Conference Paper
This work aims to find discriminative EEG-based features and appropriate classification methods that can categorise brainwave patterns based on their level of activity or frequency for mental state recognition useful for human-machine interaction. By using the Muse headband with four EEG sensors (TP9, AF7, AF8, TP10), we categorised three possible states (relaxing, neutral and concentrating) based on a few states of mind defined by cognitive behavioural studies. We have created a dataset with five individuals and sessions lasting one minute for each class of mental state in order to train and test different methods. Given the proposed set of features extracted from the five signals of the EEG headband (alpha, beta, theta, delta, gamma), we have tested a combination of different feature selection algorithms and classifier models to compare their performance in terms of recognition accuracy and number of features needed. Different tests such as 10-fold cross validation were performed. Results show that only 44 features from a set of over 2100 features are necessary when used with classical classifiers such as Bayesian Networks, Support Vector Machines and Random Forests, attaining an overall accuracy over 87%.
Article
In recent years many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineates explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.
Article
Signal analysis involves identifying signal behaviour, extracting linear and non-linear properties, compression or expansion into higher or lower dimensions, and recognizing patterns. Over the last few decades, signal processing has taken notable evolutionary leaps in terms of measurement, from simple techniques for analysing analog or digital signals in the time, frequency or joint time–frequency (TF) domain to complex techniques for analysis and interpretation in a higher dimensional domain. The intention behind this is simple: robust and efficient feature extraction, i.e. to identify specific signal markers or properties exhibited in one event, and use them to distinguish from characteristics exhibited in another event. The objective of our study is to give the reader a bird's eye view of the biomedical signal processing world with a zoomed-in perspective of feature extraction methodologies, which form the basis of machine learning and hence artificial intelligence. We delve into the vast world of feature extraction, going across the evolutionary chain starting with basic A-to-D conversion, to domain transformations, to sparse signal representations and compressive sensing. It should be noted that in this manuscript we have attempted to explain key biomedical signal feature extraction methods in a simpler fashion, without going into detailed mathematical representations. Additionally, we have briefly touched upon the curse and blessings of signal dimensionality, which would finally help us in determining the best combination of signal processing methods that could yield an efficient feature extractor. In other words, similar to how the laws of science behind some common engineering techniques are explained, in this review study we have attempted to postulate an approach towards a meaningful explanation behind those methods, developing a convincing and explainable reason as to which feature extraction method is suitable for a given biomedical signal.