Proceedings of the 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), 26-28 September, Dhaka, Bangladesh
Classification of Motor Imagery EEG Signals with
multi-input Convolutional Neural Network by
augmenting STFT
Tanvir Hasan Shovon, Zabir Al Nazi, Shovon Dash, Md. Foisal Hossain
Dept. of Electronics and Communication Engineering
Khulna University of Engineering & Technology
Khulna-9203, Bangladesh
Abstract—Motor imagery EEG classification is a crucial task
in the Brain Computer Interface (BCI) system. In this paper, we
propose a Motor Imagery EEG signal classification framework
based on Convolutional Neural Network (CNN) to enhance the
classification accuracy. For the classification of 2 class motor
imagery signals, firstly we apply Short Time Fourier Transform
(STFT) on EEG time series signals to transform signals into 2D
images. Next, we train our proposed multi-input convolutional
neural network with feature concatenation to achieve robust
classification from the images. Batch normalization is added to
regularize the network. Data augmentation is used to increase
samples and as a secondary regularizer. A three input CNN was
proposed to feed the three channel EEG signals. In our work, the
dataset of EEG signal collected from BCI Competition IV dataset
2b and dataset III of BCI Competition II were used. Experimental
results show that average classification accuracy achieved was
89.19% on dataset 2b, whereas our model achieved the best
performance of 97.7% accuracy for subject 7 on dataset III.
We also extended our approach and explored a transfer learning
based scheme with a pre-trained ResNet-50 model, which showed
promising results. Overall, our approach showed competitive
performance when compared with other methods.
Index Terms—EEG, BCI, STFT, CNN, Augmentation, Transfer Learning
I. INTRODUCTION
Brain computer interaction is a rapidly growing technology
that assists disabled people, such as paralytic patients, and
provides solutions for neurologically disabled patients [1]. Day
by day, enthusiasm in this field has dramatically increased.
A brain computer interface (BCI) system provides users with
a complete communication channel between their body and
external devices; it is a two-way process. The motor imagery
process is a mental activity without any actual body movement.
In a study [2], 4-class EEG data was classified and compared
across minimum distance analysis (MDA), linear discriminant
analysis (LDA), the k-nearest-neighbor (kNN) classifier,
and the support vector machine (SVM) classifier. A Kalman
filtering technique was used for parameter extraction, and the
EEG signal was modeled using adaptive autoregressive (AAR)
theory. In the classification stage, the support vector machine
(SVM) performed as the best classifier.
* Both authors contributed equally.
In another work [3], the authors proposed a Deep Convolutional
Neural Network (DCNN) methodology for identifying
4 movements (left-hand, right-hand, feet, and tongue movement),
where EEG signals were represented by power spectral
density (PSD). The proposed method achieved a mean accuracy
of 0.8797 ± 0.0296 on BCI Competition IV Dataset 2a.
In recent years, researchers began to apply deep learning
to multi-class motor imagery classification. For instance, in
[4], the authors proposed a novel method that was robust
to unwanted noise and subject variation. In this method,
EEG data was converted into multi-dimensional tensor images
using the Azimuthal Equidistant Projection (AEP) technique,
which preserved the spatial, spectral, and temporal structure of
human brain signals. In the classification part, ConvNets and
an LSTM network were used, which reduced the error rate
from 15.3% to 8.9% and also increased the accuracy. One of
the main requirements of this work was information about the
electrode coordinates. In [5], a new signal to angle-amplitude
graph image conversion was proposed. The scale-invariant
feature transform (SIFT) technique and the bag of visual
words (BoW) were used in the feature extraction stage. In this
work, a k-NN classifier was employed for both the EEG and MEG
datasets and performed well. In [6], the authors proposed a
method called Weighted Difference of Power Spectral Density
(WDPSD) for feature extraction in a BCI system based on
2-class motor imagery. Here, the PSD difference matrix of the
EEG signal was obtained by the STFT, CWT, and HHT methods
from optimal channels. The method was evaluated on BCI
Competition IV Dataset 2a and Dataset 2b and showed good
classification accuracy.
In this paper, we propose a multi-input convolutional neural
network. First, we convert the EEG signals into STFT
images. Furthermore, augmentation is applied to increase
the number of images. The performance of this model has
been evaluated on two datasets. The results indicate that the
proposed multi-input convolutional neural network reduces
the classification error and improves the motor imagery EEG
signal classification accuracy.
978-1-7281-4934-9/19/$31.00 ©2019 IEEE
II. METHODOLOGY
The methodology section covers the STFT, data augmentation, and
the model architecture. Our proposed method is inspired
by the computer vision and medical image analysis domains.
The computer vision field has been flourishing since the
development of the convolutional neural network. Data
augmentation, convolutional neural networks, feature
concatenation, and transfer learning have been successfully
employed in the medical signal and image analysis domains [7–10].
Fig. 1. Basic framework of the proposed classification method.
In our approach, motor imagery EEG signals are converted
into 2D images with the help of the Short Time Fourier
Transform. Then the input images are fed into the multi-input
Convolutional Neural Network (CNN), and the proposed method
is evaluated. Fig. 1 represents the framework of the proposed method.
A. Short Time Fourier Transform
The short-time Fourier transform (STFT) is a time-frequency
analysis used for non-stationary signals. It divides a long
signal into equal-size windowed segments and applies the
Fourier transform to each. The STFT of a signal s(t) is F(τ, ω):

F(\tau, \omega) = \int_{-\infty}^{\infty} s(t)\, h(t - \tau)\, e^{-j\omega t}\, dt \quad (1)
where h(t) is the window function. The PSD of F(τ, ω) is
P_s(τ, ω), defined by

P_s(\tau, \omega) = |F(\tau, \omega)|^2 \quad (2)
The PSD gives a 2D matrix which represents the power of the
EEG signal at fixed resolution. A wider window gives better
frequency resolution, while a narrower window gives better
time resolution. In our study, the STFT was applied to the
EEG signal using a Hanning window, where the length of each
segment is 256.
The EEG signal X is considered to have length d and
3 channels. For simplicity, the sampling rate for all
channels has been set to f_s. The FFT length is denoted by
n_fft. The signals are separated channel-wise, and an STFT
is performed for each one.
After the STFT, we get the vector of sample frequencies f, the
vector of segment times t, and the STFT matrix of dimension
l_f × l_t. We have only considered the absolute value of the
STFT, which preserves a good amount of information. After
taking the STFT, we transform the magnitude to RGB color
Algorithm 1 Algorithm for EEG Signal to STFT Image
Input: X ∈ R^(d×3), f_s, n_fft
Output: I ∈ R^(m×m×3×3)
1: X_1 = X_{1:d,1}
2: X_2 = X_{1:d,2}
3: X_3 = X_{1:d,3}
4: I = 0_{m×m×3×3}
5: for i = 1 to 3 do
6:     f, t, Z = STFT(X_i, f_s, n_fft)
7:     Z_r = abs(Z)
8:     I' = C(Z_r)
9:     I_i = Resize(I', m × m × 3)
10: end for
11: return I
information through the function C. Finally, we get RGB images,
which are resized to match the input of the CNN. We have
used m = 224 so that the transfer learning experiments
become simpler.
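Algorithm 1 can be sketched in Python with SciPy. The paper's color-mapping function C is not specified, so a simple min–max normalization replicated to three channels stands in for it here, and a nearest-neighbor resize stands in for the unspecified resize step; both are assumptions:

```python
import numpy as np
from scipy.signal import stft

def eeg_to_stft_images(X, fs, nfft, m=224):
    """Convert a (d, 3) EEG trial into an (m, m, 3, 3) stack of RGB-like
    STFT images, one per channel (sketch of Algorithm 1)."""
    d, n_ch = X.shape
    I = np.zeros((m, m, 3, n_ch))
    for i in range(n_ch):
        # Hanning window, segment length 256, as stated in the paper
        f, t, Z = stft(X[:, i], fs=fs, window="hann", nperseg=256, nfft=nfft)
        Zr = np.abs(Z)                                  # magnitude only
        # stand-in for C: min-max normalize, replicate to 3 "color" channels
        g = (Zr - Zr.min()) / (Zr.max() - Zr.min() + 1e-12)
        # nearest-neighbor resize to m x m
        rows = np.arange(m) * g.shape[0] // m
        cols = np.arange(m) * g.shape[1] // m
        resized = g[np.ix_(rows, cols)]
        I[..., i] = np.repeat(resized[:, :, None], 3, axis=2)
    return I
```

For a 9 s trial at 128 Hz, `eeg_to_stft_images(X, fs=128, nfft=256)` with `X` of shape (1152, 3) yields a (224, 224, 3, 3) array, one image per channel.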
B. Data Augmentation
The scarcity of training samples can be addressed with data
augmentation, which takes images and generates the specified
variants, such as rotation, flipping, zooming in, and zooming
out. The parameters we have used are rotation of 0–5, flipping
probability 0.2, zooming coverage area 0.08, and brightness
with minimum factor 0.7 and maximum factor 1.3 [11–13]. It
is expected that data augmentation will regularize our CNN
model and mitigate overfitting. Fig. 2 shows
the augmented images.
Fig. 2. Augmented images.
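The augmentation step can be sketched with NumPy/SciPy. The paper itself uses the Augmentor library [11]; the function below only mirrors the stated parameters (rotation up to 5°, flip probability 0.2, brightness factor 0.7–1.3) as a self-contained illustration:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def augment(img):
    """Return one randomly augmented copy of an H x W x 3 image in [0, 1]
    (a sketch of the paper's augmentation pipeline, not the Augmentor code)."""
    # small random rotation within 0-5 degrees, keeping the original shape
    out = rotate(img, angle=rng.uniform(0, 5), reshape=False, mode="nearest")
    if rng.random() < 0.2:          # horizontal flip with probability 0.2
        out = out[:, ::-1]
    out = out * rng.uniform(0.7, 1.3)  # random brightness factor
    return np.clip(out, 0.0, 1.0)
```

Calling `augment` repeatedly on each STFT image produces the additional training samples described above.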
C. Model Architecture
A convolutional neural network (CNN) is a special form of
deep neural network. A CNN consists of one input layer, one
output layer, and, in between them, multiple convolutional
and dense layers. A ConvNet is able to capture the spatial and
temporal information in an image. The objective of the
convolution operation is to extract features. CNNs reduce
the preprocessing required compared to traditional classification
algorithms, and the hidden layers of a CNN pass the input data
along in filtered form. Deeper convolutional layers learn more
complex features [14, 15].
We propose a novel methodology for EEG classification
with a multi-channel CNN and data augmentation to analyze
the EEG images. This network has a scalable, high-performance
architecture. The input image size is 224 × 224 × 3 for each
channel. The input images are convolved with filters in the
convolution layers. In this network, every convolutional layer
uses a 3 × 3 kernel with ReLU activation and a max pooling layer.
The max pooling layer reduces the high-dimensional input to a
smaller one; the window size of each max pooling layer
is 2 × 2. The three channels of EEG images are used as inputs to
three convolutional layers, whose outputs are then concatenated.
The next layer contains 16 kernels of size 3 × 3 with ReLU
activation. After this layer, further 2D convolutional layers
are applied with 16 and 8 kernels respectively, each followed
by a max pooling layer. Batch normalization is
added between CNN layers to regularize the CNN model; in
later experiments, the batch normalization layers were removed,
as this helped the loss decrease quickly. Finally, a fully
connected layer with 8 neurons is attached on top of the
convolutional layers, and the last layer is a softmax classification
layer. The model has in total only 573,704 parameters, so it is
easy and very fast to train. The proposed model architecture
is illustrated in Fig. 3.
TABLE I. Layer-wise summary of the proposed model.

Layer           Output tensor shape    # parameters
Input 1         B × 224 × 224 × 3      0
Input 2         B × 224 × 224 × 3      0
Input 3         B × 224 × 224 × 3      0
Conv2D 1        B × 224 × 224 × 32     896
Conv2D 2        B × 224 × 224 × 32     896
Conv2D 3        B × 224 × 224 × 32     896
Concatenate 1   B × 672 × 224 × 32     0
Conv2D 4        B × 670 × 222 × 16     4624
Activation 1    B × 670 × 222 × 16     0
Conv2D 5        B × 668 × 220 × 16     2320
Activation 2    B × 668 × 220 × 16     0
Max pooling 1   B × 334 × 110 × 16     0
Conv2D 6        B × 332 × 108 × 8      1160
Max pooling 2   B × 166 × 54 × 8       0
Flatten         B × 71712              0
Dense 1         B × 8                  573704
Dense 2         B × 2                  18
* Here, B denotes the batch size.
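The layer table above can be reconstructed as a Keras sketch. The output shapes fix most choices (same-padded 3 × 3 convolutions in the branches, concatenation along the height axis, valid-padded convolutions afterwards); the ReLU on the first dense layer and any training hyperparameters are assumptions not stated in the table:

```python
from tensorflow.keras import layers, models

def build_multi_input_cnn(m=224, n_classes=2):
    """Sketch of the three-input CNN inferred from the layer table."""
    inputs, branches = [], []
    for _ in range(3):                      # one branch per EEG channel
        inp = layers.Input(shape=(m, m, 3))
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
        inputs.append(inp)
        branches.append(x)
    # concatenate along the height axis: (224,224,32) x3 -> (672,224,32)
    x = layers.Concatenate(axis=1)(branches)
    x = layers.Conv2D(16, 3)(x)             # valid padding -> 670x222x16
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(16, 3)(x)             # -> 668x220x16
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D(2)(x)           # -> 334x110x16
    x = layers.Conv2D(8, 3)(x)              # -> 332x108x8
    x = layers.MaxPooling2D(2)(x)           # -> 166x54x8
    x = layers.Flatten()(x)                 # -> 71712
    x = layers.Dense(8, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, out)
```

Summing the per-layer counts in the table gives 584,514 parameters, of which 573,704 (the figure quoted in the text) belong to the first dense layer alone.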
III. EXPERIMENTS AND RESULTS
A. Dataset
The EEG datasets used in this work were dataset III from
BCI Competition II [16] and the training part of dataset 2b from
BCI Competition IV [17]. Both datasets contain EEG data for
motor imagery tasks. EEG was collected from three channels: C3,
Cz, and C4. Table II describes the datasets.
Fig. 3. CNN Architecture.
Dataset 2b of BCI Competition IV includes three sessions as
the training set. The third session includes online feedback at a
250 Hz sampling frequency, and the other two sessions are without
feedback. Each session consists of 20 trials per run and 120
trials in total. A visual cue is displayed for 1.25 s, after which
the subjects perform a 4 s motor imagery task. In dataset III of
BCI Competition II, the sampling frequency was 128 Hz.
The training set consists of 140 trials, and the length of each
recorded trial was 9 s.
TABLE II. Description of the datasets.

Dataset                           Subjects   Channels        Trials   Rate (Hz)
Competition II dataset III [16]   1          C3, Cz and C4   280      128
Competition IV dataset 2b [17]    9          C3, Cz and C4   400      250
B. Experimental Study
Transfer learning is a technique in the machine learning
domain where a model trained on one task is re-purposed for
a second task. It is an optimization technique that can improve
the performance of the model. In transfer learning, we first
train a network on a specific dataset; after it has learned
useful features, we transfer them to a second network to be
trained on another target dataset.
TABLE III. Classification performance with ResNet-50 features for subjects 1 and 2.

Subject   Classifier      Accuracy(%)   Recall(%)   Precision(%)
S1        SVM             93.33         96.33       88.98
          Random Forest   80.35         77.98       73.84
          Decision tree   79.91         82.56       72.54
          KNN             86.61         89.91       80.21
S2        SVM             94.64         95.412      91.63
          Random Forest   83.48         83.48       77.10
          Decision tree   76.78         79.81       69.17
          KNN             86.60         91.74       79.83
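The per-subject classifier comparison above can be sketched with scikit-learn. Here `X` and `y` are hypothetical stand-ins for the extracted ResNet-50 feature vectors and labels of one subject; the hyperparameters are defaults, not values reported in the paper:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# hypothetical features/labels standing in for one subject's data
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2048))        # 2048-d ResNet-50 feature vectors
y = rng.integers(0, 2, size=120)        # two motor imagery classes

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
# mean cross-validated accuracy per classifier
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

Replacing `X` and `y` with real extracted features reproduces the style of comparison shown in Table III.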
ResNet is a class of residual networks. In residual
learning, we try to learn a residual in lieu of learning
features directly. ResNet introduced the concept of skip
connections. The residual can be viewed as the difference
between the features learned by a layer and the input of that
layer. ResNet realizes this using shortcut connections
(directly connecting the input of the n-th layer to some
(n+x)-th layer). The ResNet-50 model is made up of 5 stages,
each with a convolution block and an identity block. Each
convolution block has 3 convolution layers, and each identity
block also has 3 convolution layers. ResNet-50 has over
23 million trainable parameters.
We used pre-trained ResNet-50 weights, trained on the ImageNet
natural image dataset, and used the network as a feature extractor
for our EEG STFT images [18]. After feature extraction, we compared
multiple classifiers for EEG classification. As can be observed
from Table III, the results are very promising. So the features
were useful for EEG STFT images, even though the ResNet-50
model was trained on natural images. This experiment
demonstrates the broader applicability of, and scope for,
transfer learning.
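The feature-extraction step can be sketched in Keras as follows. The function name is ours, and the use of global average pooling to obtain one 2048-dimensional vector per image is an assumption about how the features were pooled:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

def extract_features(images, weights="imagenet"):
    """Extract 2048-d ResNet-50 features from a batch of (224, 224, 3)
    STFT images; downstream classifiers (SVM, k-NN, ...) train on these."""
    base = ResNet50(weights=weights, include_top=False,
                    pooling="avg", input_shape=(224, 224, 3))
    return base.predict(preprocess_input(images.astype("float32")), verbose=0)

# e.g. feats = extract_features(stft_images)  # stft_images: (N, 224, 224, 3)
```

With `weights="imagenet"` the first call downloads the pre-trained weights; `weights=None` builds the same architecture with random weights.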
The t-SNE embedding of the features extracted by the ResNet-50
model is shown in Fig. 4. Small clusters formed by the two
classes can be observed in the embedding plot.
Fig. 4. t-SNE embedding for subject 1 and 2.
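An embedding like Fig. 4 can be produced along these lines with scikit-learn; the random matrix below is a hypothetical stand-in for the real ResNet-50 feature vectors, and the t-SNE settings are common defaults rather than values from the paper:

```python
import numpy as np
from sklearn.manifold import TSNE

# hypothetical stand-ins for the extracted features and class labels
features = np.random.randn(100, 2048)
labels = np.random.randint(0, 2, size=100)

# project the 2048-d feature vectors to 2-D for visualization
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(features)
```

Plotting `emb` colored by `labels` (e.g. with matplotlib's `scatter`) gives the kind of per-class cluster view shown in Fig. 4.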
C. Results
In this study, the proposed method was evaluated on
BCI Competition IV dataset 2b. An additional 1000 augmented
images were generated. The images were split into train and
test sets for each subject. For each subject, performance was
evaluated with four metrics: accuracy, precision,
sensitivity, and specificity. Table IV shows that for subjects 1,
4, and 7 our model achieved the highest accuracy compared
to [19], but for subject 2 our model performed poorly on BCI
Competition IV dataset 2b. The average accuracy over all subjects
was 89.19%. In each session, 90% of the images were selected for
training and 10% for testing. We trained our model for 32 epochs
with a batch size of 32. Table V presents
the results on dataset III of BCI Competition II. We compared
our results with other works based on the same datasets. The
authors in [19] proposed a CNN and Stacked Autoencoders
(SAE) based deep learning method to classify 2-class motor
imagery signals. In study [20], a sparse kernel machine was
proposed to classify left- and right-hand signals.
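The four per-subject metrics reported in Table IV can be computed from the confusion-matrix counts; a minimal sketch for the 2-class case:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, and specificity for a
    2-class problem (the four metrics reported per subject)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }
```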
In order to evaluate our model, we compared our results with
two other recent works. Table VII presents the comparison
for dataset 2b of BCI Competition IV. For all 9 subjects,
the performance of our model is better than the other results;
although the accuracy for subject 2 is not high, it is still
competitive. In reference [20], the authors proposed a sparse
kernel machine and obtained the highest accuracy for subjects 4
and 9. On the other hand, in reference [19], the authors proposed
TABLE IV. Per-subject results on BCI Competition IV dataset 2b.

Subject   Accuracy(%)   Precision(%)   Sensitivity(%)   Specificity(%)
1         93.36         93.91          93.10            93.64
2         63.33         68.38          61.29            65.61
3         94.7          93.33          96.55            92.86
4         96.68         98.21          95.65            98.17
5         84.76         85.46          84.29            83.62
6         93.75         97.39          91.06            97.03
7         97.77         97.35          98.21            97.32
8         91.09         87.90          94.4             87.93
9         95.09         94.74          95.58            94.59
a deep learning approach, the CNN-SAE model, and obtained the
highest accuracy for subject 4. In all other cases, our model
outperformed those approaches.
Fig. 5. Accuracy and loss curve for subject 4.
From Fig. 5, the convergence of the model after epoch 16 can
be observed. The training and validation accuracy curves are also
very close, so the model shows no overfitting on the training data,
and the training loss decreases as the number of epochs increases.
TABLE V. Results on dataset III of BCI Competition II.

Dataset   Accuracy(%)   Precision(%)   Sensitivity(%)   Specificity(%)
BCI II    89.73         90.09          89.29            90.18
The comparison results for BCI Competition II dataset
III are shown in Table VI. In our case, the proposed model
accuracy is 89.73%, which is better than the reference papers.
TABLE VI. Comparison on BCI Competition II dataset III.

Dataset   Accuracy(%) [Proposed]   Accuracy(%) [19]   Accuracy(%) [20]
BCI II    89.73                    90.0               88.2
The results show that the proposed method is reliable
and robust for motor imagery EEG classification.
TABLE VII. Comparison on BCI Competition IV dataset 2b.

Subject   Accuracy(%) [Proposed]   Accuracy(%) [19]   Accuracy(%) [20]
1         93.08                    78.1               75.94
2         63.33                    63.1               61.79
3         94.74                    60.6               61.25
4         96.88                    95.6               95.63
5         84.76                    78.1               93.75
6         93.75                    73.8               84.38
7         97.76                    70                 77.81
8         91.09                    70.6               91.25
9         95.09                    82.5               87.19
IV. CONCLUSION
In this work, we proposed a multi-input CNN to classify
multi-channel motor imagery EEG signals, an important and
novel approach towards Brain Computer Interface research.
Our method applies computer vision techniques to time
series analysis and showed good performance. Here, we use
both the time-domain and frequency-domain information from
the EEG as input. Since the STFT is a coarse representation of
the EEG, we can control the resolution to scale the network;
and since increasing the sampling rate increases the length of
the input EEG signals, taking the STFT is helpful. The STFT
was utilized to generate images from multi-channel time-series
EEG data. Augmentation was useful to increase the number of
training samples, which was essential for training a convolutional
neural network in our case due to data scarcity. Finally, the
model was evaluated on data from multiple subjects and showed
good performance. This work will be useful in Brain Computer
Interface and time series analysis research.
REFERENCES
[1] B. Graimann, B. Allison, and G. Pfurtscheller, "Brain–computer interfaces: A gentle introduction," in Brain-Computer Interfaces. Springer, 2009, pp. 1–27.
[2] A. Schlögl, F. Lee, H. Bischof, and G. Pfurtscheller, "Characterization of four-class motor imagery EEG data for the BCI-competition 2005," Journal of Neural Engineering, vol. 2, no. 4, p. L14, 2005.
[3] A. Pérez-Zapata, A. F. Cardona-Escobar, J. A. Jaramillo-Garzón, and G. M. Díaz, "Deep convolutional neural networks and power spectral density features for motor imagery classification of EEG signals," in International Conference on Augmented Cognition. Springer, 2018, pp. 158–169.
[4] P. Bashivan, I. Rish, M. Yeasin, and N. Codella, "Learning representations from EEG with deep recurrent-convolutional neural networks," arXiv preprint arXiv:1511.06448, 2015.
[5] B. Hatipoglu, C. M. Yilmaz, and C. Kose, "A signal-to-image transformation approach for EEG and MEG signal classification," Signal, Image and Video Processing, vol. 13, no. 3, pp. 483–490, 2019.
[6] C. Kim, J. Sun, D. Liu, Q. Wang, and S. Paek, "An effective feature extraction method by power spectral density of EEG signal for 2-class motor imagery-based BCI," Medical & Biological Engineering & Computing, vol. 56, no. 9, pp. 1645–1658, 2018.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[8] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[9] Z. Al Nazi and T. A. Abir, "Automatic skin lesion segmentation and melanoma detection: Transfer learning approach with U-Net and DCNN-SVM," in Proceedings of International Joint Conference on Computational Intelligence. Springer, 2020, pp. 371–381.
[10] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, "GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification," Neurocomputing, vol. 321, pp. 321–331, 2018.
[11] M. D. Bloice, C. Stocker, and A. Holzinger, "Augmentor: An image augmentation library for machine learning," arXiv preprint arXiv:1708.04680, 2017.
[12] F. Wang, S.-h. Zhong, J. Peng, J. Jiang, and Y. Liu, "Data augmentation for EEG-based emotion recognition with deep convolutional neural networks," in International Conference on Multimedia Modeling. Springer, 2018, pp. 82–93.
[13] M. D. Bloice, P. M. Roth, and A. Holzinger, "Biomedical image augmentation using Augmentor," Bioinformatics, 2019.
[14] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai et al., "Recent advances in convolutional neural networks," Pattern Recognition, vol. 77, pp. 354–377, 2018.
[15] E. Nurse, B. S. Mashford, A. J. Yepes, I. Kiral-Kornek, S. Harrer, and D. R. Freestone, "Decoding EEG and LFP signals using deep learning: Heading TrueNorth," in Proceedings of the ACM International Conference on Computing Frontiers. ACM, 2016, pp. 259–266.
[16] [Online]. Available:
[17] [Online]. Available:
[18] F. Chollet et al., “Keras,” 2015.
[19] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals," Journal of Neural Engineering, vol. 14, no. 1, p. 016003, 2017.
[20] V. P. Oikonomou, S. Nikolopoulos, P. Petrantonakis, and I. Kompatsiaris, "Sparse kernel machines for motor imagery EEG classification," in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 207–210.
... Shovon et. al utilized STFT images of three EEG channels as inputs to a multi-input CNN model for motor imagery EEG signal classification [31]. Also, some works have done for EEG classification of depressed patients from healthy subjects using CNNs [6,8,[32][33][34]. ...
Detection of mental disorders such as schizophrenia (SZ) through investigating brain activities recorded via Electroencephalogram (EEG) signals is a promising field in neuroscience. This study presents a hybrid brain effective connectivity and deep learning framework for SZ detection on multichannel EEG signals. First, the effective connectivity matrix is measured based on the Transfer Entropy (TE) method that estimates directed causalities in terms of brain information flow from 19 EEG channels for each subject. Then, TE effective connectivity elements were represented by colors and formed a 19 × 19 connectivity image which, simultaneously, represents the time and spatial information of EEG signals. Created images are used to be fed into the five pre-trained Convolutional Neural Networks (CNN) models named VGG-16, ResNet50V2, InceptionV3, EfficientNetB0, and DenseNet121 as Transfer Learning (TL) models. Finally, deep features from these TL models equipped with the Long Short-Term Memory (LSTM) model for the extraction of most discriminative spatiotemporal features are used to classify 14 SZ patients from 14 healthy controls. Results show that the hybrid framework of pre-trained CNN-LSTM models achieved higher accuracy than pre-trained CNN models. The highest average accuracy and F1-score were achieved using the EfficientNetB0-LSTM model through the 10-fold cross-validation method equal to 99.90% and 99.93%, respectively. Therefore, the superior performance of the hybrid framework of brain effective connectivity images from EEG signals and pre-trained CNN-LSTM models show that the proposed method is highly capable of detecting SZ patients from healthy controls.
... This is not readily apparent from only applying the FFT to the entire time-domain signal, as this gives one set of components that are not time dependent [23,24]. Hence, the STFT method is commonly used in feature extraction of a vibration signal [25][26][27]. A time-domain signal is converted into a time-frequency-domain signal in the STFT method. ...
Full-text available
Since artificial intelligence (AI) was introduced into engineering fields, it has made many breakthroughs. Machine learning (ML) algorithms have been very commonly used in structural health monitoring (SHM) systems in the last decade. In this study, a vibration-based early stage of bolt loosening detection and identification technique is proposed using ML algorithms, for a motor fastened with four bolts (M8×1.5) to a stationary support. First, several cases with fastened and loosened bolts were established, and the motor was operated in three different types of working condition (800 rpm, 1000 rpm, and 1200 rpm), in order to obtain enough vibration data. Second, for feature extraction of the dataset, the short-time Fourier transform (STFT) method was performed. Third, different types of classifier of ML were trained, and a new test dataset was applied to evaluate the performance of the classifiers. Finally, the classifier with the greatest accuracy was identified. The test results showed that the capability of the classifier was satisfactory for detecting bolt loosening and identifying which bolt or bolts started to lose their preload in each working condition. The identified classifier will be implemented for online monitoring of the early stage of bolt loosening of a multi-bolt structure in future works.
... For instance, in biopsy examinations, brain image analyses and other image processing tasks, transfer learning has been shown to significantly improve prediction results. The popular deep learning pre-trained networks such as the VGG networks, the Inception networks and the Residual networks [92][93][94] have been used for knowledge transfer and have been adapted for EEG processing [95][96][97]. Transfer learning offers significant gain in time for the calibration of the system. Typically, training a model with huge amounts of data can be timeexhausting, whereas the use of pre-trained networks significantly reduces training time. ...
Conference Paper
Full-text available
There is the need for enhanced processing techniques that aid the development of Brain-Computer Interfaces (BCIs), considering their wide use for communication and control. Several paradigms exist for developing BCIs. One of such is motor imagery (MI). MI-based BCIs have been implemented in a variety of ways. Key factors such as the device type, task paradigm, preprocessing, feature extraction and selection and classification techniques, must be properly considered in building BCIs. Also, factors such as the task at hand, target population, processing rate and usability of the online system must be considered. Considering this need, this review presents a summary of the existing techniques for motor imagery classification, stating common trends and challenges facing MI studies, with potential improvements that might be seen. Specifically, the review focuses on electroencephalography (EEG)-based MI BCIs, with works sampled over a wide range of time.
... The multi-channel acquisition method provides the spatial features. In this study, the motor imagery data with 2s of subjects is selected for STFT analysis [27]. The frequency bands of µ rhythm and β rhythm are 6~13 Hz and 17~30 Hz respectively. ...
In order to solve the problems of weak generalization ability and low classification accuracy in motor imagery EEG signal classification, this paper proposes a channel space weighted fusion-oriented feature pyramid network for motor imagery EEG signal recognition. First, the short-time Fourier transform is used to obtain the EEG time-frequency map. Then, it builds a new feature pyramid network(FPN). The attention mechanism module is integrated into the FPN module, and the channel spatial weighted fusion-oriented feature pyramid network is proposed. This new structure can not only learn the weight of important channel features in the feature map, but also learn the representation of important feature areas in the network layers. Meanwhile, Skip-FPN module is added into the network structure, which fuses more details of EEG signals through short connections. The Dropout layer is added to prevent network training from over-fitting. In the classification model, we improve the AdaBoost algorithm to automatically update the base learner according to the classification error rate. Finally, the proposed model is used to classify the test data and the Kappa value is used as the evaluation index. Compared with the state-of-the-art motor image EEG signal recognition methods, the proposed method achieves better performance on the BCI Competition IV 2b data set. It has good generalization ability and can improve the classification effect.
When deep learning techniques are introduced for Motor Imagery(MI) EEG signal classification, a multitude of state-of-the-art models, cannot be trained effectively because of the relatively small datasets. Proposing a model specialized for MI EEG signals classification plays a prominent role in promoting the combination of deep learning technology and MI EEG signal classification. In this paper, a novel Lightweight Feature Fusion Network(LFANN) based on an improved attention mechanism and tensor decomposition approach has been introduced. The proposed algorithm has been evaluated on a public benchmark dataset from BCI Competition IV, and the original dataset has been augmented with Enhance-Super-Resolution Generative Adversarial Network(ESRGAN). The experimental results demonstrate that the average accuracy of 91.58% and the average Kappa value of 0.881 can be achieved through the proposed algorithm. Furthermore, the compressed LAFFN, whose parameters have been compressed nearly ten times, creates no significant difference in performance compared to LAFFN. The investigation carried out through this experiment has provided novel insights into the classification research for MI EEG signals.
Full-text available
In modern Human-Robot Interaction, much thought has been given to accessibility regarding robotic locomotion, specifically the enhancement of awareness and lowering of cognitive load. On the other hand, with social Human-Robot Interaction considered, published research is far sparser given that the problem is less explored than pathfinding and locomotion. This thesis studies how one can endow a robot with affective perception for social awareness in verbal and non-verbal communication. This is possible by the creation of a Human-Robot Interaction framework which abstracts machine learning and artificial intelligence technologies which allow for further accessibility to non-technical users compared to the current State-of-the-Art in the field. These studies thus initially focus on individual robotic abilities in the verbal, non-verbal and multimodality domains. Multimodality studies show that late data fusion of image and sound can improve environment recognition, and similarly that late fusion of Leap Motion Controller and image data can improve sign language recognition ability. To alleviate several of the open issues currently faced by researchers in the field, guidelines are reviewed from the relevant literature and met by the design and structure of the framework that this thesis ultimately presents. The framework recognises a user's request for a task through a chatbot-like architecture. Through research in this thesis that recognises human data augmentation (paraphrasing) and subsequent classification via language transformers, the robot's more advanced Natural Language Processing abilities allow for a wider range of recognised inputs. That is, as examples show, phrases that could be expected to be uttered during a natural human-human interaction are easily recognised by the robot. 
This allows for accessibility to robotics without the need to physically interact with a computer or write any code, with only the ability of natural interaction (an ability which most humans have) required for access to all the modular machine learning and artificial intelligence technologies embedded within the architecture. Following the research on individual abilities, this thesis then unifies all of the technologies into a deliberative interaction framework, wherein abilities are accessed from long-term memory modules along with short-term memory information such as the user's tasks, sensor data, retrieved models, and output information. In addition, algorithms for model improvement are also explored, such as transfer learning and synthetic data augmentation, which the framework applies autonomously to constantly improve its learning abilities. It is found that transfer learning between electroencephalographic and electromyographic biological signals improves the classification of one another given their slight physical similarities. Transfer learning also aids in environment recognition, when transferring knowledge from virtual environments to the real world. In another example of non-verbal communication, it is found that learning from a scarce dataset of American Sign Language for recognition can be improved by multi-modality transfer learning from hand features and images taken from a larger British Sign Language dataset. Data augmentation is shown to aid in electroencephalographic signal classification by learning from synthetic signals generated by a GPT-2 transformer model, and augmenting training with synthetic data also shows improvements when performing speaker recognition from human speech.
Given the importance of platform independence due to the growing range of available consumer robots, four use cases are detailed, and examples of behaviour are given by the Pepper, Nao, and Romeo robots as well as a computer terminal. The use cases involve a user requesting their electroencephalographic brainwave data to be classified by simply asking the robot whether or not they are concentrating. In a subsequent use case, the user asks if a given text is positive or negative, to which the robot correctly recognises the natural language processing task at hand and then classifies the text; the result is output and the physical robots react accordingly by showing emotion. The third use case has a request for sign language recognition, which the robot recognises and thus switches from listening to watching the user communicate with them. The final use case focuses on a request for environment recognition, which has the robot perform multimodality recognition of its surroundings and note them accordingly. The results presented by this thesis show that several of the open issues in the field are alleviated through the technologies within, structuring of, and examples of interaction with the framework. The results also show the achievement of the three main goals set out by the research questions: the endowment of a robot with affective perception and social awareness for verbal and non-verbal communication; whether we can create a Human-Robot Interaction framework to abstract machine learning and artificial intelligence technologies for accessibility to non-technical users; and, as previously noted, which current issues in the field can be alleviated by the framework presented and to what extent.
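The late data fusion mentioned in this thesis abstract (combining image and sound, or Leap Motion and image data) can be sketched at the decision level: each modality's classifier emits its own class-probability vector, and fusion simply averages them. The probability values below are hypothetical, purely for illustration.

```python
# Decision-level ("late") fusion of per-modality classifier outputs.
def late_fuse(prob_vectors):
    """Average per-class probabilities from independent modality classifiers."""
    n = len(prob_vectors)
    return [sum(p[i] for p in prob_vectors) / n
            for i in range(len(prob_vectors[0]))]

image_probs = [0.6, 0.3, 0.1]   # e.g. a CNN on camera frames (made-up values)
sound_probs = [0.2, 0.7, 0.1]   # e.g. a CNN on audio spectrograms (made-up)
fused = late_fuse([image_probs, sound_probs])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because fusion happens after each network has produced its output, the modalities can be trained and swapped independently, which fits the modular framework described above.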
Full-text available
Classification of electroencephalogram (EEG) signals is a key approach to measuring the rhythmic oscillations of neural activity and is one of the core technologies of brain-computer interface systems (BCIs). However, extraction of features from non-linear and non-stationary EEG signals is still a challenging task for current algorithms. With the development of artificial intelligence, various advanced algorithms have been proposed for signal classification in recent years. Among them, deep neural networks (DNNs) have become the most attractive type of method due to their end-to-end structure and powerful ability of automatic feature extraction. However, it is difficult to collect large-scale datasets in practical applications of BCIs, which may lead to overfitting or weak generalizability of the classifier. To address these issues, a promising technique has been proposed to improve the performance of the decoding model based on data augmentation (DA). In this article, we investigate recent studies and the development of various DA strategies for EEG classification based on DNNs. The review consists of three parts: what kinds of paradigms of EEG-based BCIs are used, what types of DA methods are adopted to improve the DNN models, and what level of accuracy can be obtained. Our survey summarizes the current practices and performance outcomes with the aim of promoting or guiding the deployment of DA for EEG classification in future research and development.
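Two of the simplest DA strategies surveyed for EEG trials are additive Gaussian noise and random temporal shifting. The sketch below shows both on a toy trial tensor; the array shapes, noise level, and shift range are illustrative assumptions, not figures from the review.

```python
import numpy as np

def augment_trials(trials, noise_std=0.05, max_shift=16, seed=0):
    """trials: (n_trials, n_channels, n_samples) -> doubled, augmented set."""
    rng = np.random.default_rng(seed)
    # Strategy 1: jitter every sample with small Gaussian noise.
    noisy = trials + rng.normal(0.0, noise_std, trials.shape)
    # Strategy 2: circularly shift each trial in time by a random offset.
    shifts = rng.integers(-max_shift, max_shift + 1, size=len(trials))
    shifted = np.stack([np.roll(t, s, axis=-1) for t, s in zip(trials, shifts)])
    return np.concatenate([noisy, shifted], axis=0)

X = np.zeros((8, 3, 256))        # 8 toy trials, 3 channels, 256 samples
X_aug = augment_trials(X)        # 16 augmented trials
```

Both operations preserve the class label of each trial, which is why they can safely enlarge the training set for a DNN classifier.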
Background: Recently, convolutional neural networks (CNNs) have been widely applied to motor imagery electroencephalography (MI-EEG) signal classification tasks. However, it is challenging for a simple CNN framework to satisfy the demands of complex MI-EEG signal decoding. New method: In this study, we propose a multiscale Siamese convolutional neural network with cross-channel fusion (MSCCF-Net) for MI-EEG classification tasks. The proposed network consists of three parts: Siamese cross-channel fusion streams, a similarity module and a classification module. Each Siamese cross-channel fusion stream contains multiple branches, and each branch is supplemented by cross-channel fusion modules to improve multiscale temporal feature representation capability. The similarity module is adopted to measure the feature similarity between multiple branches. At the same time, the classification module provides a strong constraint to classify the features from all Siamese cross-channel fusion streams. The combination of the similarity module and the classification module constitutes a new joint training strategy to further optimize the network performance. Results: The experiment is conducted on the public BCI Competition IV 2a and 2b datasets, and the results show that the proposed network achieves average accuracies of 87.36% and 87.33%, respectively. Comparison with existing methods and conclusions: The proposed network adopts cross-channel fusion to learn multiscale temporal characteristics and a joint training strategy to optimize the training process. Its performance therefore outperforms other state-of-the-art MI-EEG signal classification methods.
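A common way to realize the similarity module of a Siamese design like the one described above is a cosine similarity between the feature vectors emitted by the twin streams; matching pairs are pulled toward similarity 1. The measure and the toy vectors below are illustrative assumptions, as the abstract does not specify the exact metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Two branch embeddings that point the same way score 1.0 regardless of scale.
sim = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```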
Conference Paper
Full-text available
Industrial pollution resulting in ozone layer depletion has driven increased UV radiation in recent years, which is a major environmental risk factor for the invasive skin cancer melanoma and other keratinocyte cancers. The incidence of deaths from melanoma has risen worldwide in the past two decades. Deep learning has been employed successfully for dermatologic diagnosis. In this work, we present a deep learning based scheme to automatically segment skin lesions and detect melanoma from dermoscopy images. U-Net was used for segmenting out the lesion from surrounding skin. The limitation of utilizing deep neural networks with limited medical data was addressed with data augmentation and transfer learning. In our experiments, U-Net was used with spatial dropout to solve the problem of overfitting, and different augmentation effects were applied on the training images to increase data samples. The model was evaluated on two different datasets. It achieved a mean dice score of 0.87 and a mean jaccard index of 0.80 on the ISIC 2018 dataset. The trained model was assessed on the PH² dataset, where it achieved a mean dice score of 0.93 and a mean jaccard index of 0.87 with transfer learning. For classification of malignant melanoma, a DCNN-SVM model was used where we compared state-of-the-art deep nets as feature extractors to find the applicability of transfer learning in the dermatologic diagnosis domain. Our best model achieved a mean accuracy of 92% on the PH² dataset. The findings of this study are expected to be useful in cancer diagnosis research.
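The dice score and jaccard index reported above are standard overlap metrics on binary segmentation masks (1 = lesion pixel). The toy masks below are illustrative; only the formulas reflect the metrics named in the abstract.

```python
import numpy as np

def dice(pred, gt):
    """Dice score: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard(pred, gt):
    """Jaccard index (IoU): |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

pred = np.array([[1, 1, 0], [0, 1, 0]])   # toy predicted mask
gt   = np.array([[1, 0, 0], [0, 1, 1]])   # toy ground-truth mask
d, j = dice(pred, gt), jaccard(pred, gt)  # 0.667 and 0.5 for these masks
```

Dice is always at least as large as jaccard on the same masks (for non-empty overlap), which is why papers typically report a higher dice than jaccard value, as above.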
Conference Paper
Full-text available
Brain-computer interfaces (BCIs) make human-computer interaction more natural, especially for people with neuro-muscular disabilities. Among various data acquisition modalities, electroencephalography (EEG) occupies the most prominent place due to its non-invasiveness. In this work, a method based on sparse kernel machines is proposed for the classification of motor imagery (MI) EEG data. More specifically, a new sparse prior is proposed for the selection of the most important information, and the estimation of model parameters is performed within the Bayesian framework. The experimental results obtained on a benchmark EEG dataset for MI have shown that the proposed method compares favorably with state-of-the-art approaches in the BCI literature.
Full-text available
The classification of magnetic signals has become one of the challenging research problems in brain-computer interfaces (BCIs). Magnetic signals, as measured with electroencephalography (EEG) and magnetoencephalography (MEG), contain a great deal of additional information on the bioelectrical activity of the brain. In this paper, we propose a simple transformation method that utilises signal-to-image conversion. This conversion is a kind of finite amplitude-frequency transformation based on the changing points of the signals. In other words, arbitrary time-domain signals are converted to two-dimensional finite images, which are then used in the classification of the signals. In feature extraction, the Harris corner detector and scale-invariant feature transform are combined with the bag of visual words, and these features are then classified by using the k-nearest neighbour algorithm. To confirm the validity of the proposed method, experiments are conducted on BCI Competition 2003 Dataset Ia and BCI Competition 2008 Dataset III. The classification accuracy of the proposed method is over 96.21% for Dataset Ia and 78.99% for Dataset III. It is apparent from the results that the EEG and MEG signals are quite successfully classified by employing the proposed method.
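The signal-to-image idea above can be sketched by quantizing each sample's amplitude into one of H rows and marking that row in column t, turning a 1-D time series into a 2-D binary image that image-feature pipelines (corner detectors, SIFT, bag of visual words) can consume. The bin count and test signal are illustrative choices, not the paper's exact transformation.

```python
import numpy as np

def signal_to_image(signal, height=32):
    """Map a 1-D signal to a (height x len(signal)) binary trace image."""
    lo, hi = signal.min(), signal.max()
    rows = np.round((signal - lo) / (hi - lo) * (height - 1)).astype(int)
    img = np.zeros((height, len(signal)), dtype=np.uint8)
    img[rows, np.arange(len(signal))] = 1   # one "on" pixel per time step
    return img

t = np.linspace(0, 1, 128)
img = signal_to_image(np.sin(2 * np.pi * 5 * t))   # 5 Hz toy signal
```

Each column holds exactly one active pixel, so the image is a literal trace of the waveform, and amplitude changes become geometric features that corner detectors can pick up.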
Full-text available
Deep learning methods, and in particular convolutional neural networks (CNNs), have led to an enormous breakthrough in a wide range of computer vision tasks, primarily by using large-scale annotated datasets. However, obtaining such datasets in the medical domain remains a challenge. In this paper, we present methods for generating synthetic medical images using recently presented deep learning Generative Adversarial Networks (GANs). Furthermore, we show that generated medical images can be used for synthetic data augmentation and improve the performance of CNNs for medical image classification. Our novel method is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases and 65 hemangiomas). We first exploit GAN architectures for synthesizing high quality liver lesion ROIs. Then we present a novel scheme for liver lesion classification using a CNN. Finally, we train the CNN using classic data augmentation and our synthetic data augmentation and compare performance. In addition, we explore the quality of our synthesized examples using visualization and expert assessment. The classification performance using only classic data augmentation yielded 78.6% sensitivity and 88.4% specificity. By adding the synthetic data augmentation, the results increased to 85.7% sensitivity and 92.4% specificity. We believe that this approach to synthetic data augmentation can generalize to other medical classification applications and thus support radiologists' efforts to improve diagnosis.
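The sensitivity and specificity figures quoted above come from raw confusion counts; the sketch below shows the two formulas. The counts are hypothetical, not the paper's data.

```python
def sensitivity(tp, fn):
    """True positive rate (recall): fraction of actual positives detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual negatives correctly rejected."""
    return tn / (tn + fp)

# Made-up counts for illustration only.
sens = sensitivity(tp=60, fn=10)    # 60 of 70 lesions found
spec = specificity(tn=110, fp=9)    # 110 of 119 non-lesions rejected
```

Reporting both numbers matters for imbalanced medical data: a classifier that labels everything positive gets perfect sensitivity but zero specificity, so improvements (like the synthetic-augmentation gain above) should move both.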
Full-text available
The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy, generalisation, and to control overfitting. Augmentor is a software package, available in both Python and Julia versions, that provides a high level API for the expansion of image data using a stochastic, pipeline-based approach which effectively allows for images to be sampled from a distribution of augmented images at runtime. Augmentor provides methods for most standard augmentation practices as well as several advanced features such as label-preserving, randomised elastic distortions, and provides many helper functions for typical augmentation tasks used in machine learning.
Motivation: Image augmentation is a frequently used technique in computer vision and has been seeing increased interest since the popularity of deep learning. Its usefulness is becoming more and more recognized due to deep neural networks requiring larger amounts of data to train, and because in certain fields, such as biomedical imaging, large amounts of labelled data are difficult to come by or expensive to produce. In biomedical imaging, features specific to this domain need to be addressed. Results: Here we present the Augmentor software package for image augmentation. It provides a stochastic, pipeline-based approach to image augmentation with a number of features that are relevant to biomedical imaging, such as z-stack augmentation and randomized elastic distortions. The software has been designed to be highly extensible meaning an operation that might be specific to a highly specialized task can easily be added to the library, even at runtime. Although it has been designed as a general software library, it has features that are particularly relevant to biomedical imaging and the techniques required for this domain. Availability and implementation: Augmentor is a Python package made available under the terms of the MIT licence. Source code can be found on GitHub under and installation is via the pip package manager (A Julia version of the package, developed in parallel by Christof Stocker, is also available under
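Augmentor's core design, as described above, is a stochastic pipeline: each operation fires independently with its own probability, so every pass samples a different augmented variant. The minimal stand-in below mimics that design on plain lists instead of image files; the class and its operations are illustrative, not Augmentor's actual API.

```python
import random

class Pipeline:
    """Toy stochastic augmentation pipeline in the spirit of Augmentor."""

    def __init__(self, seed=None):
        self.ops = []
        self.rng = random.Random(seed)

    def add(self, probability, op):
        """Register an operation that fires with the given probability."""
        self.ops.append((probability, op))

    def sample(self, image):
        """Draw one augmented variant by running the pipeline once."""
        for probability, op in self.ops:
            if self.rng.random() < probability:
                image = op(image)
        return image

p = Pipeline(seed=42)
p.add(1.0, lambda img: img[::-1])             # always: "horizontal flip"
p.add(0.5, lambda img: [x + 1 for x in img])  # sometimes: "brightness shift"
variant = p.sample([1, 2, 3])
```

Because sampling happens at runtime, the effective training distribution is the set of all augmented variants, without the disk cost of materializing them, which is the property the package's API is built around.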
A Brain-Computer Interface (BCI) is a communication and control system that attempts to provide real-time interaction between a user and a computer device, based on the brain electrical signals that are generated when the user imagines specific movements or actions. For doing so, classification models are developed to identify the user's movement intention according to specific signal features. This paper presents a classification model for BCI that is based on the processing of electroencephalography (EEG) signals. The power spectral density (PSD) representation of EEG signals is used for training a deep convolutional neural network (CNN) that is able to differentiate among four different movement intentions: left-hand movement, right-hand movement, feet movement, and tongue movement. Performance evaluation results reported a mean accuracy of 0.8797 ± 0.0296 for the well-known BCI Competition IV Dataset 2a, which outperforms state-of-the-art approaches.
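A PSD representation like the one fed to the CNN above can be estimated with a simple windowed periodogram per EEG channel. The window and normalization choices below are assumptions for illustration; the paper does not specify its estimator here.

```python
import numpy as np

def periodogram_psd(x, fs):
    """One-sided windowed periodogram of a 1-D signal x sampled at fs Hz."""
    n = len(x)
    X = np.fft.rfft(x * np.hanning(n))       # taper to reduce leakage
    psd = (np.abs(X) ** 2) / (fs * n)        # power per frequency bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

fs = 250                               # a typical EEG sampling rate
t = np.arange(fs) / fs                 # 1 s of signal
x = np.sin(2 * np.pi * 10 * t)         # toy 10 Hz mu-band oscillation
freqs, psd = periodogram_psd(x, fs)
peak_hz = freqs[np.argmax(psd)]        # the 10 Hz tone dominates the PSD
```

Stacking such PSD vectors across channels yields a 2-D frequency-by-channel array, which is the kind of input a deep CNN can consume directly.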
EEG signals have weak intensity, a low signal-to-noise ratio, and non-stationary, non-linear, time-frequency-spatial characteristics. Therefore, it is important to extract adaptive and robust features that reflect time, frequency and spatial characteristics. This paper proposes an effective feature extraction method, WDPSD (feature extraction from the Weighted Difference of Power Spectral Density in an optimal channel couple), that can reflect time, frequency and spatial characteristics for a 2-class motor imagery-based BCI system. In the WDPSD method, firstly, Power Spectral Density (PSD) matrices of EEG signals are calculated in all channels, and an optimal channel couple is selected from all possible channel couples by checking non-stationarity and class separability; then a weight matrix which reflects the non-stationarity of the PSD difference matrix in the selected channel couple is calculated; finally, robust and adaptive features are extracted from the PSD difference matrix weighted by the weight matrix. The proposed method is evaluated on EEG signals from BCI Competition IV Dataset 2a and Dataset 2b. The experimental results show good classification accuracy in single-session, session-to-session, and different types of 2-class motor imagery settings for different subjects.
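A loose sketch of the final WDPSD step, under stated assumptions: given PSD matrices from an already-chosen channel couple, features come from their difference weighted element-wise so that non-stationary entries count less. Here non-stationarity is approximated by the across-trial standard deviation; the actual weight construction and the optimal-couple selection in the paper are more involved and are omitted.

```python
import numpy as np

def wdpsd_features(psd_c1, psd_c2):
    """psd_c1, psd_c2: (n_trials, n_freqs) PSDs for the two channels.

    Returns (n_trials, n_freqs) features: the PSD difference, with stable
    (low-variance) frequency bins weighted up and volatile bins weighted down.
    """
    diff = psd_c1 - psd_c2
    weight = 1.0 / (diff.std(axis=0) + 1e-12)   # illustrative weight choice
    return diff * weight

rng = np.random.default_rng(1)
feats = wdpsd_features(rng.random((20, 64)), rng.random((20, 64)))
```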
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
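The top-1/top-5 error rates quoted above follow a simple rule: a prediction is correct under top-k if the true label appears among the k highest-scored classes. The scores and labels below are toy values for illustration.

```python
def topk_error(scores, labels, k):
    """Fraction of examples whose true label is NOT in the top-k predictions."""
    wrong = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        wrong += label not in topk
    return wrong / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
top1 = topk_error(scores, labels, k=1)   # only the first example is right
top2 = topk_error(scores, labels, k=2)   # the third also counts under top-2
```

Top-5 error is always at most top-1 error, which is why the paper's 17.0% top-5 figure sits well below its 37.5% top-1 figure.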