Journal of Ambient Intelligence and Humanized Computing (2020) 11:6021–6031
https://doi.org/10.1007/s12652-020-01852-z
ORIGINAL RESEARCH
Thumbs up, thumbs down: non-verbal human-robot interaction through real-time EMG classification via inductive and supervised transductive transfer learning
JhonatanKobylarz1· JordanJ.Bird2· DiegoR.Faria2· EduardoParenteRibeiro1· AnikóEkárt2
Received: 11 October 2019 / Accepted: 27 February 2020 / Published online: 7 March 2020
© The Author(s) 2020
Abstract
In this study, we present a transfer learning method for gesture classification via an inductive and supervised transductive approach with an electromyographic dataset gathered via the Myo armband. A ternary gesture classification problem is presented by states of 'thumbs up', 'thumbs down', and 'relax' in order to communicate in the affirmative or negative in a non-verbal fashion to a machine. Of the nine statistical learning paradigms benchmarked over 10-fold cross validation (with three methods of feature selection), an ensemble of Random Forest and Support Vector Machine through voting achieves the best score of 91.74% with a rule-based feature selection method. When new subjects are considered, this machine learning approach fails to generalise to new data, and thus the processes of Inductive and Supervised Transductive Transfer Learning are introduced with a short calibration exercise (15 s). Failure of generalisation shows that 5 s of data per class is the strongest for classification (versus one through seven seconds) with only an accuracy of 55%, but when a short 5 s per class calibration task is introduced via the suggested transfer method, a Random Forest can then classify unseen data from the calibrated subject at an accuracy of around 97%, outperforming the 83% accuracy boasted by the proprietary Myo system. Finally, a preliminary application is presented through social interaction with a humanoid Pepper robot, where the use of our approach and a most-common-class metaclassifier achieves 100% accuracy for all trials of a '20 Questions' game.
Keywords Gesture classification · Human-robot interaction · Electromyography · Machine learning · Transfer learning · Inductive transfer learning · Supervised transductive transfer learning · Myo armband · Pepper robot
1 Introduction
Within a social context, the current state of Human-Robot Interaction is arguably most often concerned with the domain of verbal, spoken communication. That is, the transcription of spoken language to text, and further Natural Language Processing (NLP) in order to extract meaning; this framework is oftentimes multi-modally combined with other data, such as the tone of voice, which also carries useful information. With this in mind, a recent National GP Survey carried out in the United Kingdom found that 125,000 adults and 20,000 children had the ability to converse in British Sign Language (BSL) (Ipsos 2016), and of those surveyed, 15,000 people reported it as their primary language. Those 15,000 people are therefore only able to converse directly with approximately 0.22% of the UK population. This argues for the importance of non-verbal communication, such as through gesture.
Jhonatan Kobylarz and Jordan J. Bird are co-first authors.
* Jordan J. Bird
birdj1@aston.ac.uk
Jhonatan Kobylarz
jhonatankobylarz@gmail.com
Diego R. Faria
d.faria@aston.ac.uk
Eduardo Parente Ribeiro
edu@eletrica.ufpr.br
Anikó Ekárt
a.ekart@aston.ac.uk
1 Department ofElectrical Engineering, Federal University
ofParana, Curitiba, Brazil
2 School ofEngineering andApplied Science, Aston
University, Birmingham, UK
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
6022 J.Kobylarz et al.
1 3
To answer in the affirmative, in the negative, or to not answer at all are three very important responses when it comes to meaningful conversation, especially in a goal-based scenario. In this study, a ternary classification experiment is performed towards the domain of non-verbal communication with robots; the electromyographic signals produced when performing a thumbs up, thumbs down, and resting state with either the left or right arm are considered, and statistical classification techniques are benchmarked in terms of validation, generalisation to new data, and transfer learning to better generalise to new data, in order to raise reliability to within the realms of classical speech recognition. That is, to reach interchangeable accuracies between the two domains and thus enable those who do not have the ability of speech to effectively communicate with machines.
The main contributions of this work are as follows:
– An original dataset is collected from five subjects for three-class gesture classification.1 A ternary classification problem is thus presented: thumbs up, thumbs down, and relaxed.
– A feature extraction process retrieved from previous work is used to extract features from electromyographic waves; prior to this work, the process has only been explored in electroencephalography (EEG), and here it is adapted for electromyographic gesture classification.2
– Multiple feature selection algorithms and statistical/ensemble classifiers are benchmarked in order to derive the best statistical classifier for the ground truth data.
– Multiple best-performing models attempt to predict new and unseen data towards the exploration of generalisation, which ultimately fails. Findings during this experiment show that 15 s (5 s per class) performs considerably better than 3, 6, 9, 12, 18, and 21 s of data. Model generalisation only slightly outperforms random guessing.
– Failure of generalisation is then remedied through the suggestion of a calibration framework via inductive and supervised transductive transfer learning. Inspired by the findings of the experiment described in the previous point, models are then able to reach extremely high classification ability on further unseen data presented post-calibration. Findings show that although a confidence-weighted vote of Random Forest and Support Vector Machine performed better on the original, full dataset, the Random Forest alone outperforms this method for calibration and classification of unseen data (97% vs. 95.7% respectively).
– Finally, a real-time application of the work is preliminarily explored. Social interaction is enabled with a humanoid robot (Softbank's Pepper) in the form of a game, through gestural interaction and subsequent EMG classification of the gestures in order to answer yes/no questions while playing 20 Questions.
In order to present the aforementioned findings in a structured manner, exploration and results are presented in chronological order, since a failed generalisation experiment is then remedied with the aid of findings from its limitations. The remainder of this article is structured as follows: firstly, important state-of-the-art work within the field of gesture recognition and electromyography is presented in Sect. 2, along with important background information regarding the Feature Selection and Machine Learning techniques explored within this study. Section 3 then outlines the processes followed towards dataset acquisition and feature extraction, the experimental methodologies, as well as important hyperparameters and hardware information required for replicability of the experiments. Results and discussion are then presented in Sect. 4, followed by a preliminary application of the findings in Sect. 5. Finally, possible future works are discussed in Sect. 6 with regards to the limitations of this work, and a final conclusion of the findings is presented.
2 Background
In this section, state-of-the-art literature in electromyographic gesture classification is considered. Additionally, a short overview of the statistical techniques is given.
Fig. 1 The MYO EMG Armband (Thalmic Labs)
1 Available online, https://www.kaggle.com/birdy654/emg-gesture-classification-thumbs-up-and-down/ Last accessed: 25/02/2020.
2 Available online, https://github.com/jordan-bird/eeg-feature-generation/ Last accessed: 25/02/2020.
2.1 EMG gesture classification and calibration
The MYO Armband, as shown in Fig. 1, is a device comprised of 8 electrodes, ergonomically designed to read electromyographic data from on and around the arm via an embedded chip within the device. Researchers have noted the MYO's quality as well as its ease of availability to both researchers and consumers (Rawat et al. 2016), and it is thus recognised as having great potential in EMG-signal based experiments. In this section, notable state-of-the-art literature is presented within which the MYO armband has successfully provided EMG data for experimentation.
The Myo Armband was found to be accurate enough to control a robotic arm with 6 Degrees of Freedom (DoF) with similar speed and precision to the controlling subject's movements (Widodo et al. 2018). In this work, researchers found an effective method of classification through the training of a novel Convolutional Neural Network (CNN) architecture at a mean accuracy of 97.81%. A related study, also performing classification with a CNN, successfully classified 9 physical movements from 9 subjects at a mean accuracy of 94.18% (Mendez et al. 2017); it must be noted that in this work, the model was not tested for generalisation ability. This is shown to be important in the present study, since the strongest method for classification of the dataset was ultimately weaker than another model when it came to transfer of ability to unseen data.
Researchers have noted that gesture classification with Myo has real-world application and benefits (Kaur et al. 2016), showing that physiotherapy patients often exhibit much higher levels of satisfaction when interfacing via EMG and receiving digital feedback (Sathiyanarayanan and Rajan 2016). Likewise in the medical field, Myo has been shown to be competitively effective compared with far more expensive methods of non-invasive electromyography in the rehabilitation of amputation patients (Abduo and Galster 2015), and following this, much work has explored the application of gesture classification for the control of a robotic hand (Ganiev et al. 2016; Tatarian et al. 2018). Since the armband is worn on the lower arm, the goal of the robotic hand is to be teleoperated by non-amputees and likewise to be operated by amputation patients in place of the amputated hand. Work from the United States has also shown that EMG classification is useful for exercises designed to strengthen the glenohumeral muscles towards rehabilitation in Baseball (Townsend et al. 1991).
Recently, work in Brazilian Sign Language classification via the Myo armband found high classification ability through a Support Vector Machine on a 20-class problem (Abreu et al. 2016). Researchers noted substantial limitations in the form of real-time classification application and generalisation, with models performing sub-par on unseen data. For example, letters A, T, and U had worthless classification abilities of 4%, 4%, and 5% respectively. This work sets out both to train models and to explore methods of generalisation to new, unseen data in real-time. The Myo armband's proprietary framework, through a short exercise, boasts up to an 83% real-time classification ability. Although seemingly relatively high, the margin of error that is a statistical risk in 17% of cases prevents the Myo from being deployed in situations where such a rate of error is unacceptable and considered critical. Though it may be considered acceptable to possibly miscommunicate 17% of the time in sign language dictation, this error rate would be unacceptable, for example, for the control of a drone where a physical risk is presented. Thus, the goal of many works is to improve this ability. In terms of real-time classification, there are limited works, and many of them suggest a system of calibration during short exercises (similarly to the Myo framework) in order to fine-tune a Machine Learning model. In Benalcázar et al. (2017), the authors suggested a solution of a ten second exercise (five 2 s activities) in order to gain 89.5% real-time classification accuracy. This was performed through the K-Nearest Neighbour (KNN) and Dynamic Time Warping (DTW) algorithms. EMG has also been applied to other bodily surfaces for classification, for example, to the face in order to classify emotional response based on muscular activity (Tan et al. 2012).
In 2017, researchers found that certain early layers of a CNN could be applied to unseen subjects when further training is performed on subsequent layers of the network with new subject data (Côté-Allard et al. 2019). This study showed not only that a physical task ('pick up the cube') could be completed on average in less time than with joystick hardware, but that the transfer learning process allowed for 97.81% classification accuracy of the EMG data produced by the movements of 17 individual subjects. It must be noted that this deep learning technique (along with some aforementioned) is heavy in terms of resource usage (Shi et al. 2016), and thus, in this study, classical statistical methods are explored which require far fewer resources to train and classify data. This paradigm is followed in order to allow autonomous machines (usually operating a single CPU) the ability to perform training, calibration, and classification without the need for comparatively more expensive GPU capabilities, or access to a cloud system with similar means.
Discrimination of affirmative and negative responses in the form of thumbs up and thumbs down was shown to be possible in a related study (Huang et al. 2015b), within which the two actions were part of a larger eight-class dataset classified at 87.6% accuracy on average for four individual subjects. Linear Discriminant Analysis (LDA) was used to classify features generated by a sliding window of 200 ms in size with a 50 ms overlap, a technique similar to that followed in this work; the features were mean absolute value, waveform length, zero crossing, and sign slope change for the EMG itself, and mean value and standard deviation observed by the accelerometer. In Huang et al. (2015a), researchers followed a similar process for the classification of minute thumb movements when using an Android mobile phone. Results showed that accuracies of 89.2% and 82.9% are achieved for a subject holding a phone and not holding a phone respectively, when 2 s of EMG data is classified with a K-Nearest Neighbour (KNN) classification algorithm. A more recent work explored the preliminary application of image enhancement to surface electromyographs, showing its potential to improve the classification of muscle characteristics (ul Islam et al. 2019).
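The classic time-domain features named above (mean absolute value, waveform length, zero crossings, and sign slope changes) can be sketched in a few lines of Python. This is a minimal illustration using the common textbook definitions, not the implementation of any cited work; the `threshold` parameter is an assumption (real pipelines typically apply a small noise threshold):

```python
import numpy as np

def emg_features(window, threshold=0.0):
    """Classic EMG time-domain features for one channel window."""
    w = np.asarray(window, dtype=float)
    diff = np.diff(w)
    mav = float(np.mean(np.abs(w)))     # mean absolute value
    wl = float(np.sum(np.abs(diff)))    # waveform length
    # zero crossings: consecutive samples whose product is negative
    zc = int(np.sum(w[:-1] * w[1:] < -threshold))
    # sign slope changes: consecutive slopes whose product is negative
    ssc = int(np.sum(diff[:-1] * diff[1:] < -threshold))
    return {"MAV": mav, "WL": wl, "ZC": zc, "SSC": ssc}

feats = emg_features([1.0, -1.0, 2.0, -2.0, 1.0])
```

Such scalar summaries turn a raw oscillating signal into a fixed-length vector that a classifier such as LDA can consume.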
Calibration in the related works, where performed, is achieved through the processes of Inductive Transfer Learning (ITL) and Supervised Transductive Transfer Learning (STTL). According to Pan and Yang (2009) and Arnold et al. (2007), ITL is the process satisfied when the source domain labels are available as well as the target labels; this is leveraged in the calibration stage, in which the gesture being performed is known. STTL is the process in which the source domain labels are available but the target labels are not; this is the validation stage in this study, when a calibrated model is benchmarked on further unknown data during application. Transfer learning is the process of knowledge transfer from one learned task to another (Zhuang et al. 2019). In this study, it is shown to be difficult to generalise a model to new subjects, and thus application of a model to new data is considered a task to be solved by transfer learning; transfer learning often shows strong results in the application of gesture classification in related state-of-the-art works (Liu et al. 2010; Goussies et al. 2014; Costante et al. 2014; Yang et al. 2018; Demir et al. 2019).
Numerous open issues arising from this literature review can be observed, and this experiment seeks to address said issues:
1. Often, only one method of Machine Learning is applied, and thus different statistical techniques are rarely compared as benchmarks on the same dataset.
In this work, many statistical techniques of feature selection and machine learning are applied in order to explore the abilities of each in EMG classification.
2. Very little exploration of generalisation has been performed; researchers usually opt to present the classification ability of a dataset, and there is a distinct lack of exploration where unseen subjects are concerned. This is important for real-world application.
In this work, models attempt to classify data gathered from new subjects and experience failure. This is further remedied by the suggestion of a short calibration task, in which generalisation then succeeds through the processes of inductive transfer learning and transductive transfer learning.
3. When applications are presented, there is often a lack of exposition of the real-time results for that application.
In this work, where real-world, real-time applications are concerned, classification abilities are given at each step where required. This is important for the exploration of ability, and thus, the exploration of areas for future work.
2.2 Selected feature selection algorithms
Feature selection is the process of reducing a dataset's dimensionality in order to reduce the complexity of machine learning algorithms while still maintaining effective classification ability (Dash and Liu 1997; Guyon and Elisseeff 2003). Thus, the main goal of feature selection is to disregard worthless attributes that have no bearing on class, and, if stricter rules are in place, to also disregard those with very little classification ability, which is not considered worth their contribution to model complexity. In this section, the chosen feature selection algorithms employed within this study are described.3
Information Gain is the scoring of an attribute's classification ability with regard to the change in entropy when said attribute is used for classification (Kullback and Leibler 1951). The entropy measured for a specific attribute is given as:

$$E(s) = -\sum_{k} p_k \times \log(p_k). \quad (1)$$

That is, the entropy E is the sum of the probability mass function of each value, p_k, multiplied by its negative logarithm. The change in entropy (Information Gain) when different attributes are observed for classification thus allows for scoring of ability.

Symmetrical Uncertainty is a method of dimensionality reduction by comparison of two attributes with regard to classification entropy and Information Gain given the pair (Gel'Fand and Yaglom 1959; Piao et al. 2019). This allows for comparative scores to be applied to attributes within the vector. For attributes X and Y, Symmetrical Uncertainty is given as:

$$SymmU(X,Y) = 2 \times \frac{IG(X|Y)}{E(X) + E(Y)}, \quad (2)$$

where entropy E and Information Gain IG are calculated as previously described.
3 For the One Rule Feature Selection process, please see Sect. 2.3.
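As a concrete sketch of the two scoring methods above, the following computes entropy, Information Gain, and Symmetrical Uncertainty for discrete attribute/label lists. It is a minimal illustration of Eqs. (1) and (2) only, not the Weka implementation used in the experiments:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, as in Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """E(Y) minus the entropy of Y remaining after splitting on X."""
    n = len(ys)
    cond = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(ys) - cond

def symmetrical_uncertainty(xs, ys):
    """2 * IG / (E(X) + E(Y)), as in Eq. (2)."""
    return 2.0 * information_gain(xs, ys) / (entropy(xs) + entropy(ys))

# A perfectly class-predictive attribute scores 1.
su = symmetrical_uncertainty(["a", "a", "b", "b"], ["up", "up", "down", "down"])
```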
2.3 Selected machine learning algorithms
A Machine Learning (ML) algorithm, in general terms, is the process of building an analytical or predictive model with inspiration from labelled (known) data (Bishop 2006; Michie et al. 1994). The process of classification is to develop rules to label unseen (validation) data based on seen (training) data. This section details the general background of the learning models selected in this study. A wide range of models is chosen in order to explore the differing abilities of multiple statistical techniques.
One Rule classification is an extremely simplistic process that generates a best-fit ruleset based on one attribute. A single attribute is identified as the best for classification, and rules are generated upon it, that is, effective splits to separate the data objects (e.g. for an attribute a: IF a > 10 THEN Class = Y, IF a ≤ 10 THEN Class = Z).
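A minimal sketch of the One Rule idea, assuming the attribute has already been discretised into bins (a full OneR implementation also selects the best attribute and the split points):

```python
from collections import Counter

def one_rule(values, labels):
    """Fit a OneR-style rule for a single (discretised) attribute:
    map each attribute value to its most frequent class."""
    rule = {}
    for v in set(values):
        majority = Counter(l for x, l in zip(values, labels) if x == v)
        rule[v] = majority.most_common(1)[0][0]
    return rule

# Toy example: a two-bin attribute predicting a binary class.
rule = one_rule(["high", "high", "low", "low", "low"],
                ["Y", "Y", "Z", "Z", "Y"])
```

Prediction is then a simple dictionary lookup, which is why the method also doubles as a cheap per-attribute quality score for feature selection.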
Decision Trees are tree-like branched data structures, where at each node a conditional control statement is used to provide a rule based on attribute values, and an end node without connections represents a class (Pal 2005). Classification follows a process of cascading the data objects from the start to the end of the tree, and their predicted class is given as the one reached. Fitness of a tree layout is given as the entropy within the end nodes and their classified instances.4
A Random Decision Tree (RDT) with parameter K will
select K random attributes at each node and develop split-
ting rules based on them(Prasad etal. 2006). The model is
simple since no pruning is performed and thus an overfitted
tree is produced to classify all input data points, therefore
cross-validation is used to create an average of the best per-
forming random trees, or with a testing set of unseen data.
Support Vector Machines (SVM) classify data points by optimising a data-dimensional hyperplane to most aptly separate them, and then classifying based on the distance vector measured from the hyperplane (Cortes and Vapnik 1995). Optimisation follows the goal of maximising the average margins between the points and the separator. Generation of an SVM is performed through Sequential Minimal Optimisation (SMO), a high-performing algorithm to generate and implement an SVM classifier (Platt 1998). To perform this, the large optimisation problem is broken down into smaller sub-problems, which can then be solved linearly. For multipliers a, reduced constraints are given as:

$$0 \le a_1, a_2 \le C, \qquad y_1 a_1 + y_2 a_2 = k, \quad (3)$$

where y are the data classes and k is the negative of the sum over the remaining terms of the equality constraint.
Naive Bayes is a probabilistic model given by Bayes' Theorem, which aims to find the posterior probability for a number of different hypotheses and then select the hypothesis with the highest probability. The posterior probability is given by:

$$P(h|d) = \frac{P(d|h)P(h)}{P(d)}, \quad (4)$$

where P(h|d) is the probability of hypothesis h given the data d, P(d|h) is the probability of data d given that the hypothesis h is true, P(h) is the probability of hypothesis h being true, and P(d) is the probability of the data. The algorithm assumes each probability value to be conditionally independent for a given target (ergo, naive), calculated as P(d1|h)P(d2|h) and so on. Despite its simplicity, related work has shown its effectiveness in some complex problems (Wood et al. 2019), showing that Naive Bayes classification achieves a 96% negative predictive value on the Wisconsin breast cancer data set.
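A worked sketch of Eq. (4) under the naive independence assumption; the class priors and per-feature likelihoods below are hypothetical values for illustration, not figures from the dataset:

```python
def posterior(priors, likelihoods):
    """Posterior P(h|d) for each hypothesis h, per Eq. (4), under the
    naive assumption that feature likelihoods multiply independently."""
    scores = {}
    for h, p_h in priors.items():
        score = p_h
        for p in likelihoods[h]:  # P(d1|h) * P(d2|h) * ...
            score *= p
        scores[h] = score
    total = sum(scores.values())  # P(d), by the law of total probability
    return {h: s / total for h, s in scores.items()}

# Hypothetical class priors and two per-class feature likelihoods.
post = posterior({"up": 0.5, "down": 0.5},
                 {"up": [0.9, 0.8], "down": [0.2, 0.3]})
```

The prediction is simply the hypothesis with the largest posterior; normalising by the total makes the outputs interpretable as class probabilities.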
Bayesian Networks are graphical probabilistic models that satisfy the local Markov property and are used for the computation of probability. Such a network is a Directed Acyclic Graph (DAG) in which each edge is a conditional dependency, and each node corresponds to a unique random variable and is conditionally independent of its non-descendants. Thus the probability of an arbitrary event $X = (X_1, \ldots, X_k)$ can be computed as

$$P(X) = \prod_{i=1}^{k} P(X_i \mid X_1, \ldots, X_{i-1}).$$
Logistic Regression is a process of symmetric statistics where a numerical value is linked to the probability of an event occurring, e.g. the number of driving lessons taken to predict a pass or fail (Walker and Duncan 1967). In a two-class problem within a dataset containing i attributes $x_1, \ldots, x_i$ and model parameters $\beta$, the log odds l are derived via

$$l = \beta_0 + \sum_{j=1}^{i} \beta_j x_j,$$

and the odds of an outcome are given by

$$o = b^{\beta_0 + \sum_{j=1}^{i} \beta_j x_j},$$

where b is the base of the logarithm; these can be used to predict an outcome based on previous observations.
Voting allows for multiple trained models to act as an ensemble through democratic or weighted voting. Each model will vote on its outcome (prediction) by way of methods such as simply applying a single vote, or voting by the weight of probability experienced from training and validation. The final decision of the ensemble is the class receiving the highest number of votes or weighted votes, and is given as the outcome prediction. A Random Decision Forest (RDF) is an example of a voting model: a specified number n of RDTs are generated on randomly selected subsets of the input data (Bootstrap Aggregation), and produce an overall prediction by presenting the majority vote (Ho 1995).
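The vote-by-average-probability scheme used later in this study can be sketched as follows; the class names and probability values are hypothetical outputs of a Random Forest and an SVM for a single feature window:

```python
def vote_by_average_probability(model_probs):
    """Average the class-probability estimates of several models and
    predict the highest-scoring class. With only two models, counting
    votes could tie, hence averaging probabilities instead."""
    classes = model_probs[0].keys()
    avg = {c: sum(p[c] for p in model_probs) / len(model_probs)
           for c in classes}
    return max(avg, key=avg.get), avg

# Hypothetical per-class outputs of a Random Forest and an SVM
# for one feature window.
rf = {"up": 0.70, "down": 0.20, "relax": 0.10}
svm = {"up": 0.40, "down": 0.45, "relax": 0.15}
pred, avg = vote_by_average_probability([rf, svm])
```

Note that the SVM alone would have predicted 'down'; the forest's stronger confidence in 'up' dominates the average.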
4 For details on Information Gain, please see Sect. 2.2.
3 Method
In this section, the methodology of the experiments in this study is described. Initially, data is acquired prior to the generation of a full dataset through feature extraction. Machine Learning paradigms are then benchmarked on the dataset, before the exploration of real-time classification of unseen data.
The experiments performed in this study were executed on an AMD FX-8520 eight-core processor with a clock speed of 3.8 GHz. In terms of software, the algorithms are executed via the Weka API (implemented in Java). The machine learning algorithms are validated through a process of k-fold cross validation, where k is set to 10 folds. The voting process is to vote by the average probabilities of the models, since two models are considered and a democratic voting process would thus result in a tie should the two models disagree.
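As a sketch of the k = 10 validation scheme, the following partitions sample indices into near-equal folds; this illustrates the idea only, since the experiments themselves use the Weka API's implementation:

```python
def k_fold_indices(n, k=10):
    """Partition n sample indices into k near-equal folds; each fold
    serves once as the validation set while the rest train the model."""
    folds, start = [], 0
    base, extra = divmod(n, k)
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(25, k=10)
```

The reported score is then the mean accuracy over the k validation folds, so every sample is used for validation exactly once.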
3.1 Data acquisition
The Myo Armband records EMG data at a rate of 200 Hz via 8 dry sensors worn on the arm, and it also has a 9-axis Inertial Measurement Unit (IMU) operating at a sample rate of 50 Hz. For this study, data acquisition was performed with 5 subjects: three males and two females (aged 22–40). For model generalisation, 4 more subjects were taken into account, of which two are new subjects and two are performing the movements again. The gestures performed were thumbs up, thumbs down, and resting (a neutral gesture in which the subject is asked to rest their hand). For training, 60 s of forearm muscle activity data was recorded for each arm (two minutes, per subject, per gesture). In the case of benchmark data, the muscle waves were recorded in intervals of 1–7 s each.
3.2 Feature extraction
In this study, time series are considered through a sliding window technique in order to generate statistics and thus extract features or attributes from the 8-dimensional data. Related work in biological signal processing argues for the need for feature extraction prior to data mining (Mendoza-Palechor et al. 2019; Seo et al. 2019). This is performed because wave data is complex and temporal in nature, and thus single points are difficult to classify (since they depend on both past and future events). The feature extraction process in this study is based on previous works with electroencephalographic signals (Bird et al. 2018, 2019)5, which have been noted to bear some similarity to EMG signals (Grosse et al. 2002). A general overview of the process is as follows:
– Initially, a sliding window of length 1 s at an overlap of 0.5 s divides the data into short wave segments.
– For each time window, the following is performed:
– Considering the full time window, the following statistics are measured:
  – The mean and standard deviation of the wave.
  – The skewness and kurtosis of each signal (Zwillinger and Kokoska 2000).
  – The maximum and minimum values.
  – The sample variances of each signal, plus the sample covariances of all pairs of waves (Montgomery and Runger 2010).
  – The eigenvalues of the covariance matrix (Strang 2006).
  – The upper triangular elements of the matrix logarithm of the covariance matrix (Chiu et al. 1996).
  – The magnitude of the frequency components of each signal by Fast Fourier Transform (FFT) (Van Loan 1992).
  – The frequency values of the ten most energetic components of the FFT, for each signal.
– Considering the two 0.5 s windows produced due to offset (overlap of two 1 s windows resulting in 0.5 s windows):
  – The change in both the sample means and the sample standard deviations between the first and second 0.5 s windows.
  – The change in both the maximum and minimum values between the first and second 0.5 s windows.
– Considering the two 0.25 s quarter windows produced due to offset:
  – The mean of each quarter-window.
  – All paired differences of means between the quarter-windows.
  – The maximum (minimum) values of each quarter-window, plus all paired differences of maximum (minimum) values between the quarter-windows.
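The windowing step at the top of the list can be sketched as follows; window length and overlap are expressed in samples, assuming the Myo's 200 Hz rate (so 1 s = 200 samples):

```python
def sliding_windows(signal, length=200, overlap=100):
    """Segment a signal into fixed-length windows with overlap.
    With length=200 and overlap=100 at 200 Hz, this yields the 1 s
    windows with 0.5 s overlap described above; trailing samples
    that cannot fill a window are dropped."""
    step = length - overlap
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]

# 2.5 s of placeholder samples -> four overlapping 1 s windows.
windows = sliding_windows(list(range(500)))
```

Each window (per channel) is then summarised by the statistics listed above to form one row of the feature dataset.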
Change in attributes is also treated as a feature, in which each window is passed the previous window's extracted value vector, sans the maximum, mean, and minimum values of the quarter windows. The first window does not receive this vector since no window precedes it.
Feature extraction thus produced a dataset of 2040 numerical attributes from the 8 electrodes, of which there are 159 megabytes of data produced from the five subjects. A minor original contribution is also presented in the form of the application of these features to EMG data, since they have only been shown to be effective thus far in EEG signal processing.
5 Available online, https://github.com/jordan-bird/eeg-feature-generation/ Last accessed: 25/02/2020.
3.3 Machine learning and benchmarking towards real-time classification
Following data acquisition and feature extraction, multiple
ML models are benchmarked in order to compare their
classification abilities on the EMG data. The particularly
strong models are then considered for generalisation and
real-time classification.
In this work, two approaches towards real-time classification are explored. Small datasets are recorded sequentially from four subjects, in lengths of 1 to 7 s per class; these constitute seven datasets per person, of total lengths {3, 6, ..., 21} s.
Initially, the best four models observed by the previous
experiments are used to classify these datasets in order
to derive the ideal amount of time that an action must be
observed before the most accurate classification can be
performed.
Following this, a method of calibration through transfer
learning is also explored. The result from the aforemen-
tioned experiment (the ideal amount of observation time)
is taken forward and, for each person, appended to the full
dataset recorded for the classification experiments. Each
of the chosen ML techniques is then retrained and used
to classify further unseen data from said subject.
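This calibration step amounts to appending the subject's short labelled recording to the base training set and retraining. A rough sketch follows; the array names, sizes, and random stand-in data below are hypothetical, not the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins: (X_base, y_base) is the multi-subject training
# set; (X_cal, y_cal) is the short labelled calibration recording
# (15 s, 5 s per class) from the new subject.
rng = np.random.default_rng(0)
X_base, y_base = rng.normal(size=(300, 10)), rng.integers(0, 3, size=300)
X_cal, y_cal = rng.normal(size=(30, 10)), rng.integers(0, 3, size=30)

# Supervised transfer by calibration: append the labelled calibration
# windows to the base training set and retrain from scratch.
X_train = np.vstack([X_base, X_cal])
y_train = np.concatenate([y_base, y_cal])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
# clf is then used, without further training, on unseen windows
# from the calibrated subject.
preds = clf.predict(X_cal)
```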
4 Results
In this section, the preliminary results from the experiments
are given. Firstly, the chosen machine learning techniques are
benchmarked in order to select the most promising method
for the problem presented in this study. Secondly, generalisa-
tion of models to unseen data is benchmarked before a similar
experiment is performed within which transfer learning is lev-
eraged to enable generalisation of models to new data through
calibration to a subject.
4.1 Feature selection and machine learning
Table1 shows the results of attribute selection performed on
the full dataset of 2040 numerical attributes. One Rule fea-
ture selection found that the majority of attributes held strong
One Rule classification ability, as is often expected(Ali and
Smith 2006). Information Gain and Symmetrical Uncertainty
produced slightly smaller datasets both of 1898, and it must
be noted that the two datasets are comprised of differing
attributes.
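The rank-then-reduce procedure can be approximated as follows, using scikit-learn's mutual information estimator as an analogue of the Information Gain ranking; the toy data, threshold, and variable names are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy stand-in for the 2040-attribute feature matrix (three classes).
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 20))
y = rng.integers(0, 3, size=150)
X[:, 0] += y  # make attribute 0 genuinely class-dependent

# Rank attributes by estimated mutual information with the class label
# and keep those scoring above a small threshold, mirroring the
# rank-then-reduce procedure described above.
scores = mutual_info_classif(X, y, random_state=0)
keep = np.flatnonzero(scores > 0.01)
X_reduced = X[:, keep]
```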
In Table2, the full matrix of benchmarking results are pre-
sented. An interesting pattern occurs throughout all datasets,
both reduced and full; an SVM is always the best single classi-
fier, scoring between 87.11 and 87.14%. Additionally, a voting
ensemble of Random Forest and SVM always produce the
strongest classifiers at results of between 91.3 and 91.74%.
Interestingly, the One Rule dataset is slightly less complex
than the full dataset but produces a slightly superior result. The
Information Gain and Symmetrical Uncertainty datasets are far
less complex, and yet are only behind the best One Rule score
by 0.44% and 0.34% respectively. Logistic Regression on the
whole dataset fails due to its high resource requirements, but
is observed to be viable on the datasets that have been reduced.
Table 1 A comparison of the three attribute selection experiments

Method                    No. attributes selected   Max score   Min score
One Rule                  2000                      64.39       30.51
Information Gain          1898                      0.62        0.004
Symmetrical Uncertainty   1898                      0.32        0.003

Note that scoring methods are unique and thus not comparable between the three.
Table 2 10-fold classification accuracy (%) of both single and ensemble methods on the datasets

Dataset                  OneR    RT     SVM    NB     BN    LR     RF     Vote (best two)   Vote (best three)
OneR                     61.33   74.03  87.14  64.32  69.9  60.76  91.30  91.74             74.67
InfoGain                 61.49   75.39  87.11  64.13  69.9  61.45  91.7   91.30             75.13
Symmetrical uncertainty  61.48   74.37  87.11  64.13  69.9  61.55  91.36  91.4              75.16
Whole dataset            61.33   74.09  87.14  64.32  69.9  x      91.3   91.71             74.72

Voting does not include Random Tree due to the inclusion of Random Forest; 'x' denotes failure to run.
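The best-performing configuration in Table 2, a voting ensemble averaging the class probabilities of a Random Forest and an SVM, can be sketched with scikit-learn as follows; the synthetic data and hyperparameters are placeholders, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic three-class data standing in for the extracted EMG features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# 'soft' voting averages the predicted class probabilities of the two
# estimators, as in the Vote (best two) ensemble above.
vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft",
)
scores = cross_val_score(vote, X, y, cv=10)  # 10-fold CV, as in the paper
mean_acc = scores.mean()
```

Soft voting requires each estimator to expose calibrated-ish probabilities, hence `probability=True` on the SVM.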
J. Kobylarz et al.
4.2 Benchmarking requirements for real-time classification
In this section, very short segments of unseen data are
collected from four subjects in order to attempt to apply
the previously generated models to new data; that is, to test the generalisation ability, or lack thereof, of the models trained on the 5-subject dataset. Generalisation initially fails, and so, with the least catastrophic model in mind, the focus shifts to calibrating to a 'user' in an ideally short amount of time via transfer learning.
When the best model from Table 2 is used, the ensemble vote of average probabilities between a Random Forest and SVM fails to classify unseen data. Observe Fig. 2, in which 15 s of unseen data performs, on average, better than any other amount of data, yet still only reaches a mean classification ability of 55.12% (which is unacceptable for a ternary classification problem).
In Fig.3, the mean classification ability of other highly
performing models from the previous experiment are given
when unseen data are attemptedly classified. Likewise to the
Vote model observed in Fig.2, generalisation has failed for
all models. Two interesting insights emerge from the failed
experiments; firstly, 15 s of data (5 s per class) most often
leads to the best limited generalisation as opposed to both
shorter and longer experiments. Furthermore, the ability of
the Random Forest can be seen to exceed all of the other
three methods, suggesting that it is superior (albeit limited)
when generalisation is considered.
As previously described, calibration is attempted through a short experiment. Due to the aforementioned findings, 15 s of known data (that is, requested during 'setup') is collected.
Fig. 2 Benchmarking of Vote (best two) model generalisation ability for unseen data segments per subject (classification accuracy (%) against seconds of data, Subjects 1-4), in which generalisation has failed due to low classification accuracies
Fig. 3 Initial pre-calibration mean generalisation ability of models (RF, SVM, Vote (RF, SVM, BN), Vote (RF, SVM)) on unseen data from four subjects in a three-class scenario (classification accuracy (%) against seconds of data). Time is given for total data observed equally for three classes. Generalisation has failed
Table 3 Results of the models' generalisation ability to 15 s of unseen data once calibration has been performed

Model                 Generalisation ability (%)
Single models
OneR                  63
RT                    91.86
SVM                   94
NB                    53.35
BN                    66.05
LR                    90.1
Ensemble models
RF                    97
Vote (RF, SVM)        95.7
Vote (RF, SVM, BN)    87.8
Table 4 Confusion matrix for the Random Forest once calibrated by the subject for 15 s when used to predict unseen data

                 Ground truth
Prediction       Rest   Up    Down
Rest             300    0     1
Up               0      324   1
Down             0      19    376

Counts have been compiled from all subjects. Class imbalance occurs in real-time due to Bluetooth sampling rate.
These labelled data are then added to the training data, in
order to expand knowledge at a personal level. Once this is
performed, and the models are trained, they are then bench-
marked with a further unseen dataset of 15 s of data, again,
5 s per class. No further training of the models is performed; they simply attempt to classify this unseen data. Table 3 shows the abilities of all previously benchmarked models once the short calibration process is followed, with far greater success than observed in the previously benchmarked failed experiments. As was conjectured from those failed experiments, the Random Forest proved the most successful at generalising to a new subject after calibration. The error matrix for the best model is given in Table 4. The most difficult task was the prediction of 'thumbs down', which, when a subject had a particularly small arm, would sometimes be classified as a resting state. Observed errors are extremely low, and thus future work to explore this is suggested in Sect. 6.
5 Applications in human-robot interaction
In this section, an application of the framework is presented
in an HRI context. The Random Forest model observed to be the best for generalisation in Sect. 4.2 is calibrated with 5 s of data per class, in line with the benchmark results, enabling the subject to interact non-verbally with machines via EMG gesture classification. Note that only preliminary benchmarks are presented; Sect. 6 details potential future work in this regard, and these preliminary activities are not considered the main contributions of this work, which were presented in Sect. 4.
5.1 20 Questions withahumanoid robot opponent
20Q, or 20 Questions, is a digital game developed by Robin
Burgener based on the 20th Century American parlor
game of the same name and rules; it is a situational puzzle.
Through Burgener’s algorithm, computer opponents play
via the dissemination and subsequent strategy presented by
an Artificial Neural Network (Burgener 2003, 2006). In the
game between man and machine, the player thinks of an
entity and the opponent is able to ask 20 yes/no questions.
Through elimination of potential answers, the opponent is
free to guess the entity that the player is thinking of. If the
opponent cannot guess the entity by the end of the 20 ques-
tions, then the player has won.
In this application the 20 Questions game is played with
a humanoid robot, Softbank Robotics’ Pepper. Initially, the
subject is calibrated with 15 s of data (5 s per class) added to
the full dataset, due to the findings in this work. Following
this, for every round of questioning, the robot will listen
to 5 s of data from the player, perform feature generation,
and finally will consider the most commonly predicted class
from all data objects produced in order to derive the player’s
answer. This process can be seen in Fig.4 in which feedback
is given during data classification. Two players each play two games with the robot. Thus, the model used is a
calibrated Random Forest (through inductive and transduc-
tive transfer learning) and a simple meta-approach of the
most common class.
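The most-common-class metaclassifier that turns the many per-window predictions from one answer into a single response is simple to sketch; the function name and label strings below are illustrative.

```python
from collections import Counter

def most_common_class(window_predictions):
    """Majority vote over the per-window class predictions gathered
    during one 5 s answer, yielding the player's overall response."""
    return Counter(window_predictions).most_common(1)[0][0]

# e.g. nine windows classified during a single answer
answer = most_common_class(
    ["up", "up", "down", "up", "rest", "up", "up", "down", "up"])
```

Even when individual windows are misclassified, the vote recovers the correct answer so long as the correct class remains the plurality.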
As can be seen in Table 5, results from the four games are given as average accuracy on a per-data-object basis, but the outcome of each game depends on the final column, EMG Predictions Accuracy: the measure of correct predictions of thumb states given by the most common prediction of all data objects generated over the course of data collection and feature generation. As can be observed, the high accuracies of per-object classification contribute towards perfect classification of player answers, all of which were at 100%.
6 Future work and conclusion
In the calibration experiment, error rates were found to
be extremely low. Accuracy measurements exceeded the
original benchmarks and thus further experimentation is
required to explore this. Calibration was performed for a limited group of four subjects; further experimentation should explore the more general effect when a larger group of participants is considered.
Fig. 4 Softbank Robotics' Pepper robot playing 20 Questions with a human through real-time EMG signal classification
Table 5 Statistics from two games played by two subjects each

Subject   Yes avg. confidence   No avg. confidence   Avg. confidence   EMG predictions confidence
          (accuracy) (%)        (accuracy) (%)       (accuracy) (%)    (accuracy) (%)
1         96.9                  96.5                 96.7              100
2         97                    97                   97                100

Average accuracy is given per-data-object; correct EMG predictions are given as overall decisions.
Towards the end of this work, preliminary benchmarks
are presented for potential application of the inductive and
supervised transductive transfer learning calibration process.
The 20 Questions game with a Pepper Robot was possible
with 15 s of calibration data and 5 s of answering time per
question, and predictions were at 100% for two subjects in
two different experimental runs. Further work could both explore more subjects and attempt to perform this task with a shorter answering time, i.e. a deeper exploration of how much data is enough for a confident prediction. For example, rather than the simplistic most-common-class Random Forest approach, a more complex system of meta-classification could prove more useful, as the pattern of error may itself be useful for prediction; if this were so, then it stands to reason that confident classification could be enabled sooner than the 5 s mark. Additionally, once a best-case paradigm is confirmed, the method could then be compared to other sensory techniques such as image/video classification for gesture recognition. Furthermore, should said method also prove viable, then a multi-modal approach could be explored in order to fuse both visual and EMG data.
This article shows that the proposed transfer learning
system is viable to be applied to the ternary classification
problem presented. Future work could explore the robust-
ness of this approach to problems of additional classes and
gestures in order to compare how results are affected when
more problems are introduced.
To finally conclude, this experiment firstly found that a
voting ensemble was a strong performer for classification of
gesture but failed to generalise to new data. With the induc-
tive and transductive transfer learning calibration approach,
the best model for generalisation of new data was a Random
Forest technique which achieved very high accuracy. After
gathering data from a subject for only 5 s, the model could
confidently classify the gesture at 100% accuracy through
the most common class Random Forest classifier. Since
very high accuracies were achieved by the transfer learning
approach in this work when compared to the state-of-the-
art related works and the proprietary MYO system, future
applications could be enabled with our approach towards a
much higher resolution of input than is currently available
with the MYO system.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Abduo M, Galster M (2015) Myo gesture control armband for medical applications. https://www.semanticscholar.org/paper/Myo-Gesture-Control-Armband-for-Medical-Abduo-Galster/3b5ed355b09beecb7b2b6bbd23fead44b50374c6
Abreu JG, Teixeira JM, Figueiredo LS, Teichrieb V (2016) Evaluating
sign language recognition using the myo armband. In: 2016 XVIII
Symposium on Virtual and Augmented Reality (SVR), IEEE, pp
64–70
Ali S, Smith KA (2006) On learning algorithm selection for classifica-
tion. Applied Soft Computing 6(2):119–138
Arnold A, Nallapati R, Cohen WW (2007) A comparative study of
methods for transductive transfer learning. In: ICDM Work-
shops, pp 77–82
Benalcázar ME, Motoche C, Zea JA, Jaramillo AG, Anchundia CE,
Zambrano P, Segura M, Palacios FB, Pérez M (2017) Real-time
hand gesture recognition using the myo armband and muscle
activity detection. In: 2017 IEEE Second Ecuador Technical
Chapters Meeting (ETCM), IEEE, pp 1–6
Bird JJ, Manso LJ, Ribeiro EP, Ekárt A, Faria DR (2018) A study on
mental state classification using eeg-based brain-machine inter-
face. In: 2018 International Conference on Intelligent Systems
(IS), IEEE, pp 795–800
Bird JJ, Faria DR, Manso LJ, Ekárt A, Buckingham CD (2019) A
deep evolutionary approach to bioinspired classifier optimi-
sation for brain-machine interaction. Complexity. https://doi.org/10.1155/2019/4316548
Bishop CM (2006) Pattern recognition and machine learning.
Springer, Berlin
Burgener R (2003) 20q twenty questions
Burgener R (2006) Artificial neural network guessing method and
game. US Patent App. 11/102,105
Chiu TY, Leonard T, Tsui KW (1996) The matrix-logarithmic covar-
iance model. Journal of the American Statistical Association
91(433):198–210
Cortes C, Vapnik V (1995) Support-vector networks. Machine learn-
ing 20(3):273–297
Costante G, Galieni V, Yan Y, Fravolini ML, Ricci E, Valigi P (2014)
Exploiting transfer learning for personalized view invariant ges-
ture recognition. In: 2014 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp
1250–1254
Côté-Allard U, Fall CL, Drouin A, Campeau-Lecours A, Gosselin
C, Glette K, Laviolette F, Gosselin B (2019) Deep learning
for electromyographic hand gesture signal classification using
transfer learning. IEEE Transactions on Neural Systems and
Rehabilitation Engineering 27(4):760–771
Dash M, Liu H (1997) Feature selection for classification. Intelligent
data analysis 1(1–4):131–156
Demir F, Bajaj V, Ince MC, Taran S, Şengür A (2019) Surface emg
signals and deep transfer learning-based physical action classi-
fication. Neural Computing and Applications 31(12):8455–8462
Ganiev A, Shin HS, Lee KH (2016) Study on virtual control of a
robotic arm via a myo armband for the selfmanipulation of a
hand amputee. Int J Appl Eng Res 11(2):775–782
Gel’Fand I, Yaglom A (1959) Calculation of amount of information
about a random function contained in another such function.
Eleven Papers on Analysis, Probability and Topology 12:199
Goussies NA, Ubalde S, Mejail M (2014) Transfer learning decision
forests for gesture recognition. The Journal of Machine Learn-
ing Research 15(1):3667–3690
Grosse P, Cassidy M, Brown P (2002) Eeg-emg, meg-emg and emg-
emg frequency analysis: physiological principles and clinical
applications. Clinical Neurophysiology 113(10):1523–1531
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
6031
Thumbs up, thumbs down: non-verbal human-robot interaction throughreal-time EMG…
1 3
Guyon I, Elisseeff A (2003) An introduction to variable and feature
selection. J Mach Learn Res 3:1157–1182
Ho TK (1995) Random decision forests. In: Proceedings of the third
international conference on document analysis and recognition,
IEEE, vol1, pp 278–282
Huang D, Zhang X, Saponas TS, Fogarty J, Gollakota S (2015a)
Leveraging dual-observable input for fine-grained thumb inter-
action using forearm emg. In: Proceedings of the 28th annual
ACM symposium on user interface software and technology,
ACM, pp 523–528
Huang Y, Guo W, Liu J, He J, Xia H, Sheng X, Wang H, Feng X,
Shull PB (2015b) Preliminary testing of a hand gesture recog-
nition wristband based on emg and inertial sensor fusion. In:
International conference on intelligent robotics and applica-
tions, Springer, pp 359–367
Ipsos M (2016) GP patient survey: national summary report. NHS England, London
ul Islam I, Ullah K, Afaq M, Chaudary MH, Hanif MK (2019) Spatio-
temporal semg image enhancement and motor unit action potential
(muap) detection: algorithms and their analysis. J Ambient Intell
Humaniz Comput 10(10):3809–3819
Kaur M, Singh S, Shaw D (2016) Advancements in soft comput-
ing methods for emg classification. Int J Biomed Eng Technol
20(3):253–271
Kullback S, Leibler RA (1951) On information and sufficiency. Ann
Math Stat 22(1):79–86
Liu J, Yu K, Zhang Y, Huang Y (2010) Training conditional random
fields using transfer learning for gesture recognition. In: 2010
IEEE international conference on data mining, IEEE, pp 314–323
Mendez I, Hansen BW, Grabow CM, Smedegaard EJL, Skogberg NB,
Uth XJ, Bruhn A, Geng B, Kamavuako EN (2017) Evaluation
of the myo armband for the classification of hand motions. In:
2017 International conference on rehabilitation robotics (ICORR),
IEEE, pp 1211–1214
Mendoza-Palechor F, Menezes ML, SantAnna A, Ortiz-Barrios M,
Samara A, Galway L (2019) Affective recognition from eeg
signals: an integrated data-mining approach. J Ambient Intell
Humaniz Comput 10(10):3955–3974
Michie D, Spiegelhalter DJ, Taylor C et al (1994) Machine learning.
Neural Stat Classif 13:1–298
Montgomery DC, Runger GC (2010) Applied statistics and probability
for engineers. Wiley, New York
Pal M (2005) Random forest classifier for remote sensing classification.
Int J Remote Sens 26(1):217–222
Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans
Knowl Data Eng 22(10):1345–1359
Piao M, Piao Y, Lee JY (2019) Symmetrical uncertainty-based feature
subset generation and ensemble learning for electricity customer
classification. Symmetry 11(4):498
Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf
Prasad AM, Iverson LR, Liaw A (2006) Newer classification and
regression tree techniques: bagging and random forests for eco-
logical prediction. Ecosystems 9(2):181–199
Rawat S, Vats S, Kumar P (2016) Evaluating and exploring the myo
armband. In: 2016 International conference system modeling and
advancement in research trends (SMART), IEEE, pp 115–120
Sathiyanarayanan M, Rajan S (2016) Myo armband for physiotherapy
healthcare: a case study using gesture recognition application. In:
2016 8th International conference on communication systems and
networks (COMSNETS), IEEE, pp 1–6
Seo J, Laine TH, Sohn KA (2019) Machine learning approaches for
boredom classification using eeg. J Ambient Intell Humaniz Com-
put 10(10):3831–3846
Shi S, Wang Q, Xu P, Chu X (2016) Benchmarking state-of-the-art
deep learning software tools. In: 2016 7th international confer-
ence on cloud computing and big data (CCBD), IEEE, pp 99–104
Strang G (2006) Linear algebra and its applications. Brooks Cole,
London
Tan JW, Walter S, Scheck A, Hrabal D, Hoffmann H, Kessler H, Traue
HC (2012) Repeatability of facial electromyography (emg) activ-
ity over corrugator supercilii and zygomaticus major on differ-
entiating various emotions. J Ambient Intell Humaniz Comput
3(1):3–10
Tatarian K, Couceiro MS, Ribeiro EP, Faria DR (2018) Stepping-stones
to transhumanism: An emg-controlled low-cost prosthetic hand for
academia. In: 2018 International conference on intelligent systems
(IS), IEEE, pp 807–812
Townsend H, Jobe FW, Pink M, Perry J (1991) Electromyographic
analysis of the glenohumeral muscles during a baseball rehabilita-
tion program. Am J Sports Med 19(3):264–272
Van Loan C (1992) Computational frameworks for the fast Fourier
transform. SIAM 10:10
Walker SH, Duncan DB (1967) Estimation of the probability of an
event as a function of several independent variables. Biometrika
54(1–2):167–179
Widodo MS, Zikky M, Nurindiyani AK (2018) Guide gesture applica-
tion of hand exercises for post-stroke rehabilitation using myo
armband. In: 2018 international electronics symposium on knowl-
edge creation and intelligent computing (IES-KCIC), IEEE, pp
120–124
Wood A, Shpilrain V, Najarian K, Kahrobaei D (2019) Private naive
bayes classification of personal biomedical data: application in
cancer data analysis. Comput Biol Med 105:144–150
Yang S, Lee S, Byun Y (2018) Gesture recognition for home automa-
tion using transfer learning. In: 2018 International conference on
intelligent informatics and biomedical sciences (ICIIBMS), IEEE,
vol3, pp 136–138
Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, He Q (2019) A comprehensive survey on transfer learning. arXiv:1911.02685
Zwillinger D, Kokoska S (2000) CRC standard probability and statis-
tics tables and formulae. Chapman and Hall, Boca Raton
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... Gesture recognition dapat dilihat sebagai cara komputer untuk memahami bahasa tubuh manusia [11]. Gesture Recognition sendiri merupakan topik dalam ilmu komputer dan teknologi bahasa dengan tujuan menafsirkan gerakan manusia melalui algoritma matematika [12]. Dengan sistem gesture recognition ini, sudah bukan tidak mungkin untuk mengembangkan sistem yang bisa menerjemahkan bahasa isyarat. ...
Article
Full-text available
Gesture Recognition plays a crucial role in facilitating and enhancing communication accessibility for individuals with hearing and speech impairments. However, translating complex sign language into spoken or written language remains a significant challenge. In an effort to address this, this research utilizes the MediaPipe framework and the Random Forest Classifier algorithm to classify sign language gestures and expressions in BISINDO (Indonesian Sign Language). Considering the difficulty and complexity of sign language gestures, 10 expressions/words in BISINDO were selected, resulting in a dataset of 25,000 data points used in this study. The approach involves detecting sign language through pose, hand, and facial gesture or movement recognition. Evaluation results show that the Random Forest algorithm achieves a remarkably high level of precision, recall, F1-score, and accuracy (99.88%). Additionally, the developed system demonstrates good performance with prediction probabilities ranging from 0.50 to 0.70 for correct predictions, although challenges persist in distinguishing similar sign gestures, resulting in some predictions requiring more time to yield accurate results. The findings of this research contribute significantly to improving sign language recognition and promoting inclusivity for individuals with hearing and speech impairments. Moreover, it opens up new opportunities for further advancements in sign language detection technology. Abstrak Gesture Recognition memainkan peran penting dalam memfasilitasi dan meningkatkan aksesibilitas komunikasi bagi individu dengan gangguan pendengaran dan bicara, Namun, dalam menerjemahkan bahasa isyarat yang kompleks menjadi bahasa lisan atau tulisan tetap menjadi tantangan yang signifikan. Berupaya untuk mengatasi hal tersebut, penelitian ini memanfaatkan framework MediaPipe dan algoritma Random Forest Classifier untuk mengklasifikasikan gerakan isyarat berbentuk ungkapan dan kata dalam bahasa isyarat BISINDO. 
Dengan mempertimbangkan tingkat kesulitan dan kompleksitas gerakan isyarat, 10 label ungkapan/kata dalam BISINDO dipilih dan menghasilkan total 25.000 data yang dipakai pada sistem di penelitian ini. Pendekatan ini melibatkan deteksi bahasa isyarat melalui pengenalan pose, gerakan tangan, dan ekspresi wajah. Hasil evaluasi menunjukkan algoritma Random Forest mencapai tingkat presisi, recall, F1-score, dan akurasi yang sangat tinggi (99,88%). Selain itu, sistem yang dikembangkan juga menunjukkan kinerja baik dengan rata-rata probabilitas prediksi berkisar antara 0,50 hingga 0,70 untuk prediksi yang benar, meskipun terdapat tantangan dalam membedakan gerakan isyarat yang mirip dan menyebabkan beberapa prediksi memerlukan waktu lebih lama untuk mencapai hasil yang tepat. Dengan hasil yang diperoleh, penelitian ini memberikan kontribusi penting dalam meningkatkan pengenalan bahasa isyarat dan mendorong inklusivitas bagi masyarakat dengan gangguan pendengaran dan bicara. Hal ini juga membuka peluang baru untuk pengembangan lebih lanjut dalam teknologi deteksi bahasa isyarat.
... In particular, fine-tuning can optimize and adjust the pre-training model to better fit target tasks. By re-training existing DL models on domain-specific dataset, it requires minimal modifications to model architectures and training procedures [32][33][34] . Furthermore, the reliability of DL inference is pivotal to myoelectric control as patients are in the control loop and directly interacts with robots or other environment factors. ...
Article
Full-text available
Recently, robot-assisted rehabilitation has emerged as a promising solution to increase the training intensity of stroke patients while reducing workload on therapists, whilst surface electromyography (sEMG) is expected to serve as a viable control source. In this paper, we delve into the potential of deep learning (DL) for post-stroke hand gesture recognition by collecting the sEMG signals of eight chronic stroke subjects, focusing on three primary aspects: feature domains of sEMG (time, frequency, and wavelet), data structures (one or two-dimensional images), and neural network architectures (CNN, CNN-LSTM, and CNN-LSTM-Attention). A total of 18 DL models were comprehensively evaluated in both intra-subject testing and inter-subject transfer learning tasks, with two post-processing algorithms (Model Voting and Bayesian Fusion) analysed subsequently. Experiment results infer that for intra-subject testing, the average accuracy of CNN-LSTM using two-dimensional frequency features is the highest, reaching 72.95%. For inter-subject transfer learning, the average accuracy of CNN-LSTM-Attention using one-dimensional frequency features is the highest, reaching 68.38%. Through these two experiments, it was found that frequency features had significant advantages over other features in gesture recognition after stroke. Moreover, the post-processing algorithm can further improve the recognition accuracy, and the recognition effect can be increased by 2.03% through the model voting algorithm.
... We assessed various classification models, including pre-trained models, to identify potential candidates for our ensemble. We adopted the transfer learning strategy [39][40][41] by utilizing pre-trained language models like BERT and GPT, fine-tuning them on our target datasets: augmented EmoUERJ and ESD datasets. We augmented the ESD dataset with 300 randomly generated sentences per sentiment category (positive, neutral, negative), resulting in a total of 1250 labeled English sentences. ...
Article
Full-text available
Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human-robot interaction, and cross-cultural communication.
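The DBMM fusion described above combines the class posteriors of several base classifiers. A heavily simplified, static sketch of such a weighted mixture (the real DBMM updates its weights dynamically over time; the weights and class labels here are illustrative):

```python
def fuse_posteriors(posteriors, weights):
    """Static weighted mixture of per-classifier class posteriors.
    `posteriors` is a list of {class: probability} dicts, one per base
    classifier; `weights` reflect each classifier's assumed reliability."""
    classes = posteriors[0].keys()
    fused = {c: sum(w * p[c] for w, p in zip(weights, posteriors))
             for c in classes}
    total = sum(fused.values())          # renormalise to a distribution
    return {c: v / total for c, v in fused.items()}
```

The fused distribution's argmax is the ensemble decision; a second fusion layer over modalities (as in the 2L-DBMM) would apply the same operation to the outputs of the first layer.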
... As a kind of human-computer interaction, it is also used as a gesture recognition technology that enables physical human actions to be input to a computer. 8 Additionally, there are initiatives to employ EMG as a signal for controlling electronic mobile devices, 5,9 prostheses, 10 and even flight control systems. 11 One of the many moving things that can be controlled by an EMG-based interface is an electric wheelchair. ...
Article
Full-text available
The diagnosis of neuromuscular diseases is complicated by overlapping symptoms from other conditions. Textile-based surface electromyography (sEMG) of skeletal muscles offers promising potential in the diagnosis, treatment, and rehabilitation of various neuromuscular disorders. However, it is important to consider the impact of load and pressure on EMG signals, as these can significantly affect the signal's accuracy. This study investigates the influence of load and pressure on EMG signals and establishes a processing framework for these signals in the diagnosis of neuromuscular diseases. The sEMG data were collected from healthy subjects using a textile electrode developed from polyester multi-filament conductive hybrid thread (CleverTex). The textrode was embroidered directly on an elastic bandage (Velcro® strap) placed on the volunteers' muscles while different activities were performed with varying loads and pressures. The collected data were pre-processed using standard discrete wavelet transform techniques to remove noise and artifacts. The performance of the proposed denoising algorithm was evaluated using the signal-to-noise ratio (SNR), percentage root mean square difference (PRD), and root mean square error (RMSE). Various signal processing approaches (filters) were considered, and the results were compared with the proposed EMG noise reduction algorithms. Based on the experimental results, the fourth level of decomposition for the sym5 wavelet with the Rigrsure threshold method achieved the highest SNR values of 16.69 and 21.91 for soft and hard thresholding functions, respectively. The SNR values of 22.11, 21.54, and 2.78 at three different pressure levels (5 mmHg, 10 mmHg, and 20 mmHg, respectively) indicate the superior performance of the wavelet multiresolution filter in de-noising applications. The results of this study suggest that our methodology is effective, precise, and reliable for analysing sEMG data and provides insights into both physiological and pathological neuromuscular conditions.
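The three evaluation metrics this abstract relies on (SNR, PRD, RMSE) have standard definitions that can be computed directly from a reference signal and its denoised estimate; a small sketch:

```python
import math

def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB: signal energy over residual-error energy."""
    sig = sum(c * c for c in clean)
    err = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return 10.0 * math.log10(sig / err)

def prd(clean, denoised):
    """Percentage root-mean-square difference."""
    sig = sum(c * c for c in clean)
    err = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return 100.0 * math.sqrt(err / sig)

def rmse(clean, denoised):
    """Root mean square error between reference and estimate."""
    n = len(clean)
    return math.sqrt(sum((c - d) ** 2 for c, d in zip(clean, denoised)) / n)
```

Higher SNR and lower PRD/RMSE indicate better denoising, which is how the wavelet variants in the study are ranked.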
... Nevertheless, the existing methodologies primarily focus on studying vertical-lateral coupling features [8][9][10][31] of the SMTBS, neglecting the longitudinal motion behavior and the coupling effect between the vehicle elements. Nowadays, safe driving and favorable humanoid-assisted human interaction [32][33][34][35][36] are important and active topics in SMT. To investigate the 3D dynamics features of the SMTBS and ensure good operations (traction and braking), it is necessary to develop a 3D vehicle-bridge interaction model that further considers the longitudinal dynamic interaction. ...
Article
Full-text available
Variable-speed operation of the train easily causes the wheel-track slipping phenomenon, inducing strongly nonlinear dynamic behavior of the suspended monorail train and bridge system (SMTBS), especially under an insufficient wheel-track friction coefficient. To investigate the coupled vibration features of the SMTBS under variable-speed conditions, a novel 3D train–bridge interaction model for the monorail system considering nonlinear wheel-track slipping behavior is developed. Firstly, based on the D'Alembert principle, the vibration equations of the vehicle subsystem are derived by adequately considering the nonlinear interactive behavior among the vehicle components. Then, a high-efficiency modeling method for the large-scale bridge subsystem is proposed based on the component mode synthesis (CMS) method. The vehicle and bridge subsystems are coupled with a spatial wheel-track interaction model considering the nonlinear wheel-track sliding behavior. Further, through a comprehensive comparison with field test data, the effectiveness of the proposed method is verified and the reasonable modal truncation frequencies of the bridge subsystem are determined. On this basis, the dynamic performance of the SMTBS is evaluated under different initial braking speeds and wheel-track interfacial adhesion conditions; the nonlinear wheel-track slipping characteristics and their influence on the vehicle–bridge interaction are also revealed. The analysis results indicate that the proposed model is reliable for investigating the time-varying dynamic features of the SMTBS under variable train speeds. Both the axle-load transfer phenomenon and longitudinal slip of the driving tire readily appear under braking conditions, which significantly increases the longitudinal vehicle–bridge dynamic responses. To ensure good vehicle–bridge dynamic performance, it is suggested that the wheel-track interfacial friction coefficient be larger than 0.35.
... Khushaba et al designed a canonical correlation analysis (CCA) framework to extract individual-irrelevant feature sets from different users for building a cross-individual motion classifier, and achieved 83% accuracy across multiple users [17]. Kobylarz et al achieved a recognition accuracy of around 97% on new subjects by introducing inductive and supervised transductive transfer learning and using five seconds of EMG data per motion class as calibration data [18]. Although these supervised methods can improve the accuracy of motion intent recognition for new users, they usually require a labeled dataset from the new users for TL, which increases the burden of providing labeled calibration data. ...
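The calibration idea attributed to Kobylarz et al in this snippet, appending a few seconds of labeled data from the new user to the training set before refitting, can be illustrated with any classifier. A sketch using a nearest-centroid model (an illustrative stand-in; the cited work uses a Random Forest and real EMG features):

```python
def centroids(samples, labels):
    """Per-class mean feature vectors."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(model, x):
    """Assign x to the class with the nearest centroid."""
    dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda y: dist2(model[y], x))

def calibrated_model(source_x, source_y, calib_x, calib_y):
    """Transfer step: merge source-subject data with the new subject's
    short labeled calibration recording before (re)fitting."""
    return centroids(source_x + calib_x, source_y + calib_y)
```

The calibration samples pull each class centroid toward the new subject's signal distribution, which is the intuition behind the reported accuracy jump on unseen users.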
Article
Full-text available
Objective. Surface electromyography pattern recognition (sEMG-PR) is considered a promising control method for human-machine interaction systems. However, the performance of a trained classifier would greatly degrade for novel users since sEMG signals are user-dependent and largely affected by a number of individual factors such as the quantity of subcutaneous fat and the skin impedance. Approach. To solve this issue, we proposed a novel unsupervised cross-individual motion recognition method that aligned sEMG features from different individuals by self-adaptive dimensional dynamic distribution adaptation (SD-DDA) in this study. In the method, the distances of both the marginal and conditional distributions between source and target features were minimized by automatically selecting the optimal feature-domain dimension using a small amount of unlabeled target data. Main results. The effectiveness of the proposed method was tested on four different feature sets, and results showed that the average classification accuracy was improved by more than 10% on our collected dataset, with the best accuracy reaching 90.4%. Compared to six kinds of classic transfer learning methods, the proposed method showed outstanding performance with improvements of 3.2%-13.8%. Additionally, the proposed method achieved an approximately 9% improvement on a publicly available dataset. Significance. These results suggest that the proposed SD-DDA method is feasible for cross-individual motion intention recognition and could aid the application of sEMG-PR based systems.
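The marginal-distribution part of such feature alignment can, in its simplest form, be per-dimension standardisation of each domain's features so their first two moments match. SD-DDA goes well beyond this (conditional-distribution adaptation and feature-dimension selection), but the basic idea can be sketched as:

```python
import math

def zscore(features):
    """Per-dimension standardisation of a feature matrix (list of rows).
    Applying this independently to source and target features aligns
    their marginal means and variances -- the simplest form of
    marginal-distribution adaptation."""
    n, d = len(features), len(features[0])
    means = [sum(x[j] for x in features) / n for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x in features) / n) or 1.0
            for j in range(d)]                     # guard zero variance
    return [[(x[j] - means[j]) / stds[j] for j in range(d)] for x in features]
```

After standardisation, a classifier trained on source features sees target features on the same scale, even when the raw amplitudes differ between subjects.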
Article
Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation network has been previously proposed in the literature for effective ATR annotation. Although this method demonstrates great potential for ATR, it severely suffers from lower annotation performance, a higher Fréchet Inception Distance (FID) score, and the presence of visual artifacts in the synthetic images. To address these issues, we propose a hybrid contrastive-learning-based unpaired domain translation (H-CUT) network that achieves a significantly lower FID score. It incorporates both attention and entropy to emphasize the domain-specific region, a noisy feature mixup module to generate high-variation synthetic negative patches, and a modulated noise contrastive estimation (MoNCE) loss to reweight all negative patches using optimal transport for better performance. Our proposed contrastive learning and cycle-consistency-based TTL (C3TTL) framework consists of two H-CUT networks and two classifiers. It simultaneously optimizes cycle-consistency, MoNCE, and identity losses. In C3TTL, two H-CUT networks have been employed through a bijection mapping to feed the reconstructed source domain images into a pretrained classifier to guide the optimal target domain classifier. Extensive experimental analysis conducted on six ATR datasets demonstrates that the proposed C3TTL method is effective in annotating civilian and military vehicles, ships, planes, and human targets.
Article
Full-text available
Human physical action classification is an emerging area of research in human-to-machine interaction, which can help disabled people interact with the real world and has applications in robotics. EMG signals measure the electrical activity of the muscular system involved in human physical actions and provide rich information about those actions. In this paper, we propose a deep transfer learning-based approach to human action classification using surface EMG signals. The surface EMG signals are represented as time–frequency images (TFIs) using the short-time Fourier transform. The TFI is used as input to pre-trained convolutional neural network models, namely AlexNet and VGG16, for deep feature extraction, and a support vector machine (SVM) classifier is used to classify the physical action in the EMG signals. Fine-tuning of the pre-trained AlexNet model is also considered. The experimental results show that the deep feature extraction and SVM classification method, as well as fine-tuning, clearly improve classification accuracy compared with various results from the literature. A 99.04% accuracy score is obtained with the AlexNet fc6 + AlexNet fc7 + VGG16 fc6 + VGG16 fc7 deep feature concatenation and SVM classification, and a 98.65% accuracy score is achieved by fine-tuning the AlexNet model. We also compare the obtained results with some existing methods; the comparisons show that the deep feature concatenation and SVM classification method provides better classification accuracy than the compared methods.
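The time–frequency image (TFI) representation used above is a short-time Fourier transform magnitude matrix: each row/column of the "image" is the spectrum of one windowed frame. A from-scratch sketch (the frame length, hop, and Hann window are illustrative choices, not the paper's exact settings):

```python
import cmath
import math

def stft_magnitude(signal, frame_len=8, hop=4):
    """Short-time Fourier transform magnitudes: each row of the returned
    time-by-frequency 'image' is the DFT magnitude of one Hann-windowed
    frame -- the kind of representation fed to a CNN as an image."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                            for n in range(frame_len)))
                    for k in range(frame_len // 2 + 1)]   # non-negative bins
        image.append(spectrum)
    return image
```

In practice an FFT routine and longer frames would be used; the point is only that the 1D EMG signal becomes a 2D array suitable for image-based networks.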
Article
Full-text available
In spatiotemporal multi-channel surface electromyogram (EMG) images, where the x-axis is time, the y-axis is EMG channels, and the gray level is EMG amplitude, the motor unit action potential (MUAP) appears as a linear Gaussian structure. The appearance of this MUAP pattern in the spatiotemporal images is mostly distorted either by the destructive superposition of other MUAPs occurring in the conducting volume or by various noise sources such as power-line interference, bad electrode-skin contact, and movement artifacts. For accurate automatic detection of MUAPs, EMG image enhancement is needed to suppress the background noise and enhance the line-like MUAP propagation patterns. This study presents several candidate filters to enhance the MUAP propagation pattern in spatiotemporal EMG images. Filters that can detect and enhance line-like structures in digital images are used; specifically, the Hermite shape filter is used for EMG image enhancement and compared with the Gabor and steerable filters. The performance of the filters regarding accuracy, specificity, and sensitivity is evaluated with real sEMG signals measured from different muscles and with computer-generated EMG signals. In the enhanced images, the visibility of the MUAP region is improved. These results can help in better estimation of muscle characteristics from sEMG signals.
Article
Full-text available
Actual electricity consumption data make it possible to detect changes in customer class types, a task that can be addressed with classification techniques. However, there are several computational challenges, the most important of which is efficiently handling a large number of dimensions to increase customer classification performance. In this paper, we propose a symmetrical-uncertainty-based feature subset generation and ensemble learning method for electricity customer classification. Redundant and significant feature sets are generated according to symmetrical uncertainty. A classifier ensemble is then built based on the significant feature sets, and the results are combined for the final decision. The results show that the proposed method can efficiently find useful feature subsets and improve classification performance.
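Symmetrical uncertainty, the measure driving the feature-subset generation above, is defined as SU(X, Y) = 2·IG(X; Y) / (H(X) + H(Y)), where IG is the information gain (mutual information) and H the entropy. A direct implementation for discrete features:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 1 means fully
    redundant/relevant, 0 means independent."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))       # joint entropy via paired symbols
    info_gain = hx + hy - hxy            # mutual information
    return 2.0 * info_gain / (hx + hy) if hx + hy else 0.0
```

A feature is "significant" for the class when its SU with the label is high, and "redundant" when its SU with an already selected feature is high, which is exactly the split the method exploits.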
Article
Full-text available
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the first two experiments and significantly higher for the number-guessing experiment, with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the non-boosted DEvo MLP was 79.81%, 96.11%, and 27.07% in the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to the TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
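The evolutionary feature-selection step described above can be sketched as a tiny mutation-and-selection loop over binary feature masks. The population size, mutation rate, and the fitness function used in the test are illustrative assumptions; in the study, a mask's fitness would be a cross-validated classifier score over the selected features:

```python
import random

def evolve_feature_mask(n_features, fitness, pop_size=20, generations=60, seed=1):
    """Tiny elitist evolutionary search over boolean feature masks:
    keep the fitter half each generation and produce mutated children,
    flipping each bit with probability 1/n_features."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # elitism: survivors kept
        children = [[bit ^ (rng.random() < 1.0 / n_features) for bit in p]
                    for p in parents]
        pop = parents + children
    return max(pop, key=fitness)
```

Because parents are carried over unchanged, the best fitness in the population never decreases, so the loop behaves like a population of parallel hill-climbers.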
Article
Full-text available
Recently, commercial physiological sensors and computing devices have become cheaper and more accessible, while computer systems have become increasingly aware of their contexts, including but not limited to users' emotions. Consequently, many studies on emotion recognition have been conducted. However, boredom has received relatively little attention as a target emotion due to its diverse nature. Moreover, only a few researchers have tried classifying boredom using electroencephalogram (EEG). In this study, to perform this classification, we first reviewed studies that tried classifying emotions using EEG. Further, we designed and executed an experiment, which used a video stimulus to evoke boredom and non-boredom, and collected EEG data from 28 Korean adult participants. After collecting the data, we extracted the absolute band power, normalized absolute band power, differential entropy, differential asymmetry, and rational asymmetry from the EEG, and trained three machine learning algorithms on these features: support vector machine, random forest, and k-nearest neighbors (k-NN). We validated the performance of each training model with 10-fold cross validation. As a result, we achieved the highest accuracy of 86.73% using k-NN. The findings of this study can be of interest to researchers working on emotion recognition, physiological signal processing, machine learning, and emotion-aware system development.
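The k-NN classifier that scored highest in this study is straightforward to sketch from scratch (Euclidean distance, k = 3, and the labels below are illustrative choices, not the study's configuration):

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Plain k-nearest-neighbours vote: rank training samples by squared
    Euclidean distance to the query and take the majority label among
    the k closest."""
    dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    ranked = sorted(zip(train_x, train_y), key=lambda xy: dist2(xy[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In the study's setting, `train_x` would hold the extracted EEG band-power features and `train_y` the boredom / non-boredom labels, evaluated under 10-fold cross validation.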
Article
Full-text available
Clinicians would benefit from access to predictive models for diagnosis, such as classification of tumors as malignant or benign, without compromising patients’ privacy. In addition, the medical institutions and companies who own these medical information systems wish to keep their models private when in use by outside parties. Fully homomorphic encryption (FHE) enables computation over encrypted medical data while ensuring data privacy. In this paper we use private-key fully homomorphic encryption to design a cryptographic protocol for private Naive Bayes classification. This protocol allows a data owner to privately classify his or her information without direct access to the learned model. We apply this protocol to the task of privacy-preserving classification of breast cancer data as benign or malignant. Our results show that private-key fully homomorphic encryption is able to provide fast and accurate results for privacy-preserving medical classification.
Conference Paper
Full-text available
Over the past decades, humans have sought to redesign themselves with the intent to evolve beyond their current physical and mental limitations. This phenomenon is known as transhumanism, wherein robotics and biomimetics exploit the unique design of the human body with the intent to develop disruptive anthropomorphic artificial appendages. Nevertheless, while lower-extremity prosthetics have evolved to the point at which lower-leg amputees can be competitive with the world's professional runners, there is still a gap between upper-extremity prosthetics and real hands. This work intends to be a pioneering effort in developing a low-cost multipurpose robotic hand for research and academia. This paper describes the robotic hand, including its electromechanical development and full ROS integration. Moreover, the paper also presents a MATLAB framework designed to introduce sequence data classification, namely providing the ability to control the robotic hand using electromyography (EMG) signals from the forearm. This paper aims to contribute to an ever-increasing human-robot symbiosis by motivating students to engage in transhumanism studies using more sophisticated technologies and methods.