An Ensemble Learning Approach
for Electrocardiogram Sensor Based Human
Emotion Recognition
Theekshana Dissanayake *, Yasitha Rajapaksha, Roshan Ragel and Isuru Nawinne
Department of Computer Engineering, University of Peradeniya, Peradeniya 20400, Sri Lanka; (Y.R.); (R.R.); (I.N.)
* Correspondence: ; Tel.: +94778495156
Received: 14 August 2019; Accepted: 5 October 2019; Published: 16 October 2019
Abstract: Recently, researchers in the area of biosensor based human emotion recognition have used
different types of machine learning models for recognizing human emotions. However, most of them
still lack the ability to recognize human emotions with higher classification accuracy incorporating a
limited number of bio-sensors. In the domain of machine learning, ensemble learning methods have
been successfully applied to solve different types of real-world machine learning problems which
require improved classification accuracies. Emphasising that, this research suggests an ensemble
learning approach for developing a machine learning model that can recognize four major human
emotions, namely: anger; sadness; joy; and pleasure, incorporating electrocardiogram (ECG) signals.
As feature extraction methods, this analysis combines four ECG signal based techniques, namely:
heart rate variability; empirical mode decomposition; with-in beat analysis; and frequency spectrum
analysis. The first three feature extraction methods are well-known ECG based feature extraction
techniques mentioned in the literature, and the fourth technique is a novel method proposed in
this study. The machine learning procedure of this investigation evaluates the performance of a set
of well-known ensemble learners for emotion classification and further improves the classification
results using feature selection as a prior step to ensemble model training. Compared to the best
performing single biosensor based model in the literature, the developed ensemble learner has an
accuracy gain of 10.77%. Furthermore, the developed model outperforms most of the multiple
biosensor based emotion recognition models with a significantly higher classification accuracy gain.
Keywords: bio-signal processing; wearable computing; ensemble learning; electrocardiogram;
machine learning
1. Introduction
Human–Computer Interaction (HCI) research is focused on making interaction with computers
more productive and interactive. One of the methods used for improving the interaction between
humans and computers is to provide emotional intelligence to computing systems. Such systems are
capable of adapting depending on the emotional state of the user. Some examples for such systems
include entertainment systems, healthcare systems, adaptive learning systems and computer games.
Previous studies have investigated different methods to provide emotional intelligence to
computers. Among those methods, facial image based emotion recognition is the most widely
used method since it can recognize a wide range of emotion types [ ]. Another approach is
speech signal analysis, which determines the human emotion by analysing the patterns in the speech
signal [ ]. In addition, direct examination of the person is one of the most sophisticated ways for
human emotion recognition [ ]. These methods use different types of bio-signals to continuously
Sensors 2019, 19, 4495; doi:10.3390/s19204495
monitor and detect human emotions. Due to their unmaskable nature, bio-signal based methods
provide highly accurate results compared to facial emotion recognition and speech analysis [7,8].
Several attempts have been made to use different types of bio-signals for emotion recognition.
Some of the widely used bio-signals include the electrocardiogram (ECG), galvanic skin response (GSR),
electromyogram and respiration. Some studies have combined multiple bio-signal devices to recognize
emotions, whereas other studies have used a single biosensor such as ECG for capturing data for
emotion recognition [ ]. Despite the increased recognition accuracy, using multiple sensors might
lead to user dissatisfaction. Compared to other biosensors, ECG is the most widely used biosensor
because ECG signals are less noisy and they contain emotion related information [9,11].
Previous studies conducted to investigate the use of ECG signals have used different types of
feature extraction methods. Some of those methods include heart rate variability (HRV), empirical mode
decomposition (EMD), with-in beat analysis (WIB) and wavelet transforms [ ]. Although these
feature extraction methods are sophisticated, they suffer from low prediction accuracy.
However, too little attention has been given to combining these sophisticated methods to achieve
higher emotion recognition accuracy.
This study presents an ensemble learning approach for recognising human emotions using ECG
signals. First, this investigation will discuss the protocol followed to obtain emotion-related ECG data
through an experiment. Following that, this study will describe three broadly used ECG signal feature
extraction methods, namely: HRV; EMD; and WIB analysis, and also a novel method proposed in this
study based on the frequency spectrum of the ECG wave. Finally, this research will elaborate on the
machine learning process followed to create a classification model by combining the mentioned
feature extraction methods. Briefly, the machine learning model takes a 20 s window of an ECG signal
and then classifies the signal into four distinct emotion classes, namely: anger; sadness; joy; and pleasure.
The machine learning process first evaluates the capability of different ensemble learners for emotion
classification. After that analysis, as an additional step, a feature selection process is employed to improve
the classification accuracy of each ensemble learner. This step is based on previous related studies on
ensemble learners [ ] and feature selection strategies for ensemble learners [ ]. Finally, this
research presents the selected features from the feature selection process and compares the obtained
results with different ECG based emotion recognition models in the literature.
2. Related Work
In recent years, there has been an increasing amount of literature on human–computer interaction
methods to provide emotional intelligence to computers. Emotional intelligence is widely used
to develop emotionally-aware healthcare monitoring systems, computer games and entertainment
systems and safe driving systems. In computer games, emotional intelligence can be used to evaluate
the player’s affective state for dynamic game content generation [ ]. Similarly, in vehicle safety
systems, emotion recognition models are used to monitor the affective state of the driver while
operating [ ]. Furthermore, in health care systems, emotional intelligence is employed to monitor
the emotional state of patients [6,25,26].
Rattanyu and Mizukawa [ ] discuss speech analysis, facial feature analysis and bio-signal
processing as the primary methods for emotion recognition. Firstly, speech based emotion recognition
methods determine the emotion by analysing a given speech signal. The main drawback of this method
is that the user needs to speak continuously if the system is to determine the emotional state.
Secondly, facial image based recognition is another widely used method for emotion recognition [ ].
Although it provides accurate predictions for emotions, the main problem with this method is that
some people tend to mask their emotional states (social masking), which hinders prediction [ ]. Finally,
bio-signal processing methods use different types of bio-signals to predict emotions. A bio-signal based
method is an adequate solution for recognizing emotions compared to other methods. Because of
their unmasked nature, bio-signals improve the prediction accuracy compared to facial image based
recognition methods [ ]. In addition, since bio-signals are available continuously, unlike speech based
systems [ ], the system can continuously identify the emotion level.
Numerous studies have attempted to use different types of bio-signals for detecting
emotions [9,10,27].
Kim and André [ ] developed an emotion recognition model incorporating four
different biosensors: electrocardiogram, skin conductivity, electromyogram and respiration. In their
investigation, they achieved 70% accuracy for person-independent emotion classification.
The developed model was able to classify a given set of bio-signal patterns into four emotion classes:
anger, sadness, pleasure and joy. In another major study, Rattanyu and Mizukawa [ ] developed an
emotion recognition model using ECG signals with a classification accuracy of 61%. In a study that set
out to develop a neural network based emotion recognition model, Yoo et al. [ ] developed a model
incorporating ECG signals and skin conductivity. In their study, they formed a classification model
with 80.1% accuracy. In another major investigation, Ayata et al. [ ] developed an emotion recognition
model to classify arousal and valence using galvanic skin response. The mentioned model incorporates
features from empirical mode decomposition and statistical analysis methods. Among multi-sensor
based emotion recognition models, the model developed by Nazos et al. [ ] has the ability to recognize
sadness, anger, surprise, fear, frustration and amusement with up to 83% accuracy. Furthermore,
the recent investigation by Gouizi et al. [ ] has an accuracy of 83% for recognizing six emotions by
using six different biosensors. More information on biosensor based emotion recognition can be found
in the recent state-of-the-art reviews by Jerritta et al. [32] and Egger et al. [33].
Murugappan et al. [ ] discuss the challenges and limitations of multi-sensor based emotion
recognition. One of the major challenges is the increased computational complexity due to multiple
sensor data streams and algorithmic requirements. The other factor is the limitation on subjects’ freedom
of movement due to multiple sensor probes, wires, etc. Building on the concept of simplicity, they
were able to develop an emotion recognition model with a classification accuracy of 66.48% by
only using ECG signals. Considering all of these factors, the selected method should be able to provide
high emotion recognition accuracy with a minimum number of sensors.
A number of studies have examined the use of ECG signals for emotion recognition [ ].
An ECG based method is an adequate solution due to four important reasons. Firstly, the ECG
signal is a result of activities in the heart, which has nerve endings from the autonomic nervous system
that governs the behaviour of each emotion [ ]. Secondly, ECG sensors can be used as a wearable
device [ ]. Thirdly, it is convenient to use because ECG signals can be captured from different parts of
the body [37]. Finally, it has a high amplitude compared to other biosensors [9].
To date, various methods have been developed and introduced to extract features from ECG
signals. One commonly used method is heart rate variability analysis [ ]. HRV analysis
is a broadly used method in biomedical engineering applications [ ]. The method developed by
Ferdinando et al. [ ] using HRV analysis had an accuracy of around 59% for identifying the arousal
and valence state of a person. Similarly, another study that developed an emotion recognition model
incorporating ECG signals and skin resistance had 80% accuracy for recognizing the four quadrants of
the discrete emotional model [40].
Another widely used method is empirical mode decomposition. EMD is one of the
well-structured approaches for analysing non-stationary and nonlinear data [ ]. Furthermore, according
to investigations done by Manjula and Sarma [ ], compared to wavelets, EMD performs better when
extracting spectral power based features. Foteini et al. [ ] point out that each of the first six intrinsic mode
functions generated by the EMD method relates to a specific activity in the heart. Building on
that, a number of studies have used empirical mode decomposition for analysing ECG signals [ ].
Jerritta et al. [ ] investigated the use of the Hilbert–Huang transform (HHT) for EMD based feature
extraction and came up with a classification model with 54% accuracy for identifying six emotions in
the discrete emotion model [40].
With-in beat analysis is another method for ECG based emotion recognition that has a high
emotion recognition accuracy compared to EMD and HRV methods. This method was introduced by
Rattanyu and Mizukawa [ ] for recognising six emotions in the emotional spectrum. Their model was
able to identify an emotion with up to 61% accuracy using ECG signals.
Some studies have used the discrete Fourier transform (DFT) to extract frequency domain
features from the ECG signal. Jerritta et al. [ ] discuss the advantages of using frequency domain
features compared to EMD based features. They claim that, unlike EMD features, which provide an
idea about the local properties of the ECG wave, the DFT method provides information about the
frequency content of the signal. In their study, they achieved 54% accuracy for recognizing
neutral, happiness, sadness, fear, surprise and disgust emotions from ECG signals utilizing DFT based
features of ten intrinsic mode functions derived from the EMD method.
Collectively, most of the studies have used different analysis methods to extract features from
ECG signals. In HRV analysis, the HRV time series is generated only by considering the R–R interval
variations of the ECG wave. However, the features extracted from this method represent features from
both the time domain and the frequency domain of the HRV wave. Similarly, the EMD technique
decomposes the signal into a set of oscillating signals. The features extracted from this method also
correspond to a set of fragmented features that have correlations to the ECG wave. However, compared to
these two methods, the with-in beat method analyses the raw ECG wave in the time domain. In addition,
compared to the frequency domain based features extracted by the EMD and HRV methods, the DFT
method provides an overview of the frequency domain of the raw ECG wave. Each of these approaches has
its own advantages, and the features generated by all of these techniques represent a broad range of
features that correspond to different domains and spaces of the ECG wave. However, most of the ECG
based feature extraction methods in the literature have an emotion recognition capability of around 55%
for different types of classification requirements.
Together, these studies highlight the need for an accurate emotion recognition model with a
minimum number of biosensors. The studies presented thus far provide evidence that ECG is the best
method for capturing bio-signals because ECG signals contain emotion-related information. In addition,
considering the accuracies gained from the classification models, there is a need for higher classification
accuracy. However, the methods represented in the literature extract a wide range of features from the
ECG wave, and they are sophisticated methods for examining time-varying data. A number
of studies have investigated different approaches for ECG based emotion recognition. However, up to
now, no one has investigated the feasibility of combining well-known ECG based feature extraction
methods to select an optimal set of features that gives higher emotion classification accuracy.
Considering most of the studies mentioned in the literature, it is apparent that the majority of
them have used traditional single learner algorithms as the prediction model. Most of the considered
algorithms include support vector machines, K-nearest neighbour, Fisher analysis and artificial neural
networks. Even though most of the mentioned algorithms are well-equipped techniques, a majority of
them lack the ability to recognize emotions with a higher classification accuracy. Recently, ensemble
learning methods have been used to improve the classification accuracy of various problems in
different domains, and significant accuracy improvements have been gained after applying these
techniques [ ]. Furthermore, research in the domain of biomedical signal analysis has
also used these ensemble techniques to improve model performance [12,17,18].
Even though an extensive amount of research has been conducted on defining primal emotions
for humans, studies developing prediction models have selected different emotions as
their target emotions [ ]. This investigation is based on the 2D emotional model proposed by
J. A. Russell [45], where the emotions are placed in a 2D arousal and valence space. To broaden
the emotion selection, this study considers the primal emotion of each emotional quadrant
as the selected emotion. This selection will improve the diverse nature of the predictions made in
the study. Furthermore, there are similar studies that had the same set of emotions as their targets,
and their reported classification accuracies will be beneficial while benchmarking the developed model.
Therefore, the analysis of this study is focused on recognizing four primal emotions, namely: anger;
sadness; pleasure; and joy. Additionally, this study presents the classification results of two models
developed incorporating two additional emotions in the emotional spectrum. Furthermore, a complete
overview of the emotions and their organization in the arousal-valence space is given in
Section 3.1.3.
The main objective of this paper is to evaluate the capability of ensemble learners for
biosensor based human emotion recognition tasks that require higher prediction accuracies. This research
combines four ECG based feature extraction methods, namely: HRV; EMD; WIB; and DFT based features.
The first two techniques are the most widely used methods in the literature, and this study uses
the with-in beat method because of its high emotion recognition accuracy. Additionally, this study
introduces a novel method that extracts a set of frequency-domain features from ten frequency bands
of the ECG wave employing the discrete Fourier transform (referred to as TFB features). As an additional
step in the ensemble learning procedure, the machine learning analysis of this paper selects a set
of optimal features by combining the mentioned feature extraction methods for recognising anger,
sadness, joy and pleasure.
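The band-energy idea behind the TFB features can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact band edges, window handling and per-band feature set are not given in this section, so ten equal-width bands up to an assumed cutoff are used.

```python
import numpy as np

def tfb_features(ecg, fs=1000.0, n_bands=10, f_max=100.0):
    """Energy of the ECG power spectrum in n_bands equal-width bands.

    Sketch only: the paper derives features from ten frequency bands of
    the ECG wave via the DFT, but the band edges here are assumptions.
    """
    spectrum = np.abs(np.fft.rfft(ecg)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(ecg), d=1.0 / fs)  # bin frequencies (Hz)
    edges = np.linspace(0.0, f_max, n_bands + 1)   # assumed band edges
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])
```

Each 20 s window would thus yield a ten-element energy vector that summarizes the frequency content of the raw ECG wave.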
3. Material and Methods
As discussed earlier, the principal objective of this research is to suggest an ensemble learning
approach for ECG based emotion recognition by combining four ECG feature extraction methods.
The selected feature extraction methods can be listed as follows: HRV, EMD, WIB and TFB. Section 3.1
of the methodology describes the 2D emotion model and ECG signal acquisition. Moving forward,
Section 3.2 addresses the signal pre-processing algorithms. After that, Section 3.3 describes selected
feature extraction methods in detail. Finally, Section 3.4 of the methodology illustrates the machine
learning process.
3.1. Experiment for ECG Data Collection
This section describes the process conducted to acquire data from subjects to develop a machine
learning model. The first part provides insight into selecting a suitable ECG sensor
for capturing ECG data. The second part explains the ECG data capturing algorithm that was developed.
The next part describes the adopted discrete emotional model, and the last
part explains the ECG data collecting experiment in detail.
3.1.1. ECG Sensor
As there are various types of hardware available that record ECG signals, the hardware selection was
done based on a few factors. Firstly, the subject should not feel restricted while wearing the hardware,
as this has a direct impact on the comfort of the subject. Secondly, the recorded ECG signals should not
be too noisy, as too much noise makes processing difficult. Finally, the hardware should be
financially affordable. After considering all of these factors, a Spiker-Shield Heart and Brain sensor was
selected, as it is affordable and has an inbuilt noise sensor. The subject only needs to have a few plasters
on their hand in order to record the ECG signals, which makes it not too disturbing for the subject. Figure 1
shows an image of the wearable sensor used.
3.1.2. Signal Collection
A simple algorithm was developed to acquire signals from subjects by communicating with an
Arduino microcontroller. The sampling rate of the signal was set to 1000 Hz, and the baud rate of
the serial communication unit was adjusted to 115,200 bps. Since emotion-related changes can be
observed in 3–15 s of an ECG signal frame, the length of the ECG signal was set to a 20 s interval [ ].
The algorithm sends each captured 20 s ECG wave frame to the data collection node via a Node.js
asynchronous function interface. The data collection node then captures the data and writes the values
to a data file (.txt), indicating the subject ID, captured time and the emotion. Later, this information is
used to filter the emotion-elicited data frames from the data space.
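The framing logic of the collection node can be sketched as below. The Arduino serial link and the Node.js interface are not reproduced, and the tab-separated record layout is an assumption; only the sampling rate, window length and recorded metadata (subject ID, capture time, emotion) come from the text.

```python
import time

FS = 1000           # sampling rate (Hz) used in the study
FRAME_SECONDS = 20  # classification window length (s)

def frame_stream(samples, subject_id, emotion):
    """Group incoming samples into 20 s frames, each tagged with the
    subject ID, capture time and emotion, mirroring the data file
    described above. The record format itself is an assumption."""
    frame_len = FS * FRAME_SECONDS
    records, buf = [], []
    for sample in samples:
        buf.append(sample)
        if len(buf) == frame_len:  # one complete 20 s frame collected
            records.append("%s\t%d\t%s\t%s" % (
                subject_id, int(time.time()), emotion,
                ",".join(str(v) for v in buf)))
            buf = []               # start buffering the next frame
    return records
```

A stream of 45,000 samples, for example, would produce two complete frames, with the trailing partial window discarded.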
Figure 1. SpikerShield Heart and Brain sensor.
3.1.3. Discrete Emotional Model
As shown in Figure 2, the discrete 2D emotional model [ ] places all human emotions on two
axes, namely: arousal; and valence. The first quadrant of the emotional model includes highly aroused
and valenced emotions. This quadrant holds joy as its primal emotion and other sub-emotions such
as excited, astonished, and delighted as secondary emotions. The second quadrant of the emotional
model, which represents low aroused and high valenced emotions, includes emotions like pleasure
and calm. Next, the third quadrant of the emotional model incorporates emotions such as sadness and
disgust. Finally, the fourth quadrant of the emotion model includes anger, fear and annoyed emotions,
which represent the low valenced, high aroused scenarios.
Figure 2. Discrete emotional model.
3.1.4. Experiment
The designed experiment captures six emotions in the discrete emotional model, namely: joy;
sadness; pleasure; anger; fear; and neutral. A majority of these emotions represent the primal emotions
of the emotion spectrum, and those are the emotions that were intended to be identified in this study
using ECG signals. The other emotions (fear and neutral) were chosen to conduct the comparative
analysis with the literature.
As for the design of the experiment, firstly, a set of videos were collected by consulting with
domain experts to elicit selected target emotions. Each of these videos was 3–10 minutes in length.
Subjects were invited into a disturbance-free environment and sensors were fixed onto them in order
to record ECG signals. Then, each video was shown to the subjects with two-minute
breaks in between. At the end of each video, subjects were asked to write down their emotional
experience throughout the video on a pre-designed feedback paper, highlighting points of emotional
climaxes. If the subject's emotional climaxes match the target emotion, then it is marked as a
successful attempt (i.e., a hit; see Table 1). These climaxes were synchronized with the ECG data
collection unit, and later this information was used to filter the emotion-related ECG signals. This was
achieved by recording the ECG signals with time information and then matching the time of the
emotional climax with the signal time.
Table 1 above lists the selected video clips and their durations. Even though each was
marked as relating to a specific emotion, subjects were not aware of the intended
emotion of each video. Furthermore, to further eliminate bias in the results, the order of video
playback was also changed. Collectively, compared to the other three primal emotions, the anger video had
the lowest hit rate, and this phenomenon was also observed in previous studies [ ]. Despite that, the other
selected videos had a significantly better chance of eliciting their target emotions.
Table 1. Videos and subjective results.
Name Target Emotion Duration (minutes) Hits Misses
A scene from the Mr. Bean (1997) movie Joy 5 25 0
A TV commercial about a pet Sad 3.5 23 2
A 4K-HD video of space and landscape Pleasure 5 18 2
A man beating a woman in the streets (a viral video) Anger 2 15 10
A movie scene from Mama (2013) Fear 10 25 0
A black screen Neutral 5 15 10
Figure 3 shows the experimental setup and the environment while two subjects were going
through the designed experiment. The subjects were of both genders and between 22–26 years of
age, and a total of 25 subjects participated in the experiment. Out of these 25, the ECG signals of
three participants had to be removed due to noise issues and signal anomalies. The rest of the ECG
signals were used for the subsequent work.
Figure 3. Experiment environment.
The final filtered data set contains 488 ECG waves of 20 s each, which include 105 anger waves,
110 sadness waves, 174 joy waves and 99 pleasure waves. Furthermore, it comprises 165 data frames
of the fear emotion and 103 of the neutral emotion.
3.2. Signal Pre-Processing
The pre-processing procedure for the signal consists of three main steps: filtering, de-trending and
smoothing. First, a Butterworth bandpass filter with a frequency range of 0.05–100 Hz was used to
remove noise from the ECG signal [ ]. The resulting signal shows a trended pattern as a result of
filtering near the DC component. Therefore, Algorithm 1 was used to stabilize the signal.
Algorithm 1 De-Trending
Require: x[n], n ∈ {0, 1, . . . , N−1}
Require: 0 < K ≤ N (default 8)
  ▷ Split the signal x[n] into K segments
  X ← split(x[n], K)  ▷ X has shape (K, N/K)
  ▷ For each segment compute
  for each xk[n] ∈ X, k ∈ {1, 2, . . . , K} do
    ▷ Fit the kth segment to a 2nd order polynomial
    poly_coeffk ← polyfit(xk[n], degree = 2)
    ▷ Predict the trend using the polynomial and remove it
    for t ∈ [0, . . . , N/K) do
      trendk[t] ← polyval(poly_coeffk, t)
      xk[t] ← xk[t] − trendk[t]
    end for
  end for
  ▷ Concatenate all segments
  ECG[n] ← concatenate(X)
  return ECG[n]
Algorithm 1 describes the de-trending procedure used to stabilize the signal. The algorithm
takes a signal of N samples and stabilizes it by dividing the signal into K small segments.
First, the algorithm fits each signal segment to a second order polynomial.
After that, the trend of the segment is estimated by evaluating the polynomial over the time variation
of the signal. Then, the predicted trend, which is a time variation of a second order polynomial, is
subtracted from the original signal. After de-trending, the resulting signal was further smoothed using a
Gaussian kernel, and it was made sure that the smoothing procedure preserved the vital information of
the wave. Figure 4 illustrates the resulting signals after conducting each step.
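The three pre-processing steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the filter order and the Gaussian kernel width are not specified in the text, so a 2nd-order Butterworth and sigma = 2 samples are assumed.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.ndimage import gaussian_filter1d

def preprocess(ecg, fs=1000.0, k=8, sigma=2.0):
    """Filtering, de-trending (Algorithm 1) and smoothing of a raw ECG
    frame. Filter order and kernel width are assumptions."""
    # 1. Zero-phase Butterworth bandpass, 0.05-100 Hz
    sos = butter(2, [0.05, 100.0], btype="band", fs=fs, output="sos")
    x = sosfiltfilt(sos, ecg)
    # 2. De-trend: subtract a fitted 2nd-order polynomial per segment
    out = []
    for seg in np.array_split(x, k):
        t = np.arange(len(seg))
        out.append(seg - np.polyval(np.polyfit(t, seg, 2), t))
    # 3. Smooth the concatenated signal with a Gaussian kernel
    return gaussian_filter1d(np.concatenate(out), sigma=sigma)
```

Because each segment's least-squares polynomial includes a constant term, the de-trended segments are centred near zero, which is the stabilizing effect the algorithm aims for.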
3.3. Feature Extraction Methods
This subsection of the study discusses the selected feature extraction methods in detail. The first
part explains the PQRST detection algorithm. The second part describes
the process of generating the HRV time series and the HRV analysis that is based on the generated
HRV time series. The next part discusses empirical mode decomposition based features, followed
by the with-in beat analysis technique. Finally, the last part presents the novel frequency band based
feature extraction technique used for ECG based feature extraction.
Figure 4. Pre-processing algorithm results.
3.3.1. PQRST Detection
As Figure 5 shows, the ECG signal pattern is a result of a series of waves associated with the
activities of the heart [ ]. The ECG pattern consists of the P wave and the QRS complex, followed by
the T wave. Each of these waves corresponds to a specific activity in the heart (repolarization
or depolarization).
To detect PQRST wave positions in an ECG signal, first, a simple algorithm was designed to find
the R peak locations of each QRS complex. Then, the identified R peak locations were used to segment
out QRS complexes from the ECG signal. After that, since the ECG signal has a specific pattern, local
minima and local maxima detection methods were employed to find the PQST wave locations within
each segmented QRS complex. Figure 6 illustrates the PQRST positions discovered using the PQRST
detection algorithm. These computed locations were later employed in HRV analysis and WIB analysis.
In HRV analysis, only R peak locations are used to compute a set of diverse features, whereas all
statistical features of WIB analysis are computed considering different peak-to-peak intervals (PR, RS,
QRS, etc.).
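The two stages of the detector described above can be sketched as follows. The height threshold, refractory distance and search windows are assumptions (the paper's detector parameters are not stated in this section), and only the R, Q and S stages are shown.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_r_peaks(ecg, fs=1000.0):
    """R peaks as prominent maxima at least 0.4 s apart.
    Threshold and refractory distance are assumptions."""
    peaks, _ = find_peaks(ecg, height=0.5 * np.max(ecg),
                          distance=int(0.4 * fs))
    return peaks

def detect_q_s(ecg, r_peaks, fs=1000.0, window=0.05):
    """Q and S as local minima within 50 ms before/after each R peak,
    exploiting the fixed PQRST morphology of the ECG beat."""
    w = int(window * fs)
    q, s = [], []
    for r in r_peaks:
        start = max(r - w, 0)
        q.append(start + int(np.argmin(ecg[start:r])))
        s.append(r + 1 + int(np.argmin(ecg[r + 1:r + 1 + w])))
    return np.array(q), np.array(s)
```

P and T locations would follow the same local-extrema idea, with wider search windows on either side of the detected QRS complex.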
Figure 5. PQRST wave locations.
Figure 6. PQRST detection algorithm results.
3.3.2. Heart Rate Variability Analysis
Heart rate variability (HRV) analysis is one of the most commonly used methods for ECG feature
extraction [ ]. To compute the HRV time series, first, the ECG signal was processed using the PQRST
position detection algorithm. After that, the detected R positions (R peaks) were used to compute the
R–R interval variations of the ECG wave. Finally, the R–R intervals and the corresponding cumulative
sums of R–R intervals were used to compute the interpolated HRV time series. Figure 7 shows a
computed HRV time series for a selected subject.
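The construction just described can be sketched as follows. The 4 Hz resampling rate and the use of linear interpolation are assumptions; the text only states that R-R intervals and their cumulative sums yield the interpolated series.

```python
import numpy as np

def hrv_time_series(r_peaks, fs=1000.0, resample_hz=4.0):
    """Interpolated HRV time series from R peak sample indices:
    each R-R interval is placed at its cumulative beat time and the
    result is linearly interpolated onto a uniform grid."""
    t = np.asarray(r_peaks, dtype=float) / fs  # R peak times (s)
    rr = np.diff(t)                            # R-R intervals (s)
    t_rr = t[1:]                               # each interval ends on a beat
    grid = np.arange(t_rr[0], t_rr[-1], 1.0 / resample_hz)
    return grid, np.interp(grid, t_rr, rr)
```

The uniform grid matters because the frequency-domain HRV features described next require an evenly sampled series.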
Figure 7. Heart Rate Variability (HRV) time series.
The HRV analysis method can be treated under three headings: time domain analysis, frequency
domain analysis and geometric methods based analysis. Firstly, time domain analysis relates to a
set of statistical features and heart rate variability specific features extracted from the HRV time series.
Secondly, frequency domain analysis refers to a collection of features extracted through the high
frequency (0.15 to 0.4 Hz), low frequency (0.04 to 0.15 Hz) and very low frequency (0.0033 to
0.04 Hz) bands of the HRV time series. Thirdly, geometric based analysis is associated with a set of features
obtained from Poincaré geometric plots. Table 2 presents an overview of the extracted features and their
respective domains.
Table 2. HRV based features.
Type Features
Time sdnn, mn_nn, rmssd, m_nn, nn50, pnn50
Frequency hf, hfnu, lf, lf_hf, lfnu, total_power, vlf
Geometric sd1, sd2
The standard deviation of NN intervals (sdnn), the mean value of NN intervals (mn_nn), the root mean square of successive differences between NN intervals (rmssd) and the maximum value of NN intervals (m_nn) are taken as statistical features. The number of pairs of neighbouring NN intervals differing by more than 50 ms (nn50) and the NN50 count over the total number of NN intervals (pnn50) are extracted as additional time-domain features. Here, NN refers to the normal-to-normal interval, which in this study corresponds to the R–R interval. The spectral powers of the LF (lf), VLF (vlf) and HF (hf) bands, the LF/HF power ratio (lf_hf), the low-frequency power in normalized units (lfnu) and the total power (total_power) are used as frequency-domain features. The Poincaré plot transfers the R–R intervals to a different geometric domain, and the sd1 and sd2 features are calculated as the geometric deviations between consecutive R–R intervals. A detailed explanation of each method and the respective feature notations can be found in [ ].
3.3.3. With-in Beat Features of the ECG Signal
This section implements the method proposed by Rattanyu and Mizukawa [ ] for emotion recognition from ECG signals using with-in beat features. With-in beat information of the ECG signal includes the PR interval, ST interval and QRS interval (i.e., QS). Unlike the HRV method, this method considers the variation of the inner pulses of the ECG signal. The with-in beat method computes five different statistical features from each interval: mean (mean), maximum (max), minimum (min), median (median) and standard deviation (sd).
To compute with-in beat features, first, the ECG signal was sent to the PQRST detection algorithm. The algorithm returns the locations of the identified PQRST positions, and those locations are then used to compute the considered intervals. Table 3 illustrates the computed features and their corresponding notations. The feature set corresponding to an interval (IN) is of the form (1):
IN = [min_IN, max_IN, sd_IN, mean_IN, median_IN]. (1)
Table 3. With-in beat features.
Interval Features
PR min_pr, max_pr, sd_pr, mean_pr, median_pr
ST min_st, max_st, sd_st, mean_st, median_st
QRS min_qrs, max_qrs, sd_qrs, mean_qrs, median_qrs
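A minimal sketch of this computation, assuming the PQRST detector returns one dictionary of wave sample indices per beat (the input format is an assumption):

```python
import numpy as np

def wib_features(beats, fs):
    """Five statistics of Table 3 for the PR, ST and QRS intervals, in seconds."""
    intervals = {
        "pr": [(b["R"] - b["P"]) / fs for b in beats],
        "st": [(b["T"] - b["S"]) / fs for b in beats],
        "qrs": [(b["S"] - b["Q"]) / fs for b in beats],
    }
    feats = {}
    for name, vals in intervals.items():
        v = np.asarray(vals)
        feats.update({
            f"min_{name}": float(v.min()),
            f"max_{name}": float(v.max()),
            f"sd_{name}": float(v.std()),
            f"mean_{name}": float(v.mean()),
            f"median_{name}": float(np.median(v)),
        })
    return feats
```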
3.3.4. Empirical Mode Decomposition Based Features
Empirical mode decomposition decomposes a given signal into a finite number of signals called intrinsic mode functions (IMFs). This study computes four different features from each IMF: the spectral power of the IMF in the time domain, the spectral power of the IMF in the frequency domain, the instantaneous frequency of the IMF and the spectral power of the instantaneous frequency spectrum of the IMF. Collectively, this EMD based feature extraction method extracts 24 features from the ECG signal in both the time domain and the frequency domain. Figure 8 illustrates the first six IMFs generated from the EMD procedure together with the original ECG wave.
Sensors 2019,19, 4495 12 of 24
Figure 8. The first six IMFs and the base ECG wave.
Spectral power in the time domain (spec_p) of each IMF was estimated using (2), in which x[i] refers to a discrete signal with N samples:

Power = (1/N) Σ_{i=1}^{N} x[i]². (2)

Welch's [ ] method was used to compute the spectral power in the frequency domain (spec_pf), and the Hilbert transform was adopted to estimate the instantaneous frequency (mean_if) of the IMF. The spectral power of the instantaneous frequency spectrum (ins_p) was also calculated using (2). The feature vector computed from IMF_i (i ∈ [1, 2, . . . , 6]) is of the form (3):

feature_i = [spec_p_i, spec_pf_i, mean_if_i, ins_p_i]. (3)
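The four per-IMF features can be sketched as below. Obtaining the IMFs themselves needs an EMD implementation, which is assumed to exist; this sketch takes one IMF as input, and the Welch segment length is an assumption:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import hilbert, welch

def imf_features(imf, fs):
    """spec_p, spec_pf, mean_if and ins_p for one intrinsic mode function."""
    spec_p = float(np.mean(imf ** 2))                # Eq. (2): time-domain power
    f, pxx = welch(imf, fs=fs, nperseg=min(256, len(imf)))
    spec_pf = float(trapezoid(pxx, f))               # frequency-domain power
    # Instantaneous frequency via the analytic signal's phase derivative
    phase = np.unwrap(np.angle(hilbert(imf)))
    inst_f = np.diff(phase) * fs / (2 * np.pi)
    mean_if = float(np.mean(inst_f))
    ins_p = float(np.mean(inst_f ** 2))              # Eq. (2) on the IF series
    return spec_p, spec_pf, mean_if, ins_p
```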
3.3.5. Ten Frequency Band Analysis
A number of studies have explored the use of different frequency band based features for emotion classification [ ]. Elaborating on that, this subsection presents the ten frequency band analysis (TFB) for emotion recognition. As shown in Figure 9, the frequency range of the ECG signal falls between 0–100 Hz. The developed method divides the ECG frequency range into ten different sub-bands of 10 Hz bandwidth each and computes the spectral power of each sub-band. This approach provides a set of frequency domain features different from the HRV time series based features and the EMD based features.
Figure 9. Spectral power variation.
Sensors 2019,19, 4495 13 of 24
The selection of the ten frequency bands was based on an empirical study of different spectral power bands within the range of 0–100 Hz. This selection criterion does not consider the physiological aspects of each frequency band; even the choice of spectral power as the measure is based on similar related studies that used spectral power as one of the signal features. First, a smaller analysis was conducted to find the optimal sub-band width, varied from 1–20 Hz, while making sure the resulting number of bands provides an adequate number of feature values for classification. Then, the configuration that showed the highest ensemble based classification accuracy was chosen (in this case, ten bands of 10 Hz each).
First, Welch's method was used to compute the frequency power spectrum of the ECG signal. This method computes the frequency power spectrum by splitting the signal into a set of overlapping segments and averaging the squared magnitude of each frequency component [ ]. Welch's method takes a discrete signal with N samples (x[n]), the sampling frequency of the signal (fs), the length of a segment (lseg), the overlapping length of a segment (lover) and the window function (w[n]) as input arguments, and then returns the power spectrum (Px) and the frequency distribution (Fx) of the given signal. Algorithm 2 describes an overview of the steps followed in Welch's method.
Algorithm 2 Welch's Method
Require: x[n] | n ∈ {0, 1, . . . , N − 1}
Require: 0 ≤ lseg ≤ N
Require: 0 ≤ lover < lseg
  // Split the signal x[n] into K overlapping segments of length lseg
  X[K] ← split(x[n], lseg, lover)
  // Window each segment xk[n] with w[n] (element-wise multiplication) and take its FFT
  for each xk[n] ∈ X[K] do
    (xk · w)[n] ← xk[n] × w[n]
    F(jωk) ← FFT((xk · w)[n])
  end for
  // Compute the frequency values of the half spectrum
  Fx ← 0 : (fs/2)/(lseg/2) : fs/2
  // For each frequency, accumulate the squared magnitudes over the segments
  for f ∈ [0, . . . , fs/2] do
    Px[f] ← Σ_k |F(jωk)[f]|²
  end for
  // Average over the K segments
  for each pf ∈ Px[f] do
    Px[f] ← pf / K
  end for
  return Px[f], Fx
A more optimized version of Welch's method in the Scipy API [ ] was used to compute the frequency power spectrum of the ECG signal, setting lseg to 256, lover to 128 and w[n] to the Hanning window.
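With the parameters just listed, the Scipy call looks as follows; the sampling rate and the sine stand-in for a real ECG trace are illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

fs = 500                                   # assumed sampling rate, not from the paper
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 10 * t)        # stand-in for an ECG trace

# lseg = 256, lover = 128 and a Hanning window, as described in the text
fx, px = welch(signal, fs=fs, window="hann", nperseg=256, noverlap=128)
```

`fx` is the frequency distribution (Fx) and `px` the power spectrum (Px) of Algorithm 2.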
After computing the frequency power spectrum of the ECG wave, the 0–100 Hz range of the ECG frequency spectrum was filtered out and divided into ten sub-bands. Then, trapezoidal integration was used to compute the spectral power of each sub-band. The derived ten spectral power features can be expressed as in (4):

features = {band_i | i ∈ [1, 2, . . . , 10]}. (4)
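The band power computation can be sketched as below, reusing the Welch parameters from the text; the sampling rate must reach 200 Hz or more so that the spectrum covers 0–100 Hz:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.signal import welch

def tfb_features(ecg, fs):
    """Ten frequency band (TFB) spectral powers over 0-100 Hz (sketch)."""
    fx, px = welch(ecg, fs=fs, window="hann", nperseg=256, noverlap=128)
    bands = []
    for lo in range(0, 100, 10):                    # ten 10 Hz sub-bands
        mask = (fx >= lo) & (fx < lo + 10)
        bands.append(float(trapezoid(px[mask], fx[mask])))  # trapezoidal integration
    return bands
```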
3.4. Emotion Recognition Model
This section describes the machine learning process used to develop a model for identifying four major emotions: anger, sadness, pleasure and joy. The pre-processed data contain 488 data frames with 63 features per data frame. The machine learning procedure of this study is divided into two parts: (1) the ensemble learner based machine learning process, and (2) the ensemble learner and feature selection based machine learning process.
The first strategy follows the traditional way of ensemble learning, where the ensemble learner chooses the features and dynamically derives a set of diverse learners. The second technique is inspired by recent studies on feature selection before ensemble learning [ ]. Both strategies employ six popular ensemble learning methods that cover most of the modern bagging and boosting techniques. Furthermore, the adopted feature selection methods represent a set of diverse techniques for machine learning feature selection (statistical, search based and algorithmic).
Adopted Ensemble learners
Random Forest Classifier
Extra Tree Classifier
Gradient Boost Classifier
ADABoost Classifier with Support Vector Machine (SVM)
ADABoost Classifier with Decision Tree
ADABoost Classifier with Naive Bayes
Adopted feature selection methods
Recursive Feature Elimination
Chi-Square Test
P Test
Feature Selection by Random Forest Classifier
Feature Selection by Extra Tree Classifier
Feature Selection by Random SVM
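Two of the listed selectors can be sketched with Scikit-learn as follows. The toy data, the feature budget of five and the tree counts are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))              # toy stand-in for the 63 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # labels driven by two features

# Model based selection: keep features the tree ensemble ranks as important
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    max_features=5).fit(X, y)
X_model = selector.transform(X)

# Recursive Feature Elimination: iteratively drop the weakest features
rfe = RFE(ExtraTreesClassifier(n_estimators=50, random_state=0),
          n_features_to_select=5).fit(X, y)
X_rfe = rfe.transform(X)
```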
Each of these procedures is followed by a model parameter optimization process and a model evaluation process. The Grid Search algorithm [ ] was used as the model parameter optimizer, and traditional 10-fold cross-validation was used to evaluate the model. Prior to the machine learning process, the data were normalized with a robust scaler, and all of the algorithms used in this section were taken from the Python Scikit API [50].
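A minimal sketch of this pipeline (robust scaling, grid search, 10-fold cross-validation) is shown below; the toy data and the reduced parameter grid are assumptions, while the paper searched max_features, n_estimators and max_depth:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))                 # toy data, not the paper's
y = np.arange(100) % 4                         # four emotion labels

pipe = Pipeline([("scale", RobustScaler()),    # robust normalization step
                 ("clf", ExtraTreesClassifier(random_state=0))])
grid = {"clf__n_estimators": [11, 31], "clf__max_depth": [5, 11]}
search = GridSearchCV(pipe, grid, cv=10).fit(X, y)   # 10-fold cross-validation
```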
4. Results and Discussion
The results are presented in four sections: Section 4.1 covers the results of ensemble methods without feature selection, Section 4.2 the results of ensemble methods with feature selection, Section 4.3 an overview of the results and Section 4.4 a computational requirement analysis of the feature extraction methods. The first two sections present the data gathered from ensemble learning with and without prior feature selection, and investigate whether feature selection is a worthwhile step for ensemble learning algorithms. The results overview section then compares the final results with different classification models in the literature, discussing emotion elicitation methods, experimental procedures and limitations. The final section describes the computational requirements of each adopted feature extraction method, and then provides reasons for selecting the combined analysis with ensemble learning as an optimal method for ECG based emotion recognition. It should be noted that the results mentioned in this section are for recognizing the four major emotions in the 2D emotional model.
4.1. Ensemble Learning
Table 4 illustrates the results obtained for the different ensemble learning techniques together with the model parameter optimization results obtained using the Grid Search algorithm. According to the results, the Extra Tree Classifier shows the highest prediction capability, 70.09% with a standard deviation of 3.34%. The ensemble model built with the Random Forest classifier has the second-highest classification accuracy, slightly below that of the Extra Tree Classifier. Among the other techniques, the Gradient Boost classifier also shows adequate performance for emotion classification, whereas the ADABoost ensembles with different base learners show relatively lower prediction accuracies.
Table 4. Model evaluation results.
Classifier | Optimal Parameters | Accuracy
Random Forest | max_features: 2, n_estimators: 81, max_depth: 11 |
Extra Tree Classifier | max_features: 6, n_estimators: 71, max_depth: 41 |
Gradient Boost Classifier | n_estimators: 81, loss: deviance, learning_rate: 0.2 |
ADABoost Classifier with SVM | n_estimators: 1, base_estimator: SVM(C=1.0, degree=3, gamma='auto', kernel='rbf') |
ADABoost Classifier with Decision Tree | n_estimators: 40, learning_rate: 1.0, algorithm: |
ADABoost Classifier with Naive Bayes | n_estimators: 12, learning_rate: 1.3, algorithm: |
max_features: maximum number of features considered while splitting; n_estimators: number of models (learners) in the ensemble; max_depth: maximum depth of a tree in the ensemble; base_estimator: the estimator used to build the ensemble; SAMME: discrete boosting algorithm; degree: degree of the kernel; kernel: kernel type; C: penalty parameter.
4.2. Ensemble Learning with Feature Selection
Table 5illustrates the classification accuracies after selecting features employing dierent feature
selection techniques. This table only presents the accuracies gained from the three best performing
models observed in the previous section. In general, dierent models show dierent classification
performances while undergoing diverse feature selection methods. For instance, Random Forest
Classifier shows the highest accuracy for Model based feature selection, whereas a Gradient Boost
classifier provides better results for the Recursive Feature Elimination technique. Examining all results,
it is apparent that the model selection procedure improved the individual accuracy from a significant
value. As an example, it raised the accuracy of the Extra Tree ensemble from 70.09% to 80.00%.
Furthermore, Recursive Feature Elimination and the Feature Selection by Model methods provide
better results compared to the chi-square test with a chi statistic greater than 2.0. To summarise the
results, the best performing ensemble learner for four major emotions classification is the Extra Tree
Classifier with the selected features listed in Table 6. These features are selected using the Model based
feature selection approach by providing the Extra Tree Classifier as the feature selector.
Table 5. Ensemble methods with feature selection.
Method RF ETC GB
RFE(number of features: 10) 64.34% (3.65%) 73.03% (3.35%) 68.97% (3.40%)
RFE(number of features: 15) 67.93% (2.64%) 66.13% (3.19%) 67.90% (4.47%)
RFE(number of features: 20) 66.92% (3.71%) 73.91% (2.61%) 69.21% (2.23%)
RFE(number of features: 25) 68.77% (3.34%) 74.00% (3.28%) 72.66% (3.36%)
RFE(number of features: 30) 65.60% (3.37%) 72.12% (3.44%) 65.11% (2.46%)
RFE(number of features: 35) 66.99% (5.12%) 73.00% (2.22%) 64.06% (4.56%)
RFE(number of features: 40) 63.71% (4.43%) 71.96% (3.87%) 61.12% (3.33%)
Chi-Squared statistic(>0.1) 67.51% (4.38%) 76.30% (3.84%) 65.06% (3.18%)
Chi-Squared statistic(>0.5) 59.69% (2.82%) 72.79% (3.23%) 64.77% (3.77%)
Chi-Squared statistic(>1.0) 63.47% (3.48%) 70.19% (2.55%) 63.66% (3.36%)
Chi-Squared statistic(>2.0) 52.80% (3.77%) 54.19% (2.34%) 54.44% (3.67%)
P-Test value (>0.1) 67.22% (4.33%) 76.22% (2.25%) 67.16% (3.47%)
P-Test value (>0.5) 67.76% (3.68%) 75.19% (4.04%) 68.00% (2.34%)
P-Test value (>0.8) 65.49% (3.36%) 70.21% (2.43%) 69.99% (3.37%)
Model based (model: RT) 75.35% (4.18%) 76.14% (4.04%) 71.62% (3.84%)
Model based (model: ETC) 79.23% (3.53%) 80.00% (4.27%) 71.40% (4.12%)
Model based (model: NB) 72.21% (3.87%) 77.13% (2.23%) 70.12% (4.22%)
RF: Random Forest, ETC: Extra Tree Classifier, GB: Gradient Boost Classifier, RFE: Recursive Feature Elimination, NB: Naive Bayes.
Table 6. Feature selection results.
Method | N | D | Features
EMD | 24 | T | spec_p_1, spec_p_2, spec_p_3, spec_p_4, spec_p_5, spec_p_6, ins_p_1, ins_p_2, ins_p_3, ins_p_4, ins_p_5, ins_p_6
| | F | spec_pf_1, mean_if_1, spec_pf_2, mean_if_2, spec_pf_3, mean_if_3, spec_pf_4, mean_if_4, spec_pf_5, mean_if_5, spec_pf_6, mean_if_6
HRV | 14 | T | sdnn, mn_nn, pnn50, m_nn, rmssd, nn50
| | F | lf, lfnu, lf_hf, total_power, hfnu, vlf
| | G | sd1, sd2
WIB | 15 | T | median_pr, mean_pr, max_pr, sd_pr, min_pr, median_qrs, mean_qrs, max_qrs, min_qrs, sd_qrs, median_st, max_st, min_st, mean_st, sd_st
TFB | 10 | F | . . ., band_4, . . ., band_8, band_9, . . .
T = Time Domain, F = Frequency Domain, G = Geometric; N = selected features as a fraction of the total features extracted from the method.
According to the results presented in Table 6, most of the features from the TFB method were chosen as optimal features for emotion recognition. In comparison with the TFB based features, the EMD and HRV based features show less capability for emotion recognition. Most of the HRV features selected in the analysis are statistical features of the R–R interval variations. These features are quite similar to the with-in beat analysis based features introduced in the literature, which also show good capability (nearly one-third of them were selected). The only difference is that, compared to the outer beat (beat-to-beat) interval based statistical features used in HRV analysis, WIB analysis computes statistical features of the inner beat intervals of the ECG signal. Moreover, this further supports the observation that features based on raw ECG patterns are the most efficient features for emotion recognition compared to the different analysis based features.
Table 7 illustrates the results gathered from training different models with the adopted emotion sets. Model A is the principal model of this study, and it has the ability to recognize four major emotions in the discrete emotion model with up to 80% accuracy. As mentioned, the experimental procedure also collected some additional emotions to prove the effectiveness of ensemble learning and for benchmarking purposes. Those models and their capabilities are also listed in the table, and the following paragraphs compare their classification accuracies with the literature. Additionally, the table depicts the individual gains of the ten frequency band (TFB) analysis method introduced in this study. Even though those features do not directly capture physiological aspects of human emotions, they tend to perform better than the others. Therefore, it should be noted that more investigations should be conducted to evaluate the physiological aspects of those features.
Table 7. Benchmark models.
Model | Emotions | Accuracy (TFB) | Accuracy
A* | anger, sadness, joy, pleasure | 75.94% (4.11%) | 80.00% (4.27%)
B | anger, sadness, joy, pleasure, fear | 72.86% (3.47%) | 77.25% (3.14%)
C | four emotion quadrants | 72.13% (3.26%) | 78.12% (4.32%)
D | anger, sadness, joy, pleasure, fear, neutral | 70.63% (3.77%) | 75.11% (3.77%)
As Table 7 shows, there is a significant accuracy improvement from combining the selected feature extraction methods with an ensemble learning process. The accuracy for identifying the four major emotions is 80.00%, a significantly better result compared to the literature.
The method developed by Kim and André [ ] combined four different sensors for detecting four primal emotions in the emotional spectrum and reached an accuracy of 70%. The findings in this study, however, show that the same classifier can be developed with a single sensor at an accuracy of 80.00%. Another investigation, by Yoo et al. [ ], developed a neural network based emotion classification model that achieved a recognition ability of 80% for identifying the four emotion quadrants using ECG and skin resistance. They considered six subjects, and the bio-signals were captured at different times of the day over a week. The corresponding classifier presented in this investigation covers ECG patterns from 22 different subjects with a slightly lower accuracy of 78.12%. Furthermore, this study achieves higher emotion recognition accuracy than the investigation by Maaoui et al. [ ]. In their research, they developed an emotion classifier to identify amusement, contentment, disgust, fear, neutral and sadness from five biosensors. The accuracy of their model was 46.5%, whereas the method proposed in this investigation can identify six emotions with up to 75.11% accuracy. Furthermore, the accuracy gained in this investigation outperforms several other studies that investigated multi-sensor based emotion recognition methods [52,53].
In another study, Murugappan et al. [ ] developed an emotion recognition model for detecting five emotions (disgust, sadness, joy, fear, neutral) with an accuracy of 66.48%. They had 20 subjects in their experiment, and some of the features they considered include wavelet transformation based features. The method developed in this investigation has a higher accuracy of 77.25% with a larger subject space. The emotion recognition method developed with with-in beat based features by Rattanyu and Mizukawa [ ] had an emotion recognition accuracy of 61.44% for detecting six emotions, namely anger, fear, sadness, joy, neutral and disgust. They had a smaller subject space compared to this study, and the only difference between their model and model D produced in this investigation is the disgust emotion. The disgust emotion falls into the same emotion quadrant as sadness, whereas the pleasure emotion is in a different quadrant of the emotion model. The model developed in this study that replaced the disgust emotion with pleasure had a recognition capability of 75.11% involving 22 subjects. Examining recent studies, the emotion recognition model developed by Guo et al. [ ] recognizes anger, sadness, fear, joy and relaxation with up to 56.9% accuracy. In their investigation, HRV analysis was used for feature extraction and an SVM was used as the machine learning model. By definition, the relaxed emotion in their study can be seen as the pleasure emotion adopted here, and, considering that, their model is comparable to model B developed in this study. Nevertheless, the model developed in this research outperforms theirs by a 20% accuracy gain. Furthermore, the accuracy gained from combining all features from the different domains is higher than that of emotion recognition studies that investigated EMD based feature extraction methods [7,10].
Considering everything, it is apparent that the results obtained in this study outperform most of the methods mentioned in the literature. The next section describes additional perspectives of the proposed methodology, such as emotion data collection methods, experimental procedures and the accuracy gains of the introduced TFB method.
4.3. Results Overview
Table 8 compares the final results with the models developed in the literature. According to the data shown in the table, the combined analysis outperforms all ECG signal based emotion recognition models and the majority of models that use multiple biosensors for recognising emotions.
Table 8. Comparison with the literature.
Study | Adopted Emotions | E-EM | MS | N | Acc | Gain (%)
Kim et al. (2004) [52] | sadness, anger, stress, surprise | A | ✓ | | 61.8% | +14.14 / +18.20
Yoo et al. (2005) [28] | sadness, calm pleasure, interesting pleasure, fear (four quadrants) | V | ✓ | 6 | 80% | -7.87 / -0.21
Rigas et al. (2007) [53] | joy, disgust, fear | P | ✓ | 9 | 62.7% |
Kim and André (2008) [ ] | anger, sadness, pleasure, joy | A | ✓ | 3 | 70% | +5.94 / +10.00
Maaoui et al. (2010) [51] | amusement, contentment, disgust, fear, sadness, neutral | I | ✓ | 10 | 46.5% | +24.13 / +28.61
Rattanyu and Mizukawa (2011) [9] | anger, fear, sadness, joy, disgust, neutral | P | ✗ | 12 | 61.44% | +9.19 / +8.67
Jerritta et al. (2012) [7] | neutral, happiness, sadness, fear, surprise, disgust | V | ✗ | 15 | 59.78% | +10.85 / +15.33
Murugappan et al. (2013) | disgust, sadness, fear, joy, neutral | V | ✗ | 20 | 66.48% | +6.38 / +10.77
Jerritta et al. (2014) [10] | neutral, happiness, sadness, fear, surprise, disgust | V | ✗ | 30 | 54% | +16.63 / +21.11
Guo et al. (2016) [54] | sadness, anger, fear, happy, relaxed | V | ✗ | 25 | 56.9% | +15.96 / +20.35
E-EM = emotion elicitation method (V: video, A: audio, P: pictures, I: images), N = number of subjects, MS = multiple sensors including the ECG sensor (✓) versus an ECG signal feature extraction method only (✗), Acc = accuracy. The two Gain values give the improvement of the TFB method alone and the improvement after combining all features.
Compared to emotion recognition models that employ multiple biosensors, the frequency spectrum analysis technique introduced in this study offers an accuracy enhancement of up to 24.13%. Furthermore, in contrast to the best performing ECG based emotion recognition model in the literature [ ], the introduced ensemble model has an accuracy gain of 6.38%. Therefore, considering this accuracy improvement, it is apparent that the TFB method on its own is an effective method for emotion recognition using ECG signals.
Considering the combined analysis results, combining the other broadly used techniques with the introduced TFB method improved the prediction accuracy by a significant margin. For instance, after incorporating the other analysis based features into the TFB based model, the accuracy of the six emotion recognition model improved by 4.48% (see Table 7). Furthermore, compared to the best ECG based emotion recognition model mentioned in the literature, the model developed by combining all four methods improved the accuracy by 10.77%. Additionally, in contrast to the best performing multiple biosensor based emotion recognition model [ ], the combined analysis based model has a similar accuracy with a significantly larger subject space.
As shown in Table 8, studies in the literature have used different methods to elicit emotions in their experimental procedures. The majority have used audio or video based methods, and most of them achieved high emotion recognition accuracy, as did this study, which used video clips as the emotion elicitation method. It should be noted that, unlike picture based methods, these arrangements require a sophisticated procedure to filter the emotion elicited climaxes from the ECG signal space. Moreover, the videos should be picked carefully with the help of domain experts. Therefore, it is safe to say that the emotion-related data filtering protocol followed in this investigation is a reasonably fair approach for obtaining the ECG based emotion climaxes recorded in the data (signal space). Regarding the number of subjects involved in an experiment, most of the single biosensor based studies adopted a reasonably high number of subjects, whereas multiple biosensor based studies conducted experiments on smaller numbers of subjects. However, as mentioned, the number of subjects involved in an experiment has a direct impact on the accuracy of the model. The subject count in this analysis, 22, is therefore fairly high compared to other studies, and the model produced in this research still outperforms most of them.
Collectively, the ensemble learning based model developed in this study holds a higher capability compared to other studies. Firstly, the combined analysis model comprises ECG emotional data from 22 subjects within the age range of 22–26. Secondly, the developed model is able to identify the emotion of a person from a 20 s ECG wave. Thirdly, the model uses a single biosensor for recognising emotions.
4.4. Computational Requirement Analysis
Table 9 illustrates the computation times and the time and space complexities of the selected algorithms (N indicates the number of samples in the signal). According to the table, the EMD method takes 95.97% of the feature extraction time in the combined analysis.
Table 9. Computational requirements.
Method | Computation Time (s) | Space | Time
HRV | 0.0016 (0.69%) | O(N) | O(N log N)
EMD | 0.2216 (95.97%) | O(N) | O(N log N)
WIB | 0.0004 (0.17%) | O(1) | O(N)
TFB | 0.0023 (0.99%) | O(N) | O(N log N)
Combined | 0.2309 | O(N) | O(N log N)
Furthermore, comparing the additional space complexity added by the employed techniques, most of them use O(N) additional space while computing. In addition, the feature extraction methods considered compute both time domain and frequency domain features, and the computation time complexity of the majority of the methods is O(N log N). Since the machine learning model takes a 20 s ECG wave, the combined computation time (0.23 s) will not affect the real-time nature of the system, while raising the model prediction accuracy by a significant amount.
Beyond the computational requirements of the feature extraction methods, when implementing real-life devices, several aspects related to model prediction complexity and transmission time should also be considered. In general, the prediction complexity of an ensemble tree based classifier is O(F × T), where F is the number of features and T is the number of trees in the ensemble. Furthermore, the transmission time of the signal depends solely on the transmission method itself (i.e., wired or wireless). The wired communication used in this study might not be efficient in real scenarios. However, there has been an extensive amount of research in the domain of wearable computing that can be adopted to develop real-life applications [55,56].
5. Conclusions
The initial objective of this research was to evaluate the capability of ensemble learners for human emotion classification requiring improved classification accuracy. According to the results presented, the combined features and the selected ensemble learners provide better performance than the single learner models presented in the literature. Furthermore, the results of the feature selection based machine learning process prove that feature selection is a worthwhile step even for ensemble learners that rely on diversity.
Even though this study is not a review of ECG based emotion recognition, the results overview section provides an extensive review of ECG based feature extraction methods, emotion elicitation methods, experimental procedures and the evolution of ECG based human emotion recognition.
Findings from this research make several contributions to the current literature. Firstly, this research introduces the TFB analysis, a simple but efficient method for ECG based feature extraction, which on its own has a 6.38% accuracy gain over the best performing model in the literature. Even though the selected method does not have a physiological interpretation, the extracted features tend to have better capability among signal based features. Secondly, the findings confirm that ECG signal processing is an efficient approach for bio-signal based emotion recognition. Furthermore, the feature selection results of this analysis provide insight into the capability of raw ECG signal pattern based features compared to the different analysis based features. As the final contribution, this research provides empirical evidence on whether feature selection prior to ensemble learning is an appropriate step. Taken together, the models derived from the selected features outperform all ECG signal based emotion recognition models mentioned in the literature, with a classification accuracy gain of up to 28.61%.
6. Future Work
Further research needs to be done on emotion recognition for different age ranges. In addition, more research is required to explore the capability of neural network based emotion recognition, because neural networks learn their features automatically, whereas traditional machine learning models require pre-extracted features.
As mentioned, emotional intelligence can be applied to various situations where the interaction between the human and the machine (a computer or a smartphone, for example) needs to be improved and personalized. The analysis conducted in this study is based on a wired wearable device that has the ability to transfer data at a high rate. However, this wired setup might not be feasible in real situations, and users might feel uncomfortable wearing such devices. Recently, research has found ways of transmitting ECG data wirelessly from wearable devices [ ]. Some of these designs can even be integrated into clothes by using techniques such as capacitively coupled ECG [ ]. Therefore, given that the computational complexity is low, as a real-life application this model could be used even on a smartphone device.
Sensors 2019, 19, 4495

Author Contributions: T.D. proposed the approach of using ensemble learners to enhance the classification
performance. T.D. also suggested using feature selection as an additional strategy for improving the classification
accuracy of the ensemble learner. T.D. developed all the feature extraction methods and proposed the novel TFB
method. T.D. carried out the complete machine learning analysis of the research. T.D. wrote the full article and
rendered all the figures. T.D. and Y.R. designed and conducted the experiment for ECG data collection. Both R.R.
and I.N. provided assistance on this work. All authors approved the final draft.
Acknowledgments: We would like to thank Cambio Software Engineering for establishing the Cambio Wearable
Computing Lab in the department, where we obtained our ECG signal capturing device. Additionally, we would
like to thank Dr. Dhammika Elkaduwe and Dr. Kamalanath Samarakoon for providing us with the facilities to
conduct the experiment.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References

Cowie, R.; Douglas-Cowie, E.; Taylor, J.; Ioannou, S.; Wallace, M.; Kollias, S. An Intelligent System for Facial
Emotion Recognition. In Proceedings of the 2005 IEEE International Conference on Multimedia and Expo,
Amsterdam, The Netherlands, 6 July 2005; pp. 904–907. [CrossRef]
Tu, C.-T.; Lien, J.J.J. Automatic Location of Facial Feature Points and Synthesis of Facial Sketches Using
Direct Combined Model. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2010,40, 1158–1169. [CrossRef]
Lee, C.M.; Narayanan, S. Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process.
2005,13, 293–303. [CrossRef]
Cook, N.; Fujisawa, T.; Takami, K. Evaluation of the affective valence of speech using pitch substructure.
IEEE Trans. Audio Speech Lang. Process. 2006,14, 142–151. [CrossRef]
Parsons, T.D.; Reinebold, J.L. Adaptive virtual environments for neuropsychological assessment in serious
games. IEEE Trans. Consum. Electron. 2012,58, 197–204. [CrossRef]
Tokuno, S.; Tsumatori, G.; Shono, S.; Takei, E.; Yamamoto, T.; Suzuki, G.; Mituyoshi, S.; Shimura, M. Usage of
emotion recognition in military health care. In Proceedings of the 2011 Defense Science Research Conference
and Expo (DSR), Singapore, 3–5 August 2011; pp. 1–5. [CrossRef]
Jerritta, S.; Murugappan, M.; Wan, K.; Yaacob, S. Emotion recognition from electrocardiogram signals
using Hilbert Huang Transform. In Proceedings of the 2012 IEEE Conference on Sustainable Utilization
and Development in Engineering and Technology, STUDENT 2012 - Conference Booklet, Kuala Lumpur,
Malaysia, 6–9 October 2012. [CrossRef]
Kim, J.; André, E. Emotion recognition based on physiological changes in music listening. IEEE Trans.
Pattern Anal. Mach. Intell. 2008,12, 2067–2083. [CrossRef]
Rattanyu, K.; Mizukawa, M. Emotion recognition based on ecg signals for service robots in the intelligent
space during daily life. J. Adv. Comput. Intell. Intell. Inf. 2011, 15, 582-591. [CrossRef]
Jerritta, S.; Murugappan, M.; Wan, K.; Yaacob, S. Electrocardiogram-based emotion recognition system using
empirical mode decomposition and discrete Fourier transform. Expert Syst. 2014, 31, 110–120. [CrossRef]
Bexton, R.S.; Vallin, H.O.; Camm, A.J. Diurnal variation of the QT interval–influence of the autonomic
nervous system. Br. Heart J. 1986,55, 253–258. [CrossRef]
Ayata, D.; Yaslan, Y.; Kamasak, M. Emotion recognition via random forest and galvanic skin response:
Comparison of time based feature sets, window sizes and wavelet approaches. In Proceedings of the
2016 Medical Technologies National Congress (TIPTEKNO), Antalya, Turkey, 27–29 October 2016; pp. 1–4.
Opitz, D.W. Feature Selection for Ensembles. In Proceedings of the Sixteenth National Conference on Artificial
Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of
Artificial Intelligence; AAAI ’99/IAAI ’99; American Association for Artificial Intelligence: Menlo Park, CA,
USA, 1999; pp. 379–384.
Khoshgoftaar, T.M.; Gao, K.; Napolitano, A. Improving software quality estimation by combining feature
selection strategies with sampled ensemble learning. In Proceedings of the 2014 IEEE 15th International
Conference on Information Reuse and Integration (IEEE IRI 2014), Redwood City, CA, USA, 13–15 August
2014; pp. 428–433. [CrossRef]
Maglaras, L.A.; Jiang, J.; Cruz, T.J. Combining ensemble methods and social network metrics for improving
accuracy of OCSVM on intrusion detection in SCADA systems. J. Inf. Secur. Appl. 30, 15–26. [CrossRef]
Mahdavi-Shahri, A.; Houshmand, M.; Yaghoobi, M.; Jalali, M. Applying an ensemble learning method for
improving multi-label classification performance. In Proceedings of the 2016 2nd International Conference
of Signal Processing and Intelligent Systems (ICSPIS), Tehran, Iran, 14–15 December 2016; pp. 1–6.
Hosseini, M.P.; Hajisami, A.; Pompili, D. Real-Time Epileptic Seizure Detection from EEG Signals via Random
Subspace Ensemble Learning. In Proceedings of the 2016 IEEE International Conference on Autonomic
Computing (ICAC), Wurzburg, Germany, 17–22 July 2016; pp. 209–218.
Jin, L.P.; Dong, J. Ensemble Deep Learning for Biomedical Time Series Classification. Comput. Intell. Neurosci.
2016,2016, 1–13. [CrossRef]
Pujari, P.; Gupta, J.B. Improving Classification Accuracy by Using Feature Selection and Ensemble Model.
Int. J. Soft Comput. Eng. 2012, 2, 380–386.
Gao, K.; Khoshgoftaar, T.; Wald, R. Combining feature selection and ensemble learning for software
quality estimation. In Proceedings of the 27th International Florida Artificial Intelligence Research Society
Conference, Palo Alto, CA, USA, 21–23 May 2014; pp. 47–52.
Christopher, B.; Narayan, D. Biofeedback: A Player’s Anxiety as Input into a Video Game Environment.
In Proceedings of the AASRI International Conference on Industrial Electronics and Applications (2015); Atlantis
Press: Paris, France, 2015. [CrossRef]
Pejman, M.b.; Sebastian, L.; Emma, F. Understanding the Contribution of Biometrics to Games User Research.
In Proceedings of the 2011 DiGRA International Conference: Think Design Play, Hilversum, The Netherlands,
14–17 September 2011.
Katsis, C.; Katertsidis, N.; Ganiatsas, G.; Fotiadis, D. Toward Emotion Recognition in Car-Racing Drivers:
A Biosignal Processing Approach. IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 38, 502–512.
Eyben, F.; Wöllmer, M.; Poitschke, T.; Schuller, B.; Blaschke, C.; Färber, B.; Nguyen-Thien, N. Emotion on the
Road—Necessity, Acceptance, and Feasibility of Affective Computing in the Car. Adv. Hum.-Comput. Interact.
2010,2010, 1–17. [CrossRef]
Lisetti, C.; Nasoz, F.; LeRouge, C.; Ozyer, O.; Alvarez, K. Developing multimodal intelligent affective
interfaces for tele-home health care. Int. J. Hum.-Comput. Stud. 2003,59, 245–255. [CrossRef]
Olfson, M.; Gilbert, T.; Weissman, M.; Blacklow, R.S.; Broadhead, W. Recognition of emotional distress in
physically healthy primary care patients who perceive poor physical health. Gen. Hosp. Psychiatry 17, 173–180. [CrossRef]
Paithane, A.N.; Bormane, D.S.; Tahawade, R.S.C.O.E. Human Emotion Recognition using Electrocardiogram
Signals. Int. J. Recent Innov. Trends Comput. Commun. 2014, 2, 194–197.
Yoo, S.K.; Lee, C.K.; Park, Y.J.; Kim, N.H.; Lee, B.C.; Jeong, K.S. Neural Network Based Emotion Estimation
Using Heart Rate Variability and Skin Resistance; Springer: Berlin/Heidelberg, Germany, 2005. [CrossRef]
Ayata, D.; Yaslan, Y.; Kamaşak, M. Emotion Recognition via Galvanic Skin Response: Comparison of
Machine Learning Algorithms and Feature Extraction Methods. Istanb. Univ. J. Electr. Electron. Eng. 17, 3147–3156.
Nasoz, F.; Alvarez, K.; Lisetti, C.L.; Finkelstein, N. Emotion recognition from physiological signals using
wireless sensors for presence technologies. Cogn. Technol. Work 2004,6, 4–14. [CrossRef]
Gouizi, K.; Bereksi Reguig, F.; Maaoui, C. Emotion recognition from physiological signals. J. Med. Eng. Technol.
2011,35, 300–307. [CrossRef]
Jerritta, S.; Murugappan, M.; Nagarajan, R.; Wan, K. Physiological signals based human emotion Recognition:
A review. In Proceedings of the 2011 IEEE 7th International Colloquium on Signal Processing and its
Applications, Penang, Malaysia, 4–6 March 2011; pp. 410–415. [CrossRef]
Egger, M.; Ley, M. Emotion Recognition from Physiological Signal Analysis: A Review. Electron. Notes Theor.
Comput. Sci. 2019,343, 35–55. [CrossRef]
Murugappan, M.; Murugappan, S.; Zheng, B.S. Frequency Band Analysis of Electrocardiogram (ECG)
Signals for Human Emotional State Classification Using Discrete Wavelet Transform (DWT). J. Phys. Ther. Sci.
2013,25, 753–759. [CrossRef] [PubMed]
35. Xu, Y.; Liu, G.; Hao, M.; Wen, W.; Huang, X. Analysis of affective ECG signals toward emotion recognition.
J. Electron. (China) 2010,27, 8–14. [CrossRef]
36. Nemati, E.; Deen, M.J.; Mondal, T. A Wireless Wearable ECG Sensor for Long-Term Applications.
IEEE Commun. Mag. 2012, 50, 36–43. [CrossRef]
Heart and Brain SpikerShield Bundle. Available online: heartandbrainspikershieldbundle (accessed on 9 October 2019).
Ferdinando, H.; Seppanen, T.; Alasaarela, E. Comparing features from ECG pattern and HRV analysis for
emotion recognition system. In Proceedings of the 2016 IEEE Conference on Computational Intelligence
in Bioinformatics and Computational Biology (CIBCB), Chiang Mai, Thailand, 5–7 October 2016; pp. 1–6.
Bernardo, A.F.B.; Vanderlei, L.C.M.; Garner, D.M. HRV Analysis: A Clinical and Diagnostic Tool in Chronic
Obstructive Pulmonary Disease. Int. Sch. Res. Not. 2014,2014, 1–6. [CrossRef] [PubMed]
Izard, C.E.; Libero, D.Z.; Putnam, P.; Haynes, O.M. Stability of emotion experiences and their relations to
traits of personality. J. Personal. Soc. Psychol. 1993,64, 847–860. [CrossRef]
Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H.
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series
analysis. Proc. R. Soc. London. Ser. A: Math. Phys. Eng. Sci. 1998,454, 903–995. [CrossRef]
H., S.; Mohanty, S.; Kishor, N.; Singh, D. Comparison of Empirical Mode Decomposition and Wavelet
Transform for Power Quality Assessment in FPGA. In Proceedings of the 2018 IEEE International Conference
on Power Electronics, Drives and Energy Systems (PEDES), Chennai, India, 18–21 December 2018; pp. 1–6.
Agrafioti, F.; Hatzinakos, D.; Anderson, A.K. ECG Pattern Analysis for Emotion Detection. IEEE Trans.
Affect. Comput. 2012,3, 102–115. [CrossRef]
Emotion Recognition based on Heart Rate and Skin Conductance. In Proceedings of the 2nd International
Conference on Physiological Computing Systems, SCITEPRESS - Science and Technology Publications,
Angers, France, 9 September 2014; pp. 26–32. [CrossRef]
45. Russell, J.A. Affective space is bipolar. J. Personal. Soc. Psychol. 1979,37, 345–356. [CrossRef]
46. Gross, J.J.; Levenson, R.W. Emotion elicitation using films. Cogn. Emot. 1995,9, 87–108. [CrossRef]
47. Ashley, E.A.; Niebauer, J. Conquering the ECG; Remedica: London, UK, 2004.
Welch, P. The use of fast Fourier transform for the estimation of power spectra: A method based on time
averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 1967,15, 70–73. [CrossRef]
Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python. Available online: (accessed on 9 October 2019).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.;
Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Maaoui, C.; Pruski, A. Emotion Recognition through Physiological Signals for Human-Machine
Communication. In Cutting Edge Robotics 2010; InTech: London, UK, 2010. [CrossRef]
Kim, K.H.; Bang, S.W.; Kim, S.R. Emotion recognition system using short-term monitoring of physiological
signals. Med. Biol. Eng. Comput. 2004,42, 419–427. [CrossRef] [PubMed]
Rigas, G.; Katsis, C.D.; Ganiatsas, G.; Fotiadis, D.I. A User Independent, Biosignal Based, Emotion Recognition
Method. In User Modeling 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 314–318. [CrossRef]
Guo, H.W.; Huang, Y.S.; Lin, C.H.; Chien, J.C.; Haraikawa, K.; Shieh, J.S. Heart Rate Variability Signal
Features for Emotion Recognition by Using Principal Component Analysis and Support Vectors Machine.
In Proceedings of the 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE),
Taichung, Taiwan, 31 October–2 November 2016; pp. 274–277. [CrossRef]
Park, C.; Chou, P.H.; Bai, Y.; Matthews, R.; Hibbs, A. An ultra-wearable, wireless, low power ECG monitoring
system. In Proceedings of the 2006 IEEE Biomedical Circuits and Systems Conference, London, UK,
29 November–1 December 2006; pp. 241–244. [CrossRef]
Van Den Broek, E.L.; Schut, M.H.; Westerink, J.H.D.M.; Tuinenbreijer, K. Unobtrusive Sensing of Emotions
(USE). J. Ambient Intell. Smart Environ. 2009,1, 287–299. [CrossRef]
Yama, Y.; Ueno, A.; Uchikawa, Y. Development of a Wireless Capacitive Sensor for Ambulatory ECG
Monitoring over Clothes. In Proceedings of the 2007 29th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; Volume 2007; pp. 5727–5730.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license.
... However, older studies focusing on linear and quadratic discriminant analysis (LDA, QDA) [1,5,26] and support vector machines (SVM) [12] remain highly relevant, achieving high accuracy for their respective classifications. Combinations of ML classifiers forming ensembles have demonstrated potential for binary classifications in emotion detection [6]. In comparison to other studies, [31] achieved the highest accuracy for multiple emotion detection from ECG data utilising a CNN and reported setting the new state of the art for ECG emotion detection. ...
... Certain windows may include multiple emotive annotations; hence to identify the most pertinent emotion, the mean of all annotation values per window is calculated and rounded to the nearest annotation (1-4) using Euclidean distance. Alternative approaches [6] omit these windows and the neighbouring segments to prevent overlap. ...
Conference Paper
Psychophysiology investigates the causal relationship of physiological changes resulting from psychological states. There are significant challenges with machine learning-based momentary assessments of physiology due to varying data collection methods, physiological differences , data availability and the requirement for expertly annotated data. Advances in wearable technology have significantly increased the scale, sensitivity and accuracy of devices for recording physiological signals, enabling large-scale unobtrusive physiological data gathering. This work contributes an empirical evaluation of signal variances acquired from wearables and their associated impact on the classification of affective states by (i) assessing differences occurring in features representative of affective states extracted from electrocardiograms and photoplethysmog-raphy, (ii) investigating the disparity in feature importance between signals to determine signal-specific features, and (iii) investigating the disparity in feature importance between affective states to determine affect-specific features. Results demonstrate that the degree of feature variance between ECG and PPG in a dataset is reflected in the classification performance of that dataset. Additionally, beats-per-minute, inter-beat-interval and breathing rate are identified as common best-performing features across both signals. Finally feature variance per-affective state identifies hard-to-distinguish affective states requiring one-versus-rest or additional features to enable accurate classification.
... Experiments show that Ekman's FACS, which is traditionally used in affective computation, needs to be extended to the interpretation of non-prototype emotions, for example using a psychological model from Plutchik, to achieve a more detailed classification of emotional states. In artificial intelligence Russell's model is justified because it does not only work with strictly given emotional statesexpressions that can be observed in the human face but we can use this model, for example, to classify data obtained from ECG sensors [21], GSR [22], EEG [23], [24], temperature sensor and others. Dissanayake et al. [21] point out that the standard Ekman model can be extended by another emotional states using the ECG sensor and classified as: anger; sadness; joy; and pleasure. ...
... In artificial intelligence Russell's model is justified because it does not only work with strictly given emotional statesexpressions that can be observed in the human face but we can use this model, for example, to classify data obtained from ECG sensors [21], GSR [22], EEG [23], [24], temperature sensor and others. Dissanayake et al. [21] point out that the standard Ekman model can be extended by another emotional states using the ECG sensor and classified as: anger; sadness; joy; and pleasure. They used Russell's model for classification, which can be used to record arousal and valence. ...
Full-text available
In the present, we can use more than 200 different methods and algorithms for automatic recognition systems, the basis of which is the recognition of a face in an image, subsequent extraction of areas of interest and classification of individual parameters using neural networks or other classifications. Currently, one of the most discussed research topics connecting the fields of psychology and artificial intelligence is the classification of emotions from behavioral characteristics. We created the Emotnizer application for collecting and processing behavioral characteristics. The input characteristics record 24 different ways of user behavior when working with a mouse and keyboard, e.g., when rewriting text (determining the method of clicking / key pressing speed, cursor changes, errors in the text, etc.). In the article, we analyze the obtained parameters using the decision tree method and find out whether it is possible to successfully classify the emotional state using these parameters. We found that for the successful classification of emotional states using behavioral characteristics, it is appropriate to classify mainly emotional states with a high value of valence and arousal.
... Automatic emotion recognition is expected to be an effective technology for effortless communication and forms a major research hotspot in the field of artificial intelligence. Emotions can be identified from various sources of information, such as speech (voice), utterance transcript, facial expressions, and brain waves [2][3][4][5][6][7][8][9][10]. Among these sources, speech can be used for emotion recognition in face-to-face mode as well as in remote communication via telephone or video calls. ...
Full-text available
The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a multi-input deep neural network that simultaneously learns these two audio features. The proposed model comprises three parts, specifically, for learning MelSpec in image format, learning GeMAPS in vector format, and integrating them to predict the emotion. Additionally, a focal loss function is introduced to address the imbalanced data problem among the emotion classes. The results of the recognition experiments demonstrate weighted and unweighted accuracies of 0.6657 and 0.6149, respectively, which are higher than or comparable to those of the existing state-of-the-art methods. Overall, the proposed model significantly improves the recognition accuracy of the emotion “happiness”, which has been difficult to identify in previous studies owing to limited data. Therefore, the proposed model can effectively recognize emotions from speech and can be applied for practical purposes with future development.
... Several studies have used the ECG signal to detect emotional changes [162][163][164][165]. In the research of Dissanayake et al. (2019) [166], the authors used three ECG signal-based techniques and the EMD method to recognize the primary human emotions: anger, joy, sadness, and pleasure. Tey achieved an accuracy gain of 6.8% as compared to the other methods. ...
Full-text available
The joint time-frequency analysis method represents a signal in both time and frequency. Thus, it provides more information compared to other one-dimensional methods. Several researchers recently used time-frequency methods such as the wavelet transform, short-time Fourier transform, empirical mode decomposition and reported impressive results in various electrophysiological studies. The current review provides comprehensive knowledge about different time-frequency methods and their applications in various ECG-based analyses. Typical applications include ECG signal denoising, arrhythmia detection, sleep apnea detection, biometric identification, emotion detection, and driver drowsiness detection. The paper also discusses the limitations of these methods. The review will form a reference for future researchers willing to conduct research in the same field.
... Thus, the performance of ensemble learning models is generally higher than single classification algorithms [27]. There are various applications in which ensemble learning methods are utilized such as cyber security [28][29][30][31][32][33], energy [34][35][36][37], and health informatics [38][39][40][41][42][43][44][45][46][47]. ...
Full-text available
Walking ability of elderly individuals, who suffer from walking difficulties, is limited, which restricts their mobility independence. The physical health and well-being of the elderly population are affected by their level of physical activity. Therefore, monitoring daily activities can help improve the quality of life. This becomes especially a huge challenge for those, who suffer from dementia and Alzheimer’s disease. Thus, it is of great importance for personnel in care homes/rehabilitation centers to monitor their daily activities and progress. Unlike normal subjects, it is required to place the sensor on the back of this group of patients, which makes it even more challenging to detect walking from other activities. With the latest advancements in the field of health sensing and sensor technology, a huge amount of accelerometer data can be easily collected. In this study, a Machine Learning (ML) based algorithm was developed to analyze the accelerometer data collected from patients with walking difficulties, who live in one of the municipalities in Denmark. The ML algorithm is capable of accurately classifying the walking activity of these individuals with different walking abnormalities. Various statistical, temporal, and spectral features were extracted from the time series data collected using an accelerometer sensor placed on the back of the participants. The back sensor placement is desirable in patients with dementia and Alzheimer’s disease since they may remove visible sensors to them due to the nature of their diseases. Then, an evolutionary optimization algorithm called Particle Swarm Optimization (PSO) was used to select a subset of features to be used in the classification step. Four different ML classifiers such as k-Nearest Neighbors (kNN), Random Forest (RF), Stacking Classifier (Stack), and Extreme Gradient Boosting (XGB) were trained and compared on an accelerometry dataset consisting of 20 participants. 
These models were evaluated using the leave-one-group-out cross-validation (LOGO-CV) technique. The Stack model achieved the best performance with average sensitivity, positive predictive values (precision), F1-score, and accuracy of 86.85%, 93.25%, 88.81%, and 93.32%, respectively, to classify walking episodes. In general, the empirical results confirmed that the proposed models are capable of classifying the walking episodes despite the challenging sensor placement on the back of the patients, who suffer from walking disabilities.
... Non-physiological signals are easy to collect and closely related to our daily life. Another is to identify emotions through physiological signals, such as Electroencephalogram (EEG) [8], Electromyogram (EMG) [9], Electrocardiogram (ECG) [10] and Galvanic Skin Response (GSR) [11]. Among them, the EEG signals are spontaneously electric activity of the neurons in human brain that can reflect the truthful and plentiful emotional information within the individuals. ...
Full-text available
italic xmlns:mml="" xmlns:xlink="">Emotion analysis has been employed in many fields such as human-computer interaction, rehabilitation, and neuroscience. But most emotion analysis methods mainly focus on healthy controls or depression patients. This paper aims to classify the emotional expressions in individuals with hearing impairment based on EEG signals and facial expressions. Two kinds of signals were collected simultaneously when the subjects watched affective video clips, and we labeled the video clips with discrete emotional states (fear, happiness, calmness, and sadness). We extracted the differential entropy (DE) features based on EEG signals and converted DE features into EEG topographic maps (ETM). Next, the ETM and facial expressions were fused by the multichannel fusion method. Finally, a deep learning classifier CBAM_ResNet34 combined Residual Network (ResNet) and Convolutional Block Attention Module (CBAM) was used for subject-dependent emotion classification. The results show that the average classification accuracy of four emotions recognition after multimodal fusion achieves 78.32%, which is higher than 67.90% for facial expressions and 69.43% for EEG signals. Moreover, visualization by the Gradient-weighted Class Activation Mapping (Grad-CAM) of ETM showed that the prefrontal, temporal and occipital lobes were the brain regions closely related to emotional changes in individuals with hearing impairment.</i
... In [19], the authors explained a machine learning method to recognize four major types of human emotions which are anger, sadness, joy, and pleasure. The authors incorporated electrocardiogram (ECG) signals to recognize emotions. ...
Full-text available
Emotion is the most important component of being human, and very essential for everyday activities, such as the interaction between people, decision making, and learning. In order to adapt to the COVID-19 pandemic situation, most of the academic institutions relied on online video conferencing platforms to continue educational activities. Due to low bandwidth in many developing countries, educational activities are being mostly carried out through audio interaction. Recognizing an emotion from audio interaction is important when video interaction is limited or unavailable. The literature has documented several studies on detection of emotion in Bangla text and audio speech data. In this paper, ensemble machine learning methods are used to improve the performance of emotion detection from speech data extracted from audio data. The ensemble learning system consists of several base classifiers, each of which is trained with both spontaneous emotional speech and acted emotional speech data. Several trials with different ensemble learning methods are compared to show how these methods can yield an improvement over traditional machine learning method. The experimental results show the accuracy of ensemble learning methods; 84.37% accuracy was achieved using the ensemble learning with bootstrap aggregation and voting method.
Full-text available
Rössler systems are introduced as prototype equations with the minimum ingredients for continuous time chaos. These systems are made up of three nonlinear ordinary differential equations that define a continuous-time dynamical system with chaotic dynamics due to the attractor's fractal features. Recently, the study on dynamics of fractional-order Rössler systems is attracting a lot of attention. In this study, the Rössler system of fractional order was numerically investigated. The existence, equilibrium points and their stability are also studied. The system is considered in the sense of Caputo fractional derivatives. In addition, an Adams-type predictor-corrector (ATPC) procedure is applied to the solutions of the system. The numerical result of the experiment shows that the system undergoes Hopf bifurcation for certain values. The result shows that selecting an appropriate value for the parameter can determine the stability of the region of our model. In conclusion, the study shows that the fractional order is very much stable than the integer order.KeywordsRössler systemStability analysisBifurcation analysisFractional order
This research aims to investigate emotion recognition using ultra-short-term electrocardiogram (ECG) signals with a hybrid convolutional neural network and long short-term memory (CNN-LSTM) network. DREAMER dataset consists of 23 subjects was used in this study. Raw data were recorded in the form of audio-visual stimuli during affect elicitation. ECG signals acquired from this dataset were pre-processed to filter noises. Single ECG cycle with one R-R peak interval was extracted using Pan–Tompkins algorithm and fed to the hybrid CNN-LSTM network. Another type of input in the form of decomposed ECG signals via empirical mode decomposition (EMD) was also investigated. The network was trained to classify high/low valence, arousal and dominance, respectively. Results show that hybrid CNN-LSTM network outperformed the basic CNN and LSTM network with classification accuracy ranges from 60 to 88% compared to 40.6% to 86.8% using basic configuration. Meanwhile, using EMD as the input achieved better recognition rates.KeywordsEmotion recognitionElectrocardiogramNeural network
Designers refer to existing product cases and innovate products to develop new products. However, when designers screen product cases, there is no user participation, which leads to the lack of user-side knowledge and emotional drive that is very important for design. Therefore, it is necessary to play the role of user emotional knowledge in promoting the whole design process. This paper proposes the concept of the positive perceptual sample, which applies the knowledge emotion integration of designers and users to the screening sample case stage at the beginning of the design process. This study is based on the lack of user-side knowledge and emotional drive of reference cases and integrates user emotion into the reference case screening process. Then, in the emotion measurement process, users’ cognitive data in the screening process are obtained through the eye-brain fusion cognitive experiment. Finally, the XGBoost algorithm is used to process feature index data to realize the classification and recognition of cognitive data and applied to the positive perceptual classification of products. The results show that the classification accuracy of physiological cognitive data with user emotional representation by the XGBoost algorithm is 90.87% . The results of cognitive data classification are applied to the screening of positive perceptual samples, and the satisfaction rate is 98.35% . The results show that the method proposed in this paper provides a new source of ideas for obtaining positive perceptual samples and can be applied to new product development.
Full-text available
Human computer interaction is increasingly utilized in smart home, Industry 4.0 and personal health. Communication between human and computer can benefit from a flawless exchange of emotions. As emotions have substantial influence on cognitive processes of the human brain such as learning, memory, perception and problem solving, emotional interactions benefit different applications. They can further be relevant in modern health care, especially in interaction with patients suffering from stress or depression. Additionally, rehabilitation applications guiding patients through their rehabilitation training while adapting to the patient's emotional state would be highly motivating and might lead to a faster recovery. Depending on the application area, different systems for emotion recognition suit different purposes. The aim of this work is to give an overview of methods to recognize emotions and to compare their applicability based on existing studies. This review paper should enable practitioners, researchers and engineers to find a system most suitable for certain applications. An entirely contact-less method is to analyze facial features with the help of a video camera. This is useful when computers, smart-phones or tablets with integrated cameras are included in the task. Smart wearables provide contact with the skin, and physiological parameters such as electro-dermal activity and heart-related signals can be recorded unobtrusively, also during dynamical tasks. Next to unimodal solutions, multimodal affective computing systems are analyzed since they promise higher classification accuracy. Accuracy varies based on the number of detected emotions, extracted features, classification method and the quality of the database. Electroencephalography achieves 88.86% accuracy for four emotions, multimodal measurements (electrocardiography, electromyography and bio-signals) 79.3% for four emotive states, facial recognition 89% for seven states and speech recognition 80.46% for happiness and sadness. Looking forward, heart-related parameters might be an option to measure emotions accurately and unobtrusively with the help of smart wearables. This can be used in dynamic or outdoor tasks. Facial recognition, on the other hand, is a useful contact-less tool when it comes to emotion recognition during computer interaction.
In recent years, the multi-label classification problem has become a challenging research topic. In this kind of classification, each sample is associated with a set of class labels. Ensemble approaches are supervised learning algorithms in which an operator takes a number of learning algorithms, namely base-level algorithms, and combines their outcomes to make a prediction. The simplest form of ensemble learning is to train the base-level algorithms on random subsets of the data and then let them vote for the most popular classification, or to average the predictions of the base-level algorithms. In this study, an ensemble learning method is proposed for improving multi-label classification evaluation criteria. We have compared our method with well-known base-level algorithms on several data sets. Experimental results show that the proposed approach outperforms the well-known base classifiers on the multi-label classification problem.
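The "train on random subsets, then vote" scheme described above can be sketched in a few lines. Decision stumps stand in here for the base-level algorithms, and the data is purely hypothetical:

```python
import random
from collections import Counter

def train_stump(data):
    """Fit a one-feature threshold classifier ("decision stump"):
    predict the majority label on each side of the best split."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            left = [y for xv, y in data if xv[f] <= t]
            right = [y for xv, y in data if xv[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            acc = (sum(y == l_lab for y in left)
                   + sum(y == r_lab for y in right)) / len(data)
            if best is None or acc > best[0]:
                best = (acc, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def bagged_ensemble(data, n_learners=15, seed=0):
    """Train each stump on a bootstrap sample of the data,
    then combine predictions by majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_learners)]
    def predict(x):
        votes = Counter(s(x) for s in stumps)
        return votes.most_common(1)[0][0]
    return predict
```

On linearly separable toy data each bootstrap-trained stump already finds a perfect split, so the vote is unanimous; the voting machinery pays off when individual learners err on different examples.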
We propose new features for emotion recognition from short ECG signals. The features represent the statistical distribution of dominant frequencies, calculated using spectrogram analysis of intrinsic mode function after applying the bivariate empirical mode decomposition to ECG. KNN was used to classify emotions in valence and arousal for a 3-class problem (low-medium-high). Using ECG from the Mahnob-HCI database, the average accuracies for valence and arousal were 55.8% and 59.7% respectively with 10-fold cross validation. The accuracies using features from standard Heart Rate Variability analysis were 42.6% and 47.7% for valence and arousal respectively for the 3-class problem. These features were also tested using subject-independent validation, achieving an accuracy of 59.2% for valence and 58.7% for arousal. The proposed features also showed better performance compared to features based on statistical distribution of instantaneous frequency, calculated using Hilbert transform of intrinsic mode function after applying standard empirical mode decomposition and bivariate empirical mode decomposition to ECG. We conclude that the proposed features offer a promising approach to emotion recognition based on short ECG signals. The proposed features could be potentially used also in applications in which it is important to detect quickly any changes in emotional state.
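The classification step above — a majority vote among the k nearest feature vectors — can be sketched as follows. The two-dimensional toy features merely stand in for the spectrogram-derived dominant-frequency statistics described in the abstract:

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority label among its k nearest training
    points under Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    nearest = sorted(train, key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, with hypothetical "low" and "high" arousal clusters:

```python
train = [((0.1, 0.2), "low"), ((0.2, 0.1), "low"), ((0.15, 0.15), "low"),
         ((0.9, 0.8), "high"), ((0.8, 0.9), "high"), ((0.85, 0.85), "high")]
knn_predict(train, (0.12, 0.18))  # classified with the "low" cluster
```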
Ensemble learning has been shown, in both theory and practice, to improve generalization ability effectively. In this paper, we first briefly outline the current status of research on it. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database, which contains a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.
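The Simple Average combiner named above just averages the class-probability vectors produced by the individual models and picks the largest entry. A minimal sketch, with hypothetical model outputs:

```python
def simple_average(prob_lists):
    """Element-wise mean of per-model class-probability vectors;
    returns (winning class index, averaged vector)."""
    n_classes = len(prob_lists[0])
    avg = [sum(p[i] for p in prob_lists) / len(prob_lists)
           for i in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg
```

For instance, three models emitting `[0.6, 0.4]`, `[0.4, 0.6]` and `[0.7, 0.3]` over two classes average to roughly `[0.57, 0.43]`, so class 0 wins even though one model disagreed.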
In this article we describe a new approach to enhance presence technologies. First, we discuss the strong relationship between cognitive processes and emotions and how human physiology is uniquely affected when experiencing each emotion. Secondly, we introduce our prototype multimodal affective user interface. In the remainder of the paper we describe the emotion elicitation experiment we designed and conducted and the algorithms we implemented to analyse the physiological signals associated with emotions. These algorithms can then be used to recognise the affective states of users from physiological data collected via non-invasive technologies. The affective intelligent user interfaces we plan to create will adapt to user affect dynamically in the current context, thus providing enhanced social presence.
Emotions play a significant and powerful role in the everyday lives of human beings. Developing algorithms that let computers recognize an emotional expression is a widely studied area. In this study, emotion recognition from Galvanic Skin Response signals was performed using time domain, wavelet and Empirical Mode Decomposition based features. Valence and arousal were categorized, and the relationship between the physiological signals and arousal and valence was studied using the k-Nearest Neighbors, Decision Tree, Random Forest and Support Vector Machine algorithms. We achieved accuracy rates of 81.81% and 89.29% for arousal and valence respectively.
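Time-domain features of the kind mentioned above are simple statistics computed over a signal window. A sketch of a few common ones (the exact feature set used in the study may differ):

```python
import math

def time_domain_features(signal):
    """Common time-domain statistics of a signal window: mean,
    standard deviation, RMS, and mean absolute first difference."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in signal) / n)
    rms = math.sqrt(sum(s * s for s in signal) / n)
    diff_mean = sum(abs(signal[i + 1] - signal[i])
                    for i in range(n - 1)) / (n - 1)
    return {"mean": mean, "std": std, "rms": rms, "diff_mean": diff_mean}
```

Each window of the GSR recording would yield one such feature vector, which then feeds the classifiers.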
Emotions play a significant and powerful role in the everyday lives of human beings, and developing algorithms that let computers recognize emotional expression is a widely studied area. In this study, emotion recognition from galvanic skin response signals was performed using time domain and wavelet based features. Feature extraction was carried out with windows of various lengths and several feature-attribute sets. Valence and arousal were categorized, and the relationship between the physiological signals and arousal and valence was studied using the Random Forest machine learning algorithm. Using only the galvanic skin response signal, we achieved accuracy rates of 71.53% and 71.04% for arousal and valence respectively. We also showed that using overlapping (convolved) windows has a positive effect on the accuracy rate compared to non-overlapping window based feature extraction.
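The difference between overlapping and non-overlapping window extraction comes down to the step size relative to the window width; a minimal sketch:

```python
def sliding_windows(signal, width, step):
    """Split a signal into fixed-width windows. step == width gives
    non-overlapping windows; step < width gives overlapping ones."""
    return [signal[i:i + width]
            for i in range(0, len(signal) - width + 1, step)]
```

An eight-sample signal with width 4 yields two disjoint windows at step 4, but three windows sharing half their samples at step 2, so the downstream classifier sees more (correlated) training examples.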
Modern Supervisory Control and Data Acquisition (SCADA) systems, used by the electric utility industry to monitor and control electric power generation, transmission and distribution, are recognized today as critical components of the electric power delivery infrastructure. SCADA systems are large, complex and incorporate increasing numbers of widely distributed components. The presence of a real time intrusion detection mechanism that can cope with different types of attacks is of great importance in order to defend a system against cyber attacks. This defense mechanism must be distributed, cheap and, above all, accurate, since false positive alarms or mistakes regarding the origin of the intrusion mean severe costs for the system. Recently an integrated detection mechanism, namely IT-OCSVM, was proposed, which is distributed in a SCADA network as part of a distributed intrusion detection system (DIDS), providing accurate data about the origin and the time of an intrusion. In this paper we also analyze the architecture of the integrated detection mechanism and perform extensive simulations based on real cyber attacks in a small SCADA testbed in order to evaluate the performance of the proposed mechanism.
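A one-class classifier such as the OCSVM underlying IT-OCSVM is trained on normal traffic only and flags anything that deviates from it. As a rough illustration of that idea only — not the IT-OCSVM mechanism itself, which learns a kernelized decision boundary — here is a centroid-distance detector on hypothetical feature vectors:

```python
import math

def fit_detector(normal, quantile=0.95):
    """Toy one-class detector: model 'normal' traffic by its centroid
    and flag points whose distance exceeds a threshold taken from the
    training-distance distribution."""
    dim = len(normal[0])
    centroid = [sum(x[i] for x in normal) / len(normal) for i in range(dim)]
    def dist(x):
        return math.sqrt(sum((x[i] - centroid[i]) ** 2 for i in range(dim)))
    d = sorted(dist(x) for x in normal)
    threshold = d[min(int(quantile * len(d)), len(d) - 1)]
    return lambda x: dist(x) > threshold  # True -> flag as intrusion
```

Like the OCSVM, this needs no labeled attack data at training time; unlike the OCSVM, its decision region is a plain sphere, so it cannot capture irregularly shaped normal behavior.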