Emotion identification from TQWT
based EEG rhythms
Aditya Nalwaya, Kritiprasanna Das, and Ram Bilas Pachori
Department of Electrical Engineering, Indian Institute of Technology Indore, Indore, India
ABSTRACT
Electroencephalogram (EEG) signals are recordings of the brain's electrical activity and are commonly used for emotion recognition. Different EEG rhythms carry different neural dynamics. In this work, EEG rhythms are separated using the tunable Q-factor wavelet transform (TQWT). Several features, such as mean, standard deviation, and information potential, are extracted from the TQWT-based EEG rhythms. Machine learning classifiers are used to differentiate various emotional states automatically. We have validated our proposed model using a publicly available database. The obtained classification accuracy of 92.86% demonstrates the suitability of the proposed method for emotion identification.
Keywords: Emotion recognition, affective computing, signal processing, machine learning, physiological
signal.
INTRODUCTION
Emotion plays a vital role in human life, as it influences human behavior, mental state, decision making,
etc. [1]. In humans, overall intelligence is generally measured by logical and emotional intelligence
[2],[3]. In recent years, artificial intelligence (AI) and machine learning (ML) have helped computers achieve higher intelligence, particularly in numerical computing and logical reasoning. However, there are still limitations in their ability to understand, comprehend, and respond according to the emotional state of the persons interacting with a computer. To address these shortcomings, research in the domain of affective computing is ongoing. Affective computing is a field that aims to design machines that can recognize, interpret, process, and simulate the human experience of feeling or emotion. Recognizing a person's
emotional state can help a computer to interact with humans in a better way.
In order to get more customized and user-centric information and communications technology solutions,
an emotion recognition system could play an important role. Although computing systems have made great progress in AI, they still lag behind humans in intelligence. The reason is the absence of emotional intelligence, which helps in understanding a situation and making decisions accordingly. Thus, instead of making decisions purely logically, computers can be made aware of the human emotional state before making a decision. Emotion recognition is also helpful in upcoming new
entertainment systems such as virtual reality systems for enhancing user experience [4]. Emotion
recognition systems can also be used in understanding the health condition of patients with mental
disabilities or infant patients [5]. Emotion detection can be used to monitor students' learning and create personalized educational content for them [6]. Also, software developers can examine user experience using emotion recognition systems. Emotion recognition systems have a vast range of applications, such as health care, brain-computer interfaces (BCI), education, smart entertainment systems, smart rooms, intelligent cars, psychological studies, etc. [6].
Humans reveal emotions through facial expressions, verbal expressions, or several physiological signals, such as variability in heart rate, skin conductance, etc., which are generated by the human body in response to the evoked emotion [1].
In an emotion recognition system, emotions can be evoked or elicited either in a passive way or in an
active way. In the case of passive emotion elicitation, the subject's emotions are evoked by exposing them
to targeted emotion elicitation material. Some of the publicly available elicitation materials are the international affective picture system (IAPS) [52], a library of photographs used extensively for emotion elicitation, and the Nencki affective picture system (NAPS), another database of visual stimuli. The Montreal affective voices and the international affective digitized sounds (IADS) are some of the acoustic stimulus databases used for passive emotion elicitation [53]. In the case of active emotion elicitation,
subjects are asked to actively participate in a certain task that leads to emotion elicitation. Participants may be asked to play video games [54] or engage in conversation with another participant [55]; thus, the subject's emotions are evoked through active participation in the experiment. The elicited emotion can be labeled either through explicit assessment by the subjects themselves, where subjects report their own feelings, or through implicit assessment, where the subject's emotional state is evaluated externally by some other person. Some standard psychological questionnaires used for emotion evaluation are the self-assessment manikin (SAM) [56], the positive and negative affect schedule (PANAS) [57], and the differential emotion scale (DES) [58], which subjects answer according to their feelings. Both implicit and explicit methods of assessment are approximate evaluations of elicitation. Therefore, in [59], to ensure the correctness of the
label, both techniques are used in combination. Thus, in order to get physiological signals for a targeted
emotion, the elicitation or stimulus of a particular emotion must be chosen carefully.
Interpretable signals, such as facial expressions and speech, can be collected easily, as the subject is not required to wear any equipment to record them. Most facial emotion
recognition (FER) approaches have three main stages: preprocessing, feature extraction, and emotion
classification.
Preprocessing involves operations related to face detection and face alignment. There are many face
detection techniques available such as Viola-Jones [6], normalized pixel difference (NPD) feature-based
face detection [7], and facial image threshing machine [8]. Zhang et al.[9] proposed an algorithm that can
detect faces as well as perform a face alignment operation. Face alignment is an important operation for converting a non-frontal face image into a frontal one. An active appearance model (AAM) matches a statistical appearance model iteratively to obtain new face-aligned images [10]. Another face alignment technique is constrained local models (CLM), which produces smoother image alignment due to the use of linear filters [11].
Feature extraction is a process of extracting useful information from any given image or video. The
process of identifying information from a given image is called data registration [12]. It can be either
from a full facial image, part of a facial image, or a point-based method. The full facial image is generally
used when one is looking for every single detail of variation across the face. Whereas in the case of the
part-based method, only a part of the facial image, such as eye, nose, etc., is considered. The point-based
method is useful in getting information related to shape. Both part-based and point-based methods hold low-level feature information. Different low-level features are the local binary pattern (LBP) [13], local phase quantisation (LPQ) [14], histogram of gradients (HOG) [15], etc. Such low-level features generate high-dimensional feature vectors. Therefore, pooling methods are used to remove redundancy from the obtained feature vector. Next, emotions are classified into different classes using the obtained feature vector.
In [16], the authors have used the discrete wavelet transform (DWT) to extract features from a face image, and then convolutional neural networks are used to classify the emotion. In [17], the authors have extracted multiscale features from the face image using biorthogonal wavelet entropy, and then a fuzzy support vector machine (SVM) classifier is used to classify the emotion. Jeen et al. [18] have calculated features using a multilevel wavelet gradient transform, and then pooling is done using Pearson kernel principal component analysis. The classification of emotion is done using a fuzzy SVM classifier. In [19], the authors
have used pre-trained convolutional neural networks that were trained using the ImageNet database.
Using such a transfer learning approach, facial emotions have been recognized. Such FER systems have found applications in video analytics for monitoring people [20], e-learning to identify student engagement [21], reducing fatigue during video conferencing [22], etc.
Another popular approach for recognizing a person's emotions is speech analysis. Speech emotion
recognition (SER) estimates the emotional state of a speaker from the voice signal. The SER system has the same stages as the FER system; the only difference is in the kind of features extracted from the input signal. The preprocessing stage helps in extracting the speech signal of the target speaker and removing the
voice of the non-target speaker as well as background noise and reverberation. Frequently used features
for emotion recognition through speech are the Teager-energy operator (TEO) [23], prosodic features
[24], voice quality features [25], and spectral features [26]. TEO features find stress in the speech. It has
been observed that speech is produced due to the nonlinear airflow from the human vocal tract, which is
directly related to change in muscle tension. Thus, TEO can be used to analyze the pitch contour to detect
emotions such as neutral, angry, etc. Prosodic features highlight pitch-related information such as stress, tone, pauses between words, etc. Voice quality features represent the voice level and voice pitch for a particular emotion, i.e., the amplitude and the duration of the speech. Spectral features give information
related to frequency distribution over the audible frequency range. Linear predictive cepstral coefficients
(LPCC), mel frequency cepstral coefficients (MFCC), modulation spectral features, etc., are some of the
popular spectral features used for emotion recognition [26].
In [27], the energy content in the speech signal is computed using wavelet-based time-frequency
distribution for the classification of emotions. In [28], the authors have used an empirical mode decomposition (EMD) based signal reconstruction method for feature extraction. Daneshfar et al. [29] have proposed hybrid spectral-prosodic features of a speech signal. The dimensionality of the feature vector is reduced using quantum-behaved particle swarm optimization (QPSO). The reduced feature vector is then passed to a neural network classifier having Gaussian elliptical basis functions (GEBF) for detecting speech emotion. In [30], LPCC and MFCC features have been extracted using wavelet decomposition. These features are then reduced using the vector quantization method, and emotions are classified using a radial basis function network (RBFNN) classifier. Compared to biological signals, facial expressions and speech signals can be acquired comfortably and economically. However, such signals can be controlled or fabricated by the subject and are thus less reliable than physiological signals [51].
The autonomic nervous system (ANS) regulates different parameters of our body. Emotions cause a change in the activity of the ANS [31]. Thus, to analyze changes in a person's emotional state, heart rate, body temperature, respiration rate, and other physiological signals are often used. There are many
physiological signals such as electroencephalogram (EEG) [32], electrocardiogram (ECG) [33],
phonocardiogram (PCG) [34], galvanic skin response (GSR) [35], respiration [36], etc. which have been
used for emotion recognition.
ECG represents the heart's electrical activity due to cardiac contraction and expansion. The sympathetic branch of the ANS is stimulated differently for different emotions. Emotions influence the ANS activity, which causes changes in the heartbeat rhythm. The ECG signals can be recorded by placing electrodes at
different parts of the chest [33].
As in the case of FER and SER systems, here also features are extracted for emotion recognition. ECG feature extraction follows three different approaches: PQRST detection, heart rate (HR) and within-beat (WIB) analysis, and heart rate variability (HRV) with inter-beat interval (IBI) analysis [37]. HRV is a time-domain feature and is most widely used for the purpose of emotion recognition [38]. It measures the variation in the interval between heartbeats. The time between beats is called an IBI or RR interval. Three domains of features are generally extracted from the HRV, namely time-domain, frequency-domain, and nonlinear features.
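To make the HRV terminology concrete, the following minimal Python sketch computes a few widely used time-domain HRV measures from a series of RR (inter-beat) intervals; the function name and the synthetic RR values are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def hrv_time_domain(rr_ms):
    """Illustrative time-domain HRV features from RR (inter-beat) intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                                   # successive differences between intervals
    return {
        "mean_rr": rr.mean(),                            # mean inter-beat interval (ms)
        "sdnn": rr.std(ddof=1),                          # standard deviation of RR intervals
        "rmssd": np.sqrt(np.mean(diff ** 2)),            # root mean square of successive differences
        "pnn50": 100.0 * np.mean(np.abs(diff) > 50.0),   # % of successive differences exceeding 50 ms
    }

# Example with a short synthetic RR series (ms)
print(hrv_time_domain([812, 790, 805, 830, 798, 815, 802]))
```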
Emotion recognition using the ECG signal is done in [39]. ECG signal features such as the WIB mean, standard deviation, median, etc., are calculated, and various time- and frequency-domain parameters are computed using EMD. This feature vector is then passed to extra trees and random forest classifiers. In [40], the signal is decomposed using DWT, different features are calculated, and emotions are then classified from the feature vector with the help of an SVM classifier.
GSR measures the conductivity of the skin, which is also known as electrodermal activity (EDA). Like the heartbeat, sweating is also regulated by the ANS. Due to emotions such as stress or fear, the nervous system gets stimulated, and sweat is generated. Thus, EDA signals can be used for the purpose of emotion detection. The conductivity of the skin increases when the subject is active and decreases when the subject is in a relaxed state. EDA signals can be recorded by placing electrodes over the fingers [35]. In [41], the fractional Fourier transform (FrFT) is used for feature extraction from the GSR signal, feature selection is done using the Wilcoxon test, and classification is performed with an SVM. Generally, for emotion recognition
purposes, GSR is used in combination with other physiological signals such as ECG, EEG, etc.
Respiration rate is defined as the number of times a person breathes per unit time. The breathing pattern
of a person varies with changes in the physical and emotional state. Due to an increase in physical
workload, respiration may increase. Similarly, a decreased respiration rate indicates a relaxed state. Thus
respiration rate indicates the affective state of the ANS in the condition of emotional response and mental
workload. Fast and deep breathing indicates anger or a happy emotional state. Momentary interruption of
respiration indicates tension. Irregular respiration may also indicate a depressed or fearful emotional state
[36]. In [42], a deep learning-based sparse auto-encoder (SAE) is used to extract features for recognizing
emotional information from respiratory signals. Then logistic regression is used for the classification of
emotion.
An EEG signal is generated due to electrical activities inside the brain. EEG signals can be recorded both
invasively and non-invasively. A non-invasive way of EEG signal recording is popularly used in the case
of human brain study. A cap with multiple electrodes arranged according to the international standard 10-20 system is placed over the scalp to record the EEG signal, and the potential differences among the electrodes capture the electrical activity inside the brain. Here, 10-20 refers to the distance between adjacent electrodes, i.e., 10% or 20% of the skull distance from front to back or from right to left. The EEG signals thus received from the various channels of the cap have a nonstationary characteristic. Only a professionally trained person can interpret such nonstationary EEG signals; for example, a doctor uses these signals to diagnose different brain disorders.
Various signal processing algorithms can help extract information from such nonstationary signals to
automate this manual process. Before applying any signal processing algorithm, a signal must be
preprocessed. Preprocessing helps in artifact and noise removal from the raw input signal. Preprocessing
stage makes the signal suitable for further processing. Preprocessing includes operations such as artifact
removal, noise filtering, and resampling the signal. The signal is generally recorded at a higher sampling
rate, and then the signal is downsampled before further processing to reduce the computational
complexity. Downsampling helps in reducing the number of samples used while still maintaining the
needed information. Several artifact removal techniques based on independent component analysis (ICA), bandpass filtering, deep learning, etc., can further improve the signal quality [51]. The preprocessed signal is then analyzed using various signal decomposition techniques, and certain features are extracted from the decomposed signal. Various features that help in characterizing signals, such as spatial, spectral, temporal, and statistical features, can be extracted from the oscillatory components. Spatial information helps in finding the source of information. In the case of EEG signals,
this feature will help in selecting specific EEG channels or focusing more on the signal coming from a
specific region of the brain. The spectral feature can be helpful in describing signal power distribution in
different frequency bands. Mean, variance, standard deviation, skewness, kurtosis etc., are some of the
widely used statistical features.
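The preprocessing steps described above can be sketched as follows; this is only a generic illustration, in which the 1 kHz recording rate, 200 Hz target rate, 0.5-75 Hz pass band, and filter order are assumptions made for the example rather than the exact settings of any cited pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess_eeg(x, fs_in=1000, fs_out=200, band=(0.5, 75.0)):
    """Generic EEG preprocessing sketch: zero-phase band-pass filtering followed by downsampling."""
    sos = butter(4, band, btype="bandpass", fs=fs_in, output="sos")
    x_filt = sosfiltfilt(sos, x)            # remove slow drift and high-frequency noise
    factor = fs_in // fs_out                # integer downsampling factor
    x_ds = decimate(x_filt, factor, zero_phase=True)
    return x_ds, fs_out

# Example with a 10-second synthetic single-channel signal sampled at 1 kHz
x = np.random.randn(10 * 1000)
x_ds, fs = preprocess_eeg(x)
print(x_ds.shape, fs)   # -> (2000,) 200
```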
EEG signals are very useful physiological signals for the study of human emotion recognition, as EEG
signals have a high temporal resolution [32]. Also, since they are generated directly from the human brain, these signals carry more important emotion-related information. EEG signals consist of different rhythms such as
delta (δ) (0.5–4 Hz), theta (θ) (4–8 Hz), alpha (α) (8–13 Hz), beta (β) (13–30 Hz) and gamma (γ) (more
than 30 Hz) [43]. The study of these EEG rhythms gives information about the user's mental and
emotional state.
Several methods have been proposed for emotion recognition using EEG signals. In [45], feature
extraction from the EEG signal is done using the short-time Fourier transform (STFT), the F-score is used for feature selection, and then classification is performed using an SVM. In [46], the authors extracted rational asymmetry (RASM), which describes the frequency-space domain characteristics of the EEG signal; then, using long short-term memory (LSTM) recurrent neural networks, different emotions were classified with an accuracy of 76%. In [47], the multivariate synchrosqueezing transform (MSST) is used for time-frequency
representation. The high-dimensional extracted feature is reduced using independent component analysis
(ICA). Gupta et al.[48] decomposed EEG signals using flexible analytic wavelet transform. Information
potential (IP) based on Renyi's quadratic entropy is computed from each sub-band, the obtained features are smoothed using a moving average filter, and classification is done using a random forest classifier. In [49], the authors divided an EEG signal into small segments. Then, for each segment, statistical parameters such as mean, median, Fisher information ratio, standard deviation, variance, maximum, minimum, range, skewness, kurtosis, entropy, and Petrosian fractal dimension are computed. The feature vector of the above statistical parameters is passed to a classifier called sparse discriminative ensemble learning (SDEL) for emotion classification. In [50], Fourier-Bessel series expansion (FBSE) based empirical wavelet transform (FBSE-EWT) is used for computing K-nearest neighbour (K-NN) entropy and spectral Shannon entropy. The extracted features are smoothed and then given to
the sparse autoencoder-based random forest (ARF) classifier for emotion classification.
In [51], various multimodal emotion recognition approaches are explained. In the multimodal-based
approach, different signals are captured from the subject, and a separate feature vector is formed from
each signal. These feature vectors of different signals are combined either at the feature level or at the
decision level.
As this chapter focuses more on EEG-related emotion recognition, the subsequent section will give more
details related to different blocks or stages of EEG-based emotion recognition systems.
PROPOSED FRAMEWORK
This section explains the proposed methodology; the flow chart of the proposed emotion recognition system is shown in Fig. 1.
Figure 1. Flow chart of the proposed methodology
a) Database description:
A publicly available database called the SJTU Emotion EEG Dataset (SEED) is used for validating the proposed model [60]. EEG signals of 15 subjects were recorded for three classes of emotions, i.e., neutral, sad, and happy. Each subject participated in three sessions. The SEED database includes the Chinese movie clips used to elicit emotion, a list of subjects, and the recorded EEG data. The Chinese movie clips used for eliciting emotion were selected based on certain criteria: 1) to avoid fatigue among subjects during the trial, clips were chosen as short as possible, 2) the subject must easily understand the clip, and 3) only the desired emotion must get elicited. The clips were first shown to twenty participants, and among those, only 15 clips were finally selected, with an equal number of clips for evoking each emotion. Each movie clip was approximately 4 minutes long.
b) Tunable Q-factor wavelet transform (TQWT) [68]: In this stage, the signal is decomposed into different sub-bands. DWT is the most widely used time-frequency analysis technique for analyzing nonlinear and nonstationary processes [book]. DWT has a constant Q-factor, which is not suitable for different varieties of signals; i.e., a highly oscillatory signal should be analyzed with a high Q-factor, whereas a less oscillatory signal should be analyzed with a low Q-factor. TQWT is a more advanced version of the wavelet transform. Using TQWT, multi-component EEG signals can be decomposed into several sub-band signals. TQWT is more flexible than the original DWT, as the Q-factor of the filter can be adjusted in the case of TQWT. TQWT provides good time-frequency localization. TQWT has three main parameters: Q, r, and J, where Q represents the Q-factor (a dimensionless quantity), J represents the number of decomposition levels (a positive integer), and r represents the oversampling rate. Wavelet oscillations are adjusted by Q, whereas r controls temporal localization while conserving the wavelet's form. Increasing Q makes the frequency responses narrower, resulting in more decomposition levels for the same frequency range. For a fixed value of Q, increasing r increases the overlap between adjacent frequency responses, which in turn increases the number of decomposition levels required for the same frequency range. TQWT contains a chain of two-channel high-pass and low-pass filter banks. The output of the low-pass filter is connected to the input of the next stage of the filter bank. Fig. 3 (a) shows the process of decomposition and reconstruction of a given input EEG signal using the TQWT-based approach. Fig. 3 (b) shows the process of iterative signal decomposition up to J levels. For J levels of decomposition, the number of sub-bands is J+1, i.e., one low-pass sub-band and J high-pass sub-bands are obtained. In Fig. 3 (a), α represents the low-pass scaling factor, which preserves the low-frequency components of the signal. Similarly, β represents the high-pass scaling factor, which preserves the high-frequency components of the signal.
Figure 3. (a) TQWT decomposition and reconstruction. (b) Multi-stage filter bank.
The frequency responses of the low-pass filter, H0(ω), and the high-pass filter, H1(ω), of the two-channel filter bank are given as [68]:

H0(ω) = 1 for |ω| ≤ (1 − β)π,
H0(ω) = θ((ω + (β − 1)π)/(α + β − 1)) for (1 − β)π < |ω| < απ,
H0(ω) = 0 for απ ≤ |ω| ≤ π,

H1(ω) = 0 for |ω| ≤ (1 − β)π,
H1(ω) = θ((απ − ω)/(α + β − 1)) for (1 − β)π < |ω| < απ,
H1(ω) = 1 for απ ≤ |ω| ≤ π,

where θ(ω) = 0.5 (1 + cos ω) √(2 − cos ω) for |ω| ≤ π. For perfect reconstruction α + β > 1, and for α + β = 1 the TQWT is critically sampled with no transition width. For the desired Q-factor, the wavelet filter bank parameters α and β are given by [68]:

β = 2/(Q + 1),  α = 1 − β/r.
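As a small numerical illustration of these parameters, the sketch below evaluates the relations β = 2/(Q + 1) and α = 1 − β/r from [68] for the values Q = 5 and r = 3 used later in this chapter; the helper function name is ours, and this is not a full TQWT implementation.

```python
def tqwt_scaling_factors(Q, r):
    """Low-pass (alpha) and high-pass (beta) scaling factors for a given
    Q-factor Q and oversampling rate r, following the relations in [68]."""
    beta = 2.0 / (Q + 1.0)        # high-pass scaling factor
    alpha = 1.0 - beta / r        # low-pass scaling factor
    return alpha, beta

# Parameters used later in this chapter: Q = 5, r = 3, J = 18 (giving J + 1 = 19 sub-bands)
alpha, beta = tqwt_scaling_factors(Q=5, r=3)
print(alpha, beta, alpha + beta > 1)   # 0.888..., 0.333..., True (perfect-reconstruction condition)
```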
c) Rhythm separation: Rhythms from the decomposed EEG signal are separated by calculating the mean frequency of the sub-band signals. If the mean frequency value lies between 0.5 and 4 Hz, the sub-band contributes to the delta rhythm. Similarly, the mean frequency values of the theta, alpha, beta, and gamma rhythms lie between 4 to 8 Hz, 8 to 13 Hz, 13 to 30 Hz, and 30 to 75 Hz, respectively.
Sub-bands of decomposed EEG signal are grouped to separate rhythms according to their mean
frequency (μk), calculated as follows [70]:

μk = Σi fi |Xk(i)|² / Σi |Xk(i)|²,  where fi = i Fs / N.

Here, Fs is the sampling frequency, N is the length of the signal, and Xk(i) is the DFT of the k-th sub-band. As only a one-sided spectrum is considered, the index i ranges from 0 to (N/2) − 1.
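A minimal sketch of this grouping step is shown below. It assumes that the TQWT sub-bands have already been reconstructed as equal-length time-domain signals (the TQWT decomposition itself is not reproduced here), and it uses a power-weighted spectral centroid as the mean frequency, consistent with the expression above; the band limits follow the rhythm definitions given earlier.

```python
import numpy as np

RHYTHM_BANDS = {            # mean-frequency limits (Hz) used for grouping sub-bands
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta": (13, 30), "gamma": (30, 75),
}

def mean_frequency(sub_band, fs):
    """Power-weighted mean frequency of one sub-band signal from its one-sided DFT."""
    N = len(sub_band)
    X = np.fft.rfft(sub_band)[: N // 2]            # one-sided spectrum, bins 0 .. N/2 - 1
    f = np.fft.rfftfreq(N, d=1.0 / fs)[: N // 2]   # f_i = i * fs / N
    p = np.abs(X) ** 2
    return np.sum(f * p) / np.sum(p)

def group_rhythms(sub_bands, fs):
    """Sum the sub-band signals whose mean frequency falls inside each rhythm band."""
    rhythms = {name: np.zeros(len(sub_bands[0])) for name in RHYTHM_BANDS}
    for sb in sub_bands:
        mu = mean_frequency(sb, fs)
        for name, (lo, hi) in RHYTHM_BANDS.items():
            if lo <= mu < hi:
                rhythms[name] += sb
    return rhythms

# Example: two synthetic one-second "sub-bands" at 200 Hz land in theta (6 Hz) and alpha (10 Hz)
fs = 200
t = np.arange(fs) / fs
rhythms = group_rhythms([np.sin(2 * np.pi * 6 * t), np.sin(2 * np.pi * 10 * t)], fs)
print({name: round(float(np.max(np.abs(sig))), 2) for name, sig in rhythms.items()})
```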
d) Feature extraction: The IP of each obtained rhythm is then calculated using Renyi's quadratic entropy [71]. For adaptation and learning of information, Renyi derived a group of estimators that use entropies and divergences as cost functions. Since entropy is a scalar quantity, calculating the entropy of random data first requires estimating its probability density function (PDF), which is difficult in high-dimensional spaces. Using Renyi's quadratic entropy and the IP, the requirement of explicitly estimating the PDF can be relaxed:

H(x) = −log IPσ(x),  IPσ(x) = (1/N²) Σm Σn Gσ(xm − xn),

where H(x) is Renyi's quadratic entropy, Gσ is a Gaussian kernel with σ as its width, IPσ(x) is the quadratic IP estimator (which depends on σ), xm and xn are sample pairs, and the total number of samples is given by N.
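The pairwise-kernel estimator described above can be sketched in a few lines of Python; this is a generic illustration in which the kernel normalization and the choice of σ are assumptions of the example, not the exact settings used in the chapter.

```python
import numpy as np

def information_potential(x, sigma=1.0):
    """Quadratic information potential IP_sigma(x): a Gaussian kernel averaged over all
    sample pairs (x_m, x_n), i.e., a Parzen-window estimate of the integral of p(x)^2."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    diff = x[:, None] - x[None, :]                                   # all pairwise differences x_m - x_n
    g = np.exp(-diff ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return g.sum() / N ** 2

def renyi_quadratic_entropy(x, sigma=1.0):
    """Renyi's quadratic entropy H(x) = -log IP_sigma(x)."""
    return -np.log(information_potential(x, sigma))

# Example: IP of a synthetic one-second rhythm segment (200 samples)
rhythm = np.random.randn(200)
print(information_potential(rhythm), renyi_quadratic_entropy(rhythm))
```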
e) Classification: SVM is a supervised machine learning algorithm. It is mainly used for finding
decision boundaries or support vectors which are then used for classification. SVM classifier
learns from the training data that are projected into a higher-dimensional space, where data is
separated into two classes by a hyperplane [72]. The user-defined kernel function helps in
transforming the original feature space into a higher-dimensional space. It finds support vectors to
maximize the separation between the two classes. The margin is the total separation between the two boundary hyperplanes. Once the hyperplane is defined, SVM iteratively optimizes it in order to maximize the margin. SVM can perform both linear and nonlinear classification. In the case of a nonlinear classifier, the kernels used are the inhomogeneous polynomial, homogeneous polynomial, Gaussian radial basis function, and hyperbolic tangent function. In this work, an SVM classifier with a cubic kernel is used.
The SVM classifier can be expressed mathematically as follows [73]:

y(x) = sign[ Σn an yn K(x, xn) + c ],

where an is a positive real constant, c is a real constant, K(·, ·) is the kernel that defines the feature space, and xn and yn are the n-th input and output vectors, respectively. For a linear feature space, K(x, xn) = xn^T x; for a polynomial SVM of order d, the feature space is given by K(x, xn) = (xn^T x + 1)^d, i.e., d = 2 for a quadratic polynomial and d = 3 for a cubic polynomial.
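For illustration, a cubic-kernel SVM of this kind can be trained with scikit-learn as sketched below (the chapter itself uses MATLAB); the feature matrix and labels here are random placeholders standing in for the IP features and the three emotion classes.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))       # placeholder feature matrix (e.g., IP per rhythm per channel)
y = rng.choice([-1, 0, 1], size=300)     # placeholder labels: negative, neutral, positive

# Cubic-kernel SVM (polynomial kernel of degree 3) with feature standardization
clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validation accuracy:", scores.mean())
```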
RESULTS AND DISCUSSION
The proposed methodology for the recognition of human emotion using EEG signals is evaluated using a publicly available database consisting of EEG signals of 15 participants. TQWT decomposes the EEG signal into several sub-bands. The TQWT parameters were chosen as Q = 5, r = 3, and J = 18. One-second epochs of the EEG signal are decomposed into (J+1) sub-bands. Then the mean frequency of each sub-band is calculated, and based on its value, the EEG rhythms are extracted. Thus, all sub-bands with a mean frequency value between 1 to 4 Hz are summed together to obtain the delta rhythm. In a similar way,
all other EEG rhythms are obtained, as shown in Figs. 4, 5, and 6. Then the information potential of each rhythm is calculated, and a feature vector corresponding to a particular emotion is formed. Several machine learning classifiers are trained using the feature vector. The SVM classifier with a cubic kernel outperforms the other classifiers.
Figure 4: EEG rhythms for happy emotion.
Figure 5: EEG rhythms for neutral emotion.
Figure 6: EEG rhythms for sad emotion.
All simulations are done using MATLAB 2021b, installed on a system having an Intel i5 processor and 8
GB RAM. Table 1 shows the accuracy of the classifier for an individual subject. The overall average
accuracy for the subject-dependent emotion recognition task is 92.86%.
Table 1
SVM (cubic) classifier performance for subject-dependent emotion recognition (true positive rate per class and overall accuracy, in %)

Subject     | Negative (-1) | Neutral (0) | Positive (1) | Overall accuracy
Subject 1   | 91.7          | 93.7        | 95.9         | 93.8
Subject 2   | 93.6          | 94.7        | 94.2         | 94.2
Subject 3   | 86.8          | 88.7        | 94.2         | 90
Subject 4   | 99.1          | 99.5        | 99.6         | 99.4
Subject 5   | 93.8          | 93          | 92.7         | 93.2
Subject 6   | 93.8          | 92.4        | 97.9         | 94.7
Subject 7   | 82.1          | 87.7        | 88.4         | 86.1
Subject 8   | 89            | 90.2        | 92.5         | 90.6
Subject 9   | 88.8          | 87.2        | 95.2         | 90.5
Subject 10  | 92.1          | 92.8        | 97.5         | 94.2
Subject 11  | 89            | 91.8        | 91.5         | 90.8
Subject 12  | 93            | 94.8        | 92.9         | 93.6
Subject 13  | 91.4          | 92.6        | 94.8         | 93
Subject 14  | 93.6          | 94.6        | 96.5         | 94.9
Subject 15  | 93.1          | 95.5        | 93.2         | 93.9
Average     | 91.39         | 92.61       | 94.46        | 92.86
In the case of subject-independent classification, the MATLAB Classification Learner application is used, and five different classifiers are trained. Table 2 shows a comparison of the performance of different classifiers in the case of subject-independent emotion recognition. Ensemble bagged trees provides the highest overall accuracy of 86.8%.
Table 2
Classification performance for subject-independent emotion recognition (true positive rate per class and overall accuracy, in %)

Classifier             | Negative (-1) | Neutral (0) | Positive (1) | Overall accuracy
Fine tree              | 51.8          | 68.9        | 80.1         | 67.1
SVM (cubic)            | 81.8          | 83.1        | 90.9         | 85.4
KNN (cubic)            | 58.2          | 61.4        | 73.4         | 64.4
Ensemble boosted trees | 50            | 59.4        | 80.3         | 64.2
Ensemble bagged trees  | 85            | 85.2        | 90           | 86.8
TQWT-based emotion recognition using EEG signals requires less training data compared to deep neural network-based emotion recognition systems. The comparative performance of different emotion recognition methods is shown in Table 3.
Table 3
Performance comparison of the different existing methods with the proposed method

Author               | Year | Dataset | Methodology                                                                                              | Accuracy
Li et al. [75]       | 2017 | DEAP    | CNN + LSTM, RNN                                                                                          | 75.21%
W. Zhang et al. [76] | 2019 | SEED    | CNN and DDC                                                                                              | 82.1%
Zhong et al. [77]    | 2020 | SEED    | RGNN                                                                                                     | 85.3%
Yucel et al. [78]    | 2020 | SEED    | Pretrained CNN                                                                                           | 78.34%
Wang et al. [79]     | 2020 | SEED    | CNN, EFDMs, and STFT                                                                                     | 90.59%
Wei et al. [80]      | 2020 | SEED    | Dual-tree complex wavelet transform and simple recurrent units network                                  | 80.02%
Khateeb et al. [81]  | 2021 | DEAP    | Multi-domain feature extraction using wavelet (entropy, energy) and classification using SVM classifier | 65.92%
Haqque et al. [82]   | 2021 | SEED    | Wavelet filters and CNN                                                                                  | 83.44%
Proposed work        |      | SEED    | TQWT is used for feature extraction, and SVM with cubic kernel is used for classification               | 92.86%
CNN: convolutional neural network, LSTM: long short-term memory, RNN: recurrent neural network, STFT: short-time Fourier transform, DDC: deep domain confusion, EFDMs: electrode-frequency distribution maps, RGNN: regularized graph neural network.
CONCLUSION
This study recognized human emotions using EEG signals by applying advanced signal processing techniques. TQWT decomposes the EEG signal into several sub-bands. Depending on the mean frequency, sub-bands are grouped together in order to obtain the EEG rhythms. Statistical features from each rhythm are computed and used for classification with an SVM classifier. The SEED emotion database is used for the evaluation of the proposed methodology. For subject-dependent and subject-independent emotion recognition, 92.86% and 86.8% accuracy, respectively, are obtained. The proposed technique may be helpful in creating more user-friendly interactive systems. Also, the complexity of such a system is lower than that of artificial neural network techniques. Therefore, it can be easily deployed in real-time applications.
ACKNOWLEDGEMENT
This study is supported by the Council of Scientific & Industrial Research (CSIR) funded Research
Project, Government of India, Grant number: 22(0851)/20/EMR-II.
REFERENCES
[1] Šimić, G., Tkalčić, M., Vukić, V., Mulc, D., Španić, E., Šagud, M., ... & Hof, P. R. (2021). Understanding emotions: Origins and roles of the amygdala. Biomolecules, 11(6), 823.
[2] Picard, R.W., Vyzas, E., & Healey, J. (2001). Toward Machine Emotional Intelligence: Analysis of Affective
Physiological State. IEEE Trans. Pattern Anal. Mach. Intell., 23, 1175-1191.
[3] Salovey, P., & Mayer, J. D. (1990). Emotional intelligence. Imagination, cognition and personality, 9(3), 185-
211.
[4] Gupta, K., Lazarevic, J., Pai, Y. S., & Billinghurst, M. (2020, November). AffectivelyVR: Towards VR
Personalized Emotion Recognition. In 26th ACM Symposium on Virtual Reality Software and Technology (pp. 1-3).
[5] Hassouneh, A., Mutawa, A. M., & Murugappan, M. (2020). Development of a real-time emotion recognition
system using facial expressions and EEG based on machine learning and deep neural network methods. Informatics
in Medicine Unlocked, 20, 100372.
[6] Kołakowska, A., Landowska, A., Szwoch, M., Szwoch, W., & Wrobel, M. R. (2014). Emotion recognition and
its applications. In Human-Computer Systems Interaction: Backgrounds and Applications 3 (pp. 51-62). Springer,
Cham.
[7] Egger, M., Ley, M., & Hanke, S. (2019). Emotion recognition from physiological signal analysis: A review.
Electronic Notes in Theoretical Computer Science, 343, 35-55.
[] Viola, P.A., & Jones, M.J. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings
of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 1, I-I.
[7] Liao, S., Jain, A. K., & Li, S. Z. (2015). A fast and accurate unconstrained face detector. IEEE Transactions on
Pattern Analysis And Machine Intelligence, 38(2), 211-223.
[8] Kim, J. H., Poulose, A., & Han, D. S. (2021). The extensive usage of the facial image threshing machine for
facial emotion recognition performance. Sensors, 21(6), 2026.
[9] Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded
convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
[10] Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions On Pattern
Analysis And Machine Intelligence, 23(6), 681-685.
[11] Saragih, J. M., Lucey, S., & Cohn, J. F. (2009, September). Face alignment through subspace constrained
mean-shifts. In 2009 IEEE 12th International Conference on Computer Vision (pp. 1034-1041). IEEE.
[12] Sariyanidi, E., Gunes, H., & Cavallaro, A. (2014). Automatic analysis of facial affect: A survey of registration,
representation, and recognition. IEEE Transactions On Pattern Analysis And Machine Intelligence, 37(6), 1113-
1133.
[13] Jabid, T., Kabir, M. H., & Chae, O. (2010, January). Local directional pattern (LDP) for face recognition. In
2010 Digest of Technical Papers International Conference On Consumer Electronics (ICCE) (pp. 329-330). IEEE.
[14] Ojansivu, V., & Heikkilä, J. (2008, July). Blur insensitive texture classification using local phase quantization.
In International Conference On Image And Signal Processing (pp. 236-243). Springer, Berlin, Heidelberg.
[15] Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In 2005 IEEE
Computer Society Conference on Computer Vision And Pattern Recognition (CVPR'05) (Vol. 1, pp. 886-893).
[16] Bendjillali, R. I., Beladgham, M., Merit, K., & Taleb-Ahmed, A. (2019). Improved facial expression
recognition based on DWT feature for deep CNN. Electronics, 8(3), 324.
[17] Zhang, Y. D., Yang, Z. J., Lu, H. M., Zhou, X. X., Phillips, P., Liu, Q. M., & Wang, S. H. (2016). Facial
emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross
validation. IEEE Access, 4, 8375-8385.
[18] Kumar, R., Sundaram, M., & Arumugam, N. (2021). Facial emotion recognition using subband selective
multilevel stationary wavelet gradient transform and fuzzy support vector machine. The Visual Computer, 37(8),
2315-2329.
[19] Chowdary, M. K., Nguyen, T. N., & Hemanth, D. J. (2021). Deep learning-based facial emotion recognition for
human–computer interaction applications. Neural Computing and Applications, 1-18.
[20] Gautam, K. S., & Thangavel, S. K. (2021). Video analytics-based facial emotion recognition system for smart
buildings. International Journal of Computers and Applications, 43(9), 858-867.
[21] De Carolis, B., D'Errico, F., Macchiarulo, N., & Palestra, G. (2019, October). "Engaged Faces": Measuring and
Monitoring Student Engagement from Face and Gaze Behavior. In IEEE/WIC/ACM International Conference on
Web Intelligence-Companion Volume (pp. 80-85).
[22] Rößler, J., Sun, J., & Gloor, P. (2021). Reducing Videoconferencing Fatigue through Facial Emotion
Recognition. Future Internet, 13(5), 126.
[23] Kaiser, J. F. (1993, April). Some useful properties of Teager's energy operators. In 1993 IEEE International
Conference on Acoustics, Speech, and Signal Processing (Vol. 3, pp. 149-152). IEEE.
[24] Ingale, A. B., & Chaudhari, D. S. (2012). Speech emotion recognition. International Journal of Soft Computing
and Engineering (IJSCE), 2(1), 235-238.
[25] Guidi, A., Gentili, C., Scilingo, E. P., & Vanello, N. (2019). Analysis of speech features and personality traits.
Biomedical Signal Processing and Control, 51, 1-7.
[26] Gupta, D., Bansal, P., & Choudhary, K. (2018). The state of the art of feature extraction techniques in speech
recognition. Speech and Language Processing For Human-Machine Communications, 195-207.
[27] Vasquez-Correa, J. C., Arias-Vergara, T., Orozco-Arroyave, J. R., Vargas-Bonilla, J. F., & Noeth, E. (2016,
October). Wavelet-based time-frequency representations for automatic recognition of emotions from speech. In
Speech Communication; 12. ITG Symposium (pp. 1-5). VDE.
[28] Li, X., Li, X., Zheng, X., & Zhang, D. (2010). EMD-TEO based speech emotion recognition. In Life System
Modeling and Intelligent Computing (pp. 180-189). Springer, Berlin, Heidelberg.
[29] Daneshfar, F., Kabudian, S. J., & Neekabadi, A. (2020). Speech emotion recognition using hybrid spectral-
prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian
elliptical basis function network classifier. Applied Acoustics, 166, 107360.
[30] Palo, H. K., & Mohanty, M. N. (2018). Wavelet-based feature combination for recognition of emotions. Ain
Shams Engineering Journal, 9(4), 1799-1806.
[31] McCraty, R. (2015). Science of the heart: Exploring the role of the heart in human performance (Vol. 2). Boulder Creek, CA: HeartMath Institute.
[32] Zheng, W. (2016). Multichannel EEG-based emotion recognition via group sparse canonical correlation
analysis. IEEE Transactions on Cognitive and Developmental Systems, 9(3), 281-290.
[33] Jing, C., Liu, G., & Hao, M. (2009, July). The research on emotion recognition from ECG signal. In 2009
International Conference on Information Technology and Computer Science (Vol. 1, pp. 497-500). IEEE.
[34] Xiefeng, C., Wang, Y., Dai, S., Zhao, P., & Liu, Q. (2019). Heart sound signals can be used for emotion
recognition. Scientific reports, 9(1), 1-11.
[35] Wu, G., Liu, G., & Hao, M. (2010, October). The analysis of emotion recognition from GSR based on PSO. In
2010 International Symposium on Intelligence Information Processing and Trusted Computing (pp. 360-363). IEEE.
[36] Philippot, P., Chapelle, G., & Blairy, S. (2002). Respiratory feedback in the generation of emotion. Cognition &
Emotion, 16(5), 605-627.
[37] Hasnul, M. A., Alelyani, S., & Mohana, M. (2021). Electrocardiogram-Based Emotion Recognition Systems
and Their Applications in Healthcare—A Review. Sensors, 21(15), 5015.
[38] Ferdinando, H., Seppänen, T., & Alasaarela, E. (2016, October). Comparing features from ECG pattern and
HRV analysis for emotion recognition system. In 2016 IEEE Conference on Computational Intelligence in
Bioinformatics and Computational Biology (CIBCB) (pp. 1-6). IEEE.
[39] Dissanayake, T., Rajapaksha, Y., Ragel, R., & Nawinne, I. (2019). An ensemble learning approach for
electrocardiogram sensor based human emotion recognition. Sensors, 19(20), 4495.
[40] Chen, G., Zhu, Y., Hong, Z., & Yang, Z. (2019). EmotionalGAN: Generating ECG to enhance emotion state classification. In Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science (pp. 309-313).
[41] Panahi, F., Rashidi, S., & Sheikhani, A. (2021). Application of fractional Fourier transform in feature extraction from electrocardiogram and galvanic skin response for emotion recognition. Biomedical Signal Processing and Control, 69, 102863.
[42] Zhang, Q., Chen, X., Zhan, Q., Yang, T., & Xia, S. (2017). Respiration-based emotion recognition with deep learning. Computers in Industry, 92, 84-90.
[43] Das, K., & Pachori, R. B. (2021). Schizophrenia detection technique using multivariate iterative filtering and
multichannel EEG signals. Biomedical Signal Processing and Control, 67, 102525.
[44] Mühl, C., Allison, B., Nijholt, A., & Chanel, G. (2014). A survey of affective brain computer interfaces:
principles, state-of-the-art, and challenges. Brain-Computer Interfaces, 1(2), 66-84.
[45] Lin, Y. P., Wang, C. H., Jung, T. P., Wu, T. L., Jeng, S. K., Duann, J. R., & Chen, J. H. (2010). EEG-based
emotion recognition in music listening. IEEE Transactions on Biomedical Engineering, 57(7), 1798-1806.
[46] Li, Z., Tian, X., Shu, L., Xu, X., & Hu, B. (2017, August). Emotion recognition from EEG using RASM and
LSTM. In International Conference on Internet Multimedia Computing and Service (pp. 310-318). Springer,
Singapore.
[47] Mert, A., & Akan, A. (2018). Emotion recognition based on time–frequency distribution of EEG signals using
multivariate synchrosqueezing transform. Digital Signal Processing, 81, 106-115.
[48] Gupta, V., Chopda, M. D., & Pachori, R. B. (2018). Cross-subject emotion recognition using flexible analytic
wavelet transform from EEG signals. IEEE Sensors Journal, 19(6), 2266-2274.
[49] Ullah, H., Uzair, M., Mahmood, A., Ullah, M., Khan, S. D., & Cheikh, F. A. (2019). Internal emotion
classification using EEG signal with sparse discriminative ensemble. IEEE Access, 7, 40144-40153.
[50] Bhattacharyya, A., Tripathy, R. K., Garg, L., & Pachori, R. B. (2020). A novel multivariate-multiscale
approach for computing EEG spectral and temporal complexity for human emotion recognition. IEEE Sensors
Journal, 21(3), 3579-3591.
[51] Li, W., Zhang, Z., & Song, A. (2021). Physiological-signal-based emotion recognition: An odyssey from
methodology to philosophy. Measurement, 172, 108747.
[52] Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1997). International affective picture system (IAPS): Technical
manual and affective ratings. NIMH Center for the Study of Emotion and Attention, 1(39-58), 3.
[53] Yang, W., Makita, K., Nakao, T., Kanayama, N., Machizawa, M. G., Sasaoka, T., ... & Miyatani, M. (2018).
Affective auditory stimulus database: An expanded version of the International Affective Digitized Sounds (IADS-
E). Behavior Research Methods, 50(4), 1415-1429.
[54] Martínez-Tejada, L. A., Puertas-González, A., Yoshimura, N., & Koike, Y. (2021). Exploring EEG
Characteristics to Identify Emotional Reactions under Videogame Scenarios. Brain Sciences, 11(3), 378.
[55] Boateng, G., Sels, L., Kuppens, P., Lüscher, J., Scholz, U., & Kowatsch, T. (2020, April). Emotion elicitation
and capture among real couples in the lab. In 1st Momentary Emotion Elicitation & Capture workshop (MEEC
2020, cancelled). ETH Zurich, Department of Management, Technology, and Economics.
[56] Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic
differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49-59.
[57] Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and
negative affect: the PANAS scales. Journal of personality and social psychology, 54(6), 1063.
[58] Gross, J. J., & Levenson, R. W. (1995). Emotion elicitation using films. Cognition & emotion, 9(1), 87-108.
[59] Correa, J. A. M., Abadi, M. K., Sebe, N., & Patras, I. (2018). Amigos: A dataset for affect, personality and
mood research on individuals and groups. IEEE Transactions on Affective Computing.
[60] Zheng, W. L., & Lu, B. L. (2015). Investigating critical frequency bands and channels for EEG-based emotion
recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development, 7(3), 162-175.
[61] Koelstra, S., Muhl, C., Soleymani, M., Lee, J. S., Yazdani, A., Ebrahimi, T., ... & Patras, I. (2011). Deap: A
database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing, 3(1), 18-31.
[62] Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2011). A multimodal database for affect recognition and
implicit tagging. IEEE Transactions on Affective Computing, 3(1), 42-55.
[63] Subramanian, R., Wache, J., Abadi, M. K., Vieriu, R. L., Winkler, S., & Sebe, N. (2016). ASCERTAIN:
Emotion and personality recognition using commercial sensors. IEEE Transactions on Affective Computing, 9(2),
147-160.
[64] Correa, J. A. M., Abadi, M. K., Sebe, N., & Patras, I. (2018). Amigos: A dataset for affect, personality and
mood research on individuals and groups. IEEE Transactions on Affective Computing.
[65] Katsigiannis, S., & Ramzan, N. (2017). DREAMER: A database for emotion recognition through EEG and
ECG signals from wireless low-cost off-the-shelf devices. IEEE Journal of Biomedical and Health Informatics,
22(1), 98-107.
[66] Dragomiretskiy, K., & Zosso, D. (2013). Variational mode decomposition. IEEE Transactions On Signal
Processing, 62(3), 531-544.
[67] Gilles, J. (2013). Empirical wavelet transform. IEEE Transactions on Signal Processing, 61(16), 3999-4010.
[68] Selesnick, I. W. (2011). Wavelet transform with tunable Q-factor. IEEE Transactions on Signal
Processing, 59(8), 3560-3575.
[69] Daubechies, I., Lu, J., & Wu, H. T. (2011). Synchrosqueezed wavelet transforms: An empirical mode
decomposition-like tool. Applied and Computational Harmonic Analysis, 30(2), 243-261.
[70] Singh, P., Joshi, S. D., Patney, R. K., & Saha, K. (2016). Fourier-based feature extraction for classification of
EEG signals using EEG rhythms. Circuits, Systems, and Signal Processing, 35(10), 3700-3715.
[71] Xu, D., & Erdogmuns, D. (2010). Renyi's entropy, divergence and their nonparametric estimators. In
Information Theoretic Learning (pp. 47-102). Springer, New York, NY.
[72] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
[73] Suykens, J. A., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing
Letters, 9(3), 293-300.
[74] Tuncer, T., Dogan, S., & Subasi, A. (2021). A new fractal pattern feature generation function based emotion
recognition method using EEG. Chaos, Solitons & Fractals, 144, 110671.
[75] Li, Y., Huang, J., Zhou, H., & Zhong, N. (2017). Human emotion recognition with electroencephalographic
multidimensional features by hybrid deep neural networks. Applied Sciences, 7(10), 1060.
[76] Zhang, W., Wang, F., Jiang, Y., Xu, Z., Wu, S., & Zhang, Y. (2019, August). Cross-subject EEG-based
emotion recognition with deep domain confusion. In International Conference On Intelligent Robotics And
Applications (pp. 558-570). Springer, Cham.
[77] Zhong, P., Wang, D., & Miao, C. (2020). EEG-based emotion recognition using regularized graph neural
networks. IEEE Transactions on Affective Computing.
[78] Cimtay, Y., & Ekmekcioglu, E. (2020). Investigating the use of pretrained convolutional neural network on
cross-subject and cross-dataset EEG emotion recognition. Sensors, 20(7), 2034.
[79] Wang, F., Wu, S., Zhang, W., Xu, Z., Zhang, Y., Wu, C., & Coleman, S. (2020). Emotion recognition with
convolutional neural network and EEG-based EFDMs. Neuropsychologia, 146, 107506.
[80] Wei, C., Chen, L. L., Song, Z. Z., Lou, X. G., & Li, D. D. (2020). EEG-based emotion recognition using simple
recurrent units network and ensemble learning. Biomedical Signal Processing and Control, 58, 101756.
[81] Khateeb, M., Anwar, S. M., & Alnowami, M. (2021). Multi-Domain Feature Fusion for Emotion Classification
Using DEAP Dataset. IEEE Access, 9, 12134-12142.
[82] Haqque, R. H. D., Djamal, E. C., & Wulandari, A. (2021, September). Emotion Recognition of EEG Signals
Using Wavelet Filter and Convolutional Neural Networks. In 2021 8th International Conference on Advanced
Informatics: Concepts, Theory and Applications (ICAICTA) (pp. 1-6). IEEE.
[book] Daubechies, I. (1992). Ten lectures on wavelets. Society for industrial and applied mathematics.