Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 231 (2024) 771–778
www.elsevier.com/locate/procedia
Soft Computing and Intelligent Systems: Theory and Applications (SCISTA 2023)
November 7-9, 2023, Almaty, Kazakhstan
doi: 10.1016/j.procs.2023.12.139
1877-0509 © 2024 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the Conference Program Chairs
Fuzzy Approach for Audio-Video Emotion Recognition in
Computer Games for Children
Pavel Kozlov, Alisher Akram, Pakizar Shamoi
School of Information Technology and Engineering, Kazakh-British Technical University, Almaty, Kazakhstan
Corresponding author: Pakizar Shamoi. Tel.: +7-701-349-0001. E-mail address: p.shamoi@kbtu.kz
Abstract
Computer games are widespread nowadays and enjoyed by people of all ages. For kids, however, playing these games can be more than just fun: it is a way to develop important skills and build emotional intelligence. Facial expressions and sounds that kids produce during gameplay reflect their feelings, thoughts, and moods. In this paper, we propose a novel framework that integrates a fuzzy approach for the recognition of emotions through the analysis of audio and video data. Our focus lies within the specific context of computer games tailored for children, aiming to enhance their overall user experience. We use the FER dataset to detect facial emotions in video frames recorded from the screen during the game. For the audio emotion recognition of sounds a kid produces during the game, we use the CREMA-D, TESS, RAVDESS, and SAVEE datasets. Next, a fuzzy inference system is used for the fusion of the results. Besides this, our system can detect emotion stability and emotion diversity during gameplay, which, together with a report on the prevailing emotion, can serve as valuable information for parents worried about the effect of certain games on their kids. The proposed approach has shown promising results in the preliminary experiments we conducted, involving three different video games, namely fighting, racing, and logic games, and providing emotion-tracking results for kids in each game. Our study can contribute to the advancement of child-oriented game development that is not only engaging but also accounts for children's cognitive and emotional states.
Keywords: fuzzy logic, video emotion recognition, audio emotion recognition, computer games, facial expression, user experience.
1. Introduction
Many parents are concerned about how much time their children spend playing computer games, and concerns regarding the effects of video games on aspects like mental health and cognitive abilities have become consistent topics in societal dialogues [12]. A majority of parents, specifically 64%, held the belief that video games were
responsible for fostering addiction. Furthermore, more than one out of every five parents were worried about video games affecting their own child [5].
Fig. 1: Basic human emotions.
Fig. 2: Thayer's arousal-valence emotion plane [24], [27].
At the same time, most parents recognize the benefits of games and allow children to download applications [4]. It has long been assumed that emotions have a strong influence on human behavior, actions, and mental abilities [14]. Many teachers are also convinced that, handled correctly, games can support a child's development. The development of cognitive skills through games depends on the quality of the content offered by the developers of such games, and the emotional state of the child during the game largely determines his or her interest in it.
There are six classic human emotions: happiness, surprise, fear, disgust, anger, and sadness. However, according to recent findings [17], basic emotion transmission is divided into four (rather than six) types. Specifically, in the early phases, anger and disgust, as well as fear and surprise, are perceived identically. A wrinkled nose, for example, expresses both anger and disgust, while lifted eyebrows communicate surprise and fear. Basic human emotions are represented in Fig. 1. We use Thayer's arousal-valence emotion plane [24] as our taxonomy and use seven emotions (six basic plus neutral), each belonging to one of the four quadrants of the emotion plane (see Fig. 2). In adults, the expression of emotions is less natural and is shaped by upbringing and cultural code. In general, the language of emotions is largely universal, but there are some differences in facial expressions and gestures among different people.
The field of emotion recognition (ER) has gained growing interest in recent times. The complexity arising from factors like different poses, illumination conditions, motion blur, and more makes identifying emotions from audio-video sources a challenging task [29]. Moreover, only a limited number of works investigate ER in kids. One recent study explored the use of ER to improve online school learning [18].
Most emotion recognition algorithms are still limited to a single modality at the moment. However, in everyday
life, humans frequently conceal their true feelings, which leads to the dilemma that single-modal emotion recognition
accuracy is relatively poor [26]. The majority of works in this area employ CNN models for ER tasks. Some studies
use multiple deep models (CNN and RNN for images, SVM and LSTM for acoustic features) [8]. The authors of another study proposed a multi-modal residual perceptron network for multimodal ER [3]. An interesting approach has been
proposed for ER based on generating a representative set of frames from videos using the eigenspace domain and
Principal component analysis [9]. Another paper introduces Spatiotemporal attention-based multimodal deep neural
networks for dimensional ER [11].
Parents, wanting to occupy their child with something useful on a smartphone or computer, are not always sure about the effects of such games. Current cognitive learning games often neglect the relationship between cognition and emotions, which is characterized by the notion of a "hot executive function" [7]. Game developers tend to focus only on the end result, forgetting about the exciting gameplay and the emotions that children experience in the process. Children's interaction with educational games and applications should be easily accessible and engaging while encouraging them to achieve and complete tasks.
In this article, we introduce a framework that integrates a fuzzy logic approach to identify emotions precisely by analyzing both audio and video data. Our primary emphasis is on computer games designed for children, with the goal
of improving their overall gaming experience. By automatically monitoring the emotions of children as they navigate through gameplay, developers can pinpoint pivotal moments within the gaming experience and make games that truly connect with kids. Our contributions include the proposed fuzzy fusion technique and the analysis of audio-video emotional stability and diversity in addition to the emotions themselves.
Fig. 3: Methodology representation.
Fig. 4: Some samples from FER with children's emotions.
Fig. 5: Frequency of emotions in the dataset.
2. Methodology
The proposed approach is shown in Fig. 3. Our framework has two main stages: feature extraction with emotion detection, and fuzzy fusion of the detected emotions. From the incoming player video we extract audio and video frames, perform feature extraction, detect emotions, and pass the results to a fuzzy inference system that performs the fusion.
2.1. FER 2013
We used the Facial Expression Recognition 2013 (FER-2013) emotion dataset [6], which was introduced as part of a representation learning challenge [19]. The database contains 35,887 grayscale images of human faces with a resolution of 48×48 pixels. All images are labeled with one of 7 categories: 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise, 6 = Neutral. Examples of children's emotions from the database are shown in Fig. 4.
The dataset contains images not only of fully visible faces but also of partially occluded ones (for example, by a hand), low-contrast images, images of people wearing glasses, etc. The dataset is divided into two parts, test and training. The test set is used to compare recognition accuracy across models, while the training set is used for training and optimizing the models. The distribution of the FER-2013 dataset across emotion categories is shown in Fig. 5. The FER data is normalized so that it resembles a standard normal distribution: zero mean and unit variance. The output is reported over seven categories: "angry", "disgust", "fear", "happy", "sad", "surprise", and "neutral". Each emotion is assigned a score on a scale from 0 to 1.
Table 1: Fuzzy rules used in the fuzzy inference system.
Rule | Audio Emotion Intensity | Video Emotion Intensity | Overall Emotion Intensity
1 | Low | Low | A Little Bit
2 | Low | Medium | Sometimes
3 | Low | High | High
4 | Medium | Low | Sometimes
5 | Medium | Medium | High
6 | Medium | High | Very High
7 | High | Low | Sometimes
8 | High | Medium | Very High
9 | High | High | Extremely High
The facial emotion recognition component is built on a convolutional neural network model [23]; the model can also be retrained if needed when it is called and initialized. The accuracy of the CNN model on test data is 78%. A constructor parameter selects the face detection technique: when set to "True", the Multi-task Cascaded Convolutional Network (MTCNN) is used to detect faces, and when set to "False", the function defaults to the OpenCV Haar cascade classifier.
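As an illustration, the following is a minimal sketch of how this detector can be called from Python using the fer package [23]; the frame path is a placeholder, and the printed values are only indicative.

```python
# Sketch of per-frame facial emotion scoring with the fer package [23].
# The frame source and file path are placeholders, not values from the paper.
import cv2
from fer import FER

detector = FER(mtcnn=True)           # True: MTCNN face detection; False: OpenCV Haar cascade
frame = cv2.imread("frame_001.png")  # one screen capture containing the child's face (hypothetical path)

results = detector.detect_emotions(frame)
if results:
    # Each result holds a face bounding box and a score in [0, 1] per emotion.
    scores = results[0]["emotions"]
    print(scores)  # e.g. {'angry': 0.01, ..., 'happy': 0.84, 'neutral': 0.14}
```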
2.2. Audio Emotion Recognition
Audio, or speech emotion recognition (SER), involves identifying human emotions from audio signals. Voice patterns frequently convey underlying emotions via variations in tone and pitch. To detect emotions in the audio extracted from the player video, we used a speech emotion recognition model [1], which was trained on well-known datasets of audio clips annotated with emotions, namely the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D, 7,442 audio clips) [2], the Toronto Emotional Speech Set (TESS, 2,800 audio files) [16], the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS, 1,440 audio files) [13], and the Surrey Audio-Visual Expressed Emotion database (SAVEE, 480 audio files) [10]. The accuracy of the model on test data is 60.74%.
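As a sketch of the audio side, assuming (as in the referenced notebook [1]) that MFCC features feed a pre-trained classifier, the per-second feature extraction could look roughly like this; the file name, sampling rate, and classifier handle are illustrative assumptions.

```python
# Sketch of MFCC feature extraction for the SER model; assumes a classifier
# trained as in [1] is available as `ser_model`. Paths and shapes are illustrative.
import numpy as np
import librosa

def extract_mfcc(path, sr=22050, n_mfcc=40):
    """Load a short audio chunk and return its time-averaged MFCC vector."""
    audio, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc.T, axis=0)           # shape: (n_mfcc,)

features = extract_mfcc("chunk_03.wav")       # hypothetical 1-second chunk cut from the recording
# emotion = ser_model.predict(features[np.newaxis, :])  # e.g. 'sad', 'neutral', ...
```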
2.3. PyAutoGUI
We use PyAutoGUI to automatically capture, once per second, the screen showing the game and the child's face. PyAutoGUI is a cross-platform library for automating actions on a computer using Python scripts. With its help, the screen containing the game played by the child together with the child's face is recorded as a sequence of frames.
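A minimal sketch of this once-per-second capture loop is shown below; the output directory and session length are assumptions, not values given in the paper.

```python
# Sketch of the once-per-second screen capture with PyAutoGUI.
import os
import time
import pyautogui

def capture_session(out_dir="frames", seconds=60):
    """Save one full-screen screenshot per second for `seconds` seconds."""
    os.makedirs(out_dir, exist_ok=True)
    for i in range(seconds):
        shot = pyautogui.screenshot()             # PIL Image of the whole screen
        shot.save(f"{out_dir}/frame_{i:04d}.png")
        time.sleep(1)

# capture_session(seconds=262)  # e.g. a session producing 262 frames, as in the fight-game experiment
```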
2.4. Fuzzy Sets and Logic
There are some difficulties in recognizing emotions in faces, such as emotion ambiguity and the small number of emotion classes compared with the richness of human emotions [25]. Fuzzy logic is a powerful tool for handling imprecision, and it has been used in several studies to express emotions and their intensity [20]. A fuzzy set is a class of objects that has a range of membership grades [28]. The main reason we used fuzzy sets and logic in our study is that they help us rate emotions in a human-consistent manner, because they do not have sharply defined bounds. Despite being less precise, a linguistic value is closer to human cognitive processes than a number [22].
We partition the spectrum of possible emotion intensities into linguistic tags [21]. We have two input variables for each emotion, Audio Emotion Intensity and Video Emotion Intensity. The output variable is the Overall Emotion Intensity in percentage points (see Fig. 6). As can be seen from the figure, we have 'Low', 'Medium', and 'High' fuzzy sets for both input variables and 'A little bit', 'Sometimes', 'High', 'Very High', and 'Extremely High' for the output variable.
To build fuzzy relationships between the input and output variables we use fuzzy rules. Our fuzzy inference system has nine fuzzy rules, shown in Table 1. A detailed example is provided in the Application and Results section.
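As a sketch, the rule base of Table 1 can be assembled with the scikit-fuzzy control API roughly as follows; the membership-function breakpoints are assumptions made for illustration, since the paper specifies them only graphically in Fig. 6.

```python
# Sketch of the nine-rule fuzzy inference system using scikit-fuzzy.
# Membership-function breakpoints are illustrative, not the exact values of Fig. 6.
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

universe = np.arange(0, 101, 1)
audio = ctrl.Antecedent(universe, "audio_intensity")
video = ctrl.Antecedent(universe, "video_intensity")
overall = ctrl.Consequent(universe, "overall_intensity")

# Input sets: Low / Medium / High (triangular, assumed breakpoints).
for var in (audio, video):
    var["low"] = fuzz.trimf(var.universe, [0, 0, 50])
    var["medium"] = fuzz.trimf(var.universe, [25, 50, 75])
    var["high"] = fuzz.trimf(var.universe, [50, 100, 100])

# Output sets: A little bit / Sometimes / High / Very High / Extremely High.
overall["little_bit"] = fuzz.trimf(universe, [0, 0, 25])
overall["sometimes"] = fuzz.trimf(universe, [0, 25, 50])
overall["high"] = fuzz.trimf(universe, [25, 50, 75])
overall["very_high"] = fuzz.trimf(universe, [50, 75, 100])
overall["extremely_high"] = fuzz.trimf(universe, [75, 100, 100])

# The nine rules of Table 1.
rules = [
    ctrl.Rule(audio["low"] & video["low"], overall["little_bit"]),
    ctrl.Rule(audio["low"] & video["medium"], overall["sometimes"]),
    ctrl.Rule(audio["low"] & video["high"], overall["high"]),
    ctrl.Rule(audio["medium"] & video["low"], overall["sometimes"]),
    ctrl.Rule(audio["medium"] & video["medium"], overall["high"]),
    ctrl.Rule(audio["medium"] & video["high"], overall["very_high"]),
    ctrl.Rule(audio["high"] & video["low"], overall["sometimes"]),
    ctrl.Rule(audio["high"] & video["medium"], overall["very_high"]),
    ctrl.Rule(audio["high"] & video["high"], overall["extremely_high"]),
]

fusion = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
```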
Fig. 6: Input Fuzzy sets for Emotion Intensity (the same for audio and video) and Output Fuzzy Sets for Overall Emotion Intensity.
Fig. 7: Example of the application interface.
3. Application and Results
3.1. Prototype Application
Fig. 7 illustrates the prototype application mockup. As can be seen, it allows tracking of the average emotion, the prevailing emotions in audio and video, and emotional stability and diversity. As a result, parents are offered a report about the feelings their child had while playing different computer games, their emotional stability or instability, and the prevailing emotions associated with this state. Such a report can help parents see the effect of different games and judge how much their child should play. Together with a psychologist, it can help figure out the best way to manage gaming based on the child's emotions.
3.2. Experimental Results
In this section, we present our preliminary experimental results. We conducted an experiment with a 7-year-old child and observed his emotions during three games: a fighting, a racing, and a logic game. The speech emotion analysis, run on a 10-second audio segment extracted from each video and classified second by second, produced the following per-second labels:

Fight game - ['disgust', 'sad', 'sad', 'sad', 'disgust', 'happy', 'fearful', 'sad', 'sad', 'disgust']
Racing game - ['sad', 'sad', 'sad', 'fearful', 'neutral', 'neutral', 'fearful', 'happy', 'sad', 'disgust']
Logic game - ['neutral', 'neutral', 'neutral', 'neutral', 'disgust', 'angry', 'neutral', 'neutral', 'sad', 'neutral']

Table 2: Emotion recognition results from video corresponding to the Fight game. 5 illustrative frames (F1, F2, F3, F4, F5) were chosen among 262 to show how emotions change. These frames correspond to the ones presented in Fig. 9.
Emotion | F1 | ... | F2 | ... | F3 | ... | F4 | ... | F5 | Mean | Median | Variance | SD
Happy | 0.04 | ... | 0.01 | ... | 0.84 | ... | 0.54 | ... | 0.04 | 0.18 | 0.07 | 0.05 | 0.23
Angry | 0.01 | ... | 0.02 | ... | 0.01 | ... | 0.01 | ... | 0.02 | 0.04 | 0.03 | 0.001 | 0.03
Disgust | 0 | ... | 0 | ... | 0 | ... | 0 | ... | 0 | 0.0 | 0.0 | 0.0 | 0.0
Fear | 0.03 | ... | 0.06 | ... | 0.01 | ... | 0.02 | ... | 0.02 | 0.06 | 0.05 | 0.003 | 0.05
Neutral | 0.87 | ... | 0.65 | ... | 0.14 | ... | 0.38 | ... | 0.84 | 0.59 | 0.63 | 0.04 | 0.2
Sad | 0.05 | ... | 0.26 | ... | 0 | ... | 0.02 | ... | 0.08 | 0.11 | 0.07 | 0.01 | 0.11
Surprise | 0 | ... | 0 | ... | 0 | ... | 0.02 | ... | 0 | 0.01 | 0.01 | 0.001 | 0.03

Fig. 8: Video Emotion Recognition Results: (a) Fight Game, (b) Racing Game, (c) Logic Game (tic-tac-toe).
Video emotion recognition results for the Fight game are shown in Fig. 9. Table 2 illustrates the emotion detection results for selected video frames. To illustrate how emotions evolve, 5 exemplary frames (F1, F2, F3, F4, F5) were selected from the set of 262 frames; these frames match those in Fig. 9. Fig. 8 shows emotion-tracking results for each of the selected games. As we can see from Fig. 8, the Fight game data exhibits emotional instability and more emotional diversity than the other games, including happy, neutral, sad, and fear emotions. We can also see that for the Logic game there is one leading emotion, neutral. Emotional stability can be related to the standard deviation of the per-frame scores (Table 2), and emotional diversity to the number of emotions with High or Medium intensity.
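As a sketch of how these two indicators can be derived from the per-frame scores, the snippet below uses a hypothetical list of frame-level outputs in the format returned by the facial detector; the 0.5 cutoff for "Medium or High intensity" is an assumed threshold, not a value stated in the paper.

```python
# Sketch: emotional stability and diversity indicators from per-frame emotion scores.
import numpy as np

# Hypothetical per-frame scores (one dict per captured frame, fer-style output).
frame_scores = [
    {"happy": 0.04, "neutral": 0.87, "sad": 0.05, "fear": 0.03},
    {"happy": 0.84, "neutral": 0.14, "sad": 0.00, "fear": 0.01},
    {"happy": 0.54, "neutral": 0.38, "sad": 0.02, "fear": 0.02},
]

emotions = list(frame_scores[0].keys())
per_emotion = {e: np.array([f[e] for f in frame_scores]) for e in emotions}

# Stability indicator: mean standard deviation across emotions (lower = more stable).
instability = np.mean([per_emotion[e].std() for e in emotions])

# Diversity indicator: number of emotions reaching Medium/High intensity at least once
# (assumed cutoff of 0.5).
diversity = sum(per_emotion[e].max() >= 0.5 for e in emotions)

print(f"instability ≈ {instability:.2f}, diversity = {diversity}")
```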
We can now simulate our fuzzy system by specifying the inputs and applying a defuzzification method. For example, let us find the overall emotion intensity in the following scenario: the audio and video intensity values for the Happy emotion are 12% and 85%, respectively. The output membership functions activated by the rules are combined using the maximum operator (fuzzy aggregation). Next, to get a crisp answer, we perform defuzzification using the centroid method. As a result of this rule-based aggregation, we obtain 47.55% as the overall intensity of the Happy emotion. The visualized result is presented in Fig. 10.
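Continuing the scikit-fuzzy sketch from Section 2.4, the same scenario can be simulated as shown below; since the membership functions in that sketch are assumed, the crisp output will only approximate the 47.55% obtained with our actual system.

```python
# Usage of the `fusion` simulation sketched in Section 2.4 for the Happy emotion.
fusion.input["audio_intensity"] = 12        # audio intensity, in %
fusion.input["video_intensity"] = 85        # video intensity, in %
fusion.compute()                            # max aggregation + centroid defuzzification
print(fusion.output["overall_intensity"])   # crisp overall intensity, in %
```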
4. Conclusion
Our study aimed to highlight the significance of audio-video ER in developing more engaging, safe, and effective games for children. We proposed a fuzzy logic-based approach to aggregate the emotions detected from video frames and sounds. For that, we used the FER emotion library as the basis.
Our work can contribute to the advancement of emotionally aware computer games. Using ER software, game developers can identify problems and work on eliminating them and enhancing the user experience, aiming for
games that connect with children on a deeper level. Parents can monitor the influence of certain games on their kids and track the emotions associated with them.
Fig. 9: Video emotion detection results for the Fight game and Thayer's arousal-valence emotion planes for each of the 5 selected frames.
Fig. 10: Simulation results: (a) applying the input 12% to the Audio Intensity fuzzy sets, (b) applying the input 85% to the Video Intensity fuzzy sets, (c) aggregated membership and result, 47.55%.
The study has certain limitations. An adult's face differs from a child's, but the models we used were trained on faces of all ages. Moreover, interpreting emotions solely through audio and video signals might not capture the complete emotional context of gameplay, because children of different ages may have varying emotional responses and cognitive abilities. Despite these limitations, preliminary results demonstrate that the proposed approach is a promising tool with the potential to make computer games more child-oriented based on emotional data.
The study leaves several open questions. In particular, we want to understand more about how the emotions detected from sound and from video connect with and contrast against each other. Researchers in [3], [29], [8], [11], [26] have delved into similar areas.
As for future work, we plan to test the system in real settings to see how well it performs. Future experiments will involve more participants of different ages engaging with different game types. After testing, it will be possible to conduct interviews with the participants to ask clarifying questions. According to recent findings, the expression of fear and neutral emotions differs considerably between kids and adults [15]. Therefore, we plan to improve the ER framework by training the models on kids' faces and sounds only.
References
[1] Burnwal, S. Speech emotion recognition (Kaggle). https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition/notebook. Accessed: 2023-08-25.
[2] Cao, H., Cooper, D.G., Keutmann, M.K., Gur, R.C., Nenkova, A., Verma, R., 2014. CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing 5, 377–390. doi:10.1109/TAFFC.2014.2336244.
[3] Chang, X., Skarbek, W., 2021. Multi-modal residual perceptron network for audio–video emotion recognition. Sensors 21. URL: https://www.mdpi.com/1424-8220/21/16/5452, doi:10.3390/s21165452.
[4] Dore, R.A., Logan, J., Lin, T.J., Purtell, K.M., Justice, L.M., 2020. Associations between children's media use and language and literacy skills. Frontiers in Psychology 11. doi:10.3389/fpsyg.2020.01734.
[5] Frontier. How parents perceive their children's video game habits. URL: https://frontier.com/resources/e-is-for-everyone-video-game-study.
[6] Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park, J., Ionescu, R., Popescu, M., Grozea, C., Bergstra, J., Xie, J., Romaszko, L., Xu, B., Chuang, Z., Bengio, Y., 2013. Challenges in representation learning: A report on three machine learning contests. arXiv:1307.0414.
[7] Gray, S.I., Robertson, J., Manches, A., Rajendran, G., 2019. BrainQuest: The use of motivational design theories to create a cognitive training game supporting hot executive function. International Journal of Human-Computer Studies 127, 124–149. doi:10.1016/J.IJHCS.2018.08.004.
[8] Guo, X., Polanía, L.F., Barner, K.E., 2020. Audio-video emotion recognition in the wild using deep hybrid networks. arXiv:2002.09023.
[9] Hajarolasvadi, N., Demirel, H., 2020. Deep facial emotion recognition in video using eigenframes. IET Image Processing. doi:10.1049/iet-ipr.2019.1566.
[10] Haq, S., Jackson, P., 2009. Speaker-dependent audio-visual emotion recognition, in: Proc. Int. Conf. on Auditory-Visual Speech Processing (AVSP'08), Norwich, UK.
[11] Lee, J., Kim, S., Kim, S., Sohn, K., 2018. Audio-visual attention networks for emotion recognition, in: Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia, Association for Computing Machinery, New York, NY, USA, pp. 27–32. doi:10.1145/3264869.3264873.
[12] Lieberoth, A., Fiskaali, A., 2021. Can worried parents predict effects of video games on their children? A case-control study of cognitive abilities, addiction indicators and wellbeing. Frontiers in Psychology 11. doi:10.3389/fpsyg.2020.586699.
[13] Livingstone, S.R., Russo, F.A., 2019. RAVDESS emotional speech audio. URL: https://www.kaggle.com/dsv/256618, doi:10.34740/KAGGLE/DSV/256618.
[14] Metcalfe, J., Mischel, W., 1999. A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review 106, 3–19. doi:10.1037/0033-295X.106.1.3.
[15] Park, H., Shin, Y., Song, K., Yun, C., Jang, D., 2022. Facial emotion recognition analysis based on age-biased data. Applied Sciences 12. URL: https://www.mdpi.com/2076-3417/12/16/7992, doi:10.3390/app12167992.
[16] Pichora-Fuller, M.K., Dupuis, K., 2020. Toronto emotional speech set (TESS). doi:10.5683/SP2/E8H2MF.
[17] Jack, R.E., Garrod, O.G.B., Schyns, P.G., 2014. Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time. Current Biology 24, 187–192.
[18] Rathod, M., Dalvi, C., Kaur, K., Patil, S., Gite, S., Kamat, P., Kotecha, K., Abraham, A., Gabralla, L., 2022. Kids' emotion recognition using various deep-learning models with explainable AI. Sensors 22, 8066. doi:10.3390/s22208066.
[19] Sambare, M. FER-2013. URL: https://www.kaggle.com/datasets/msambare/fer2013.
[20] Shamoi, E., Turdybay, A., Shamoi, P., Akhmetov, I., Jaxylykova, A., Pak, A., 2022. Sentiment analysis of vegan related tweets using mutual information for feature selection. PeerJ Computer Science 8, e1149. doi:10.7717/peerj-cs.1149.
[21] Shamoi, P., Inoue, A., Kawanaka, H., 2016. FHSI: Toward more human-consistent color representation. Journal of Advanced Computational Intelligence and Intelligent Informatics 20. doi:10.20965/jaciii.2016.p0393.
[22] Shamoi, P., Inoue, A., 2012. Computing with words for direct marketing support system, in: Midwest Artificial Intelligence and Cognitive Science Conference. URL: http://ceur-ws.org/Vol-841/submission_36.pdf.
[23] Shenk, J., CG, A., Arriaga, O., Owlwasrowk, 2021. justinshenk/fer. Zenodo. doi:10.5281/zenodo.5362356.
[24] Thayer, R.E., 2000. Mood regulation and general arousal systems. Psychological Inquiry 11, 202–204. URL: http://www.jstor.org/stable/1449805.
[25] Ualibekova, A., Shamoi, P., 2022. Music emotion recognition using k-nearest neighbors algorithm, in: 2022 International Conference on Smart Information Systems and Technologies (SIST), pp. 1–6. doi:10.1109/SIST54437.2022.9945814.
[26] Wu, X., Tian, M., Zhai, L., 2022. ICANet: A method of short video emotion recognition driven by multimodal data. arXiv:2208.11346.
[27] Yang, Y.H., Su, Y.F., Lin, Y.C., Chen, H., 2007. Music emotion recognition: The role of individuality, in: Proceedings of the ACM International Multimedia Conference and Exhibition, pp. 13–22. doi:10.1145/1290128.1290132.
[28] Zadeh, L.A., 1965. Fuzzy sets. Information and Control 8, 338–353. doi:10.1016/S0019-9958(65)90241-X.
[29] Zhou, H., Meng, D., Zhang, Y., Peng, X., Du, J., Wang, K., Qiao, Y., 2019. Exploring emotion features and fusion strategies for audio-video emotion recognition, in: 2019 International Conference on Multimodal Interaction, ACM. doi:10.1145/3340555.3355713.