Speech, Image and Language Processing (SILP) Lab
Institution: Indian Institute of Information Technology Allahabad
Department: Information Technology Masters Program
Featured research (8)
Recently, the representation of emotions in the Valence, Arousal and Dominance (VAD) space has drawn considerable attention. However, the complex nature of emotions and the subjective biases in self-reported values of VAD make the emotion model too specific to a particular experiment. This study aims to develop a generic model representing emotions using a fuzzy VAD space and to improve emotion recognition by utilizing this representation. We partitioned the crisp VAD space into a fuzzy VAD space using low, medium and high type-2 fuzzy dimensions to represent emotions. A framework that integrates the fuzzy VAD space with EEG data has been developed to recognize emotions. The EEG features were extracted as spatial and temporal feature vectors from time-frequency spectrograms, while the subject-reported values of VAD were also considered. The study was conducted on the DENS dataset, which includes a wide range of twenty-four emotions, along with EEG data and subjective ratings. The study was validated using various deep fuzzy framework models based on type-2 fuzzy representation, cuboid probabilistic lattice representation and unsupervised fuzzy emotion clusters. These models achieved emotion recognition accuracies of 96.09%, 95.75% and 95.31%, respectively, for the 24 emotion classes. The study also included an ablation study, one with crisp VAD space and the other without VAD space. Of these two, the model with crisp VAD space performed better, while the deep fuzzy framework outperformed both. The model was extended to predict cross-subject cases of emotions, and the results, with 78.37% accuracy, are promising, demonstrating the generality of our model. The generic nature of the developed model, along with its successful cross-subject predictions, points toward real-world applications in areas such as affective computing, human-computer interaction, and mental health monitoring.
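To make the fuzzy partitioning concrete, below is a minimal sketch (not the authors' implementation) of how one VAD dimension might be split into low, medium and high interval type-2 fuzzy sets; the [1, 9] rating range and the triangular footprints of uncertainty are illustrative assumptions.

```python
# Sketch: interval type-2 fuzzy partition of a single VAD dimension.
# The rating scale and membership-function parameters are assumptions.
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with breakpoints a <= b <= c."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def type2_membership(x, lower, upper):
    """Interval type-2 set: lower and upper triangular memberships."""
    return trimf(x, *lower), trimf(x, *upper)

# Hypothetical footprints of uncertainty for 'low', 'medium', 'high'.
FUZZY_SETS = {
    "low":    dict(lower=(1.0, 2.5, 4.0), upper=(1.0, 2.5, 5.0)),
    "medium": dict(lower=(3.5, 5.0, 6.5), upper=(3.0, 5.0, 7.0)),
    "high":   dict(lower=(6.0, 7.5, 9.0), upper=(5.0, 7.5, 9.0)),
}

valence = 6.2  # a subject-reported rating
for name, params in FUZZY_SETS.items():
    lo, hi = type2_membership(valence, **params)
    print(f"{name}: membership interval [{lo:.2f}, {hi:.2f}]")
```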
In this paper, we worked on the fusion of multiple brain regions in order to combine information from different parts of the brain. The idea is that the dynamic processing of an emotional video stimulus involves different brain regions, and hence fusing information from these regions can increase emotion recognition accuracy significantly. We utilized EEG datasets from Indian (DENS) and European (DEAP) populations, comprising 128 and 32 channels, respectively. We categorized the EEG channels by brain region and divided them into five sub-regions. We then used five transformer models to extract information from these sub-regions and concatenated the results to obtain a fused information vector for classification. This design resembles the theoretical finding that brain association areas, which fuse information from different brain lobes, contribute significantly to emotion processing. For the DENS dataset, we achieved 97.78% accuracy in valence and 95.74% accuracy in arousal. For the DEAP dataset, we achieved 90.07% accuracy in valence and 84.52% accuracy in arousal. We propose that deep learning models simulating information fusion in the association regions of the brain can enhance emotion recognition accuracy.
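The sketch below illustrates the kind of region-wise fusion described above: one transformer encoder per sub-region, with the encoder outputs concatenated into a fused vector for classification. It is a PyTorch sketch under assumed settings (channel split, embedding size, sequence length), not the authors' code.

```python
# Sketch: per-region transformer encoders with concatenation-based fusion.
import torch
import torch.nn as nn

class RegionFusionClassifier(nn.Module):
    def __init__(self, region_channels, d_model=64, n_classes=2):
        super().__init__()
        # One linear projection and one transformer encoder per sub-region.
        self.proj = nn.ModuleList([nn.Linear(c, d_model) for c in region_channels])
        self.encoders = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
                num_layers=2)
            for _ in region_channels])
        self.head = nn.Linear(d_model * len(region_channels), n_classes)

    def forward(self, regions):
        # regions: list of tensors, each (batch, time, channels_in_region)
        fused = []
        for x, proj, enc in zip(regions, self.proj, self.encoders):
            h = enc(proj(x))             # (batch, time, d_model)
            fused.append(h.mean(dim=1))  # temporal average pooling
        return self.head(torch.cat(fused, dim=-1))  # fused information vector

# Example: a hypothetical split of a 32-channel montage into five sub-regions.
model = RegionFusionClassifier(region_channels=[8, 6, 6, 6, 6])
batch = [torch.randn(4, 128, c) for c in [8, 6, 6, 6, 6]]
logits = model(batch)  # (4, 2) valence or arousal logits
```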
This report discusses the problem of denoising in image processing and the application of Generative Adversarial Networks (GANs) to address this challenge. GANs have demonstrated promising results in denoising tasks by learning to generate clean images from noisy ones through training on paired noisy and clean image datasets. Several variations of GANs have been proposed for denoising, including SRGAN, DCGAN, and LSGAN, each with unique strengths and weaknesses. This report conducts a comparative study to determine the best-performing model under different conditions, evaluating the denoising performance of these GAN models on a shared dataset using metrics such as Peak Signal-to-Noise Ratio (PSNR). In this paper, we present a comparative study on denoising images using SRGAN, DCGAN, Autoencoding GAN, Vanilla GAN, and LSGAN to provide insight into the strengths and limitations of each model; the main aim of the study is to provide a guide to the best way to denoise images.
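For reference, the PSNR metric used in such comparisons is computed as shown below; this is the standard definition, not code taken from the report, and the synthetic example images are for illustration only.

```python
# Sketch: Peak Signal-to-Noise Ratio (PSNR) between a clean and a denoised image.
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """PSNR in dB; higher values indicate a closer match to the clean image."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Example with synthetic 8-bit images.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(clean + rng.normal(0, 10, clean.shape), 0, 255).astype(np.uint8)
print(f"PSNR of the noisy image: {psnr(clean, noisy):.2f} dB")
```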
Inter-subject or subject-independent emotion recognition has been a challenging task in affective computing. This work presents an easy-to-implement emotion recognition model that classifies emotions from EEG signals subject-independently. It is based on the well-known EEGNet architecture, which is widely used in EEG-based BCIs. We used the 'Dataset on Emotion using Naturalistic Stimuli' (DENS), which contains 'Emotional Events', i.e., the precise timings at which participants felt an emotion. The model combines regular, depthwise and separable CNN convolution layers to classify the emotions. It has the capacity to learn the spatial features across the EEG channels and the temporal features of the EEG signals' variability over time. The model was evaluated on the valence ratings and achieved an accuracy of 73.04%.
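The following is a minimal PyTorch sketch of an EEGNet-style model of the kind described above, combining a regular temporal convolution, a depthwise spatial convolution, and a separable temporal convolution; the layer sizes, channel count and sample length are assumptions, not the exact configuration used in this work.

```python
# Sketch: EEGNet-like CNN with temporal, depthwise and separable convolutions.
import torch
import torch.nn as nn

class EEGNetLike(nn.Module):
    def __init__(self, n_channels=128, n_classes=2, F1=8, D=2, F2=16):
        super().__init__()
        self.temporal = nn.Sequential(               # regular temporal convolution
            nn.Conv2d(1, F1, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(F1))
        self.depthwise = nn.Sequential(              # depthwise spatial convolution
            nn.Conv2d(F1, F1 * D, (n_channels, 1), groups=F1, bias=False),
            nn.BatchNorm2d(F1 * D), nn.ELU(), nn.AvgPool2d((1, 4)), nn.Dropout(0.5))
        self.separable = nn.Sequential(              # separable temporal convolution
            nn.Conv2d(F1 * D, F1 * D, (1, 16), padding=(0, 8),
                      groups=F1 * D, bias=False),
            nn.Conv2d(F1 * D, F2, 1, bias=False),
            nn.BatchNorm2d(F2), nn.ELU(), nn.AvgPool2d((1, 8)), nn.Dropout(0.5))
        self.classify = nn.LazyLinear(n_classes)

    def forward(self, x):                            # x: (batch, 1, channels, samples)
        x = self.separable(self.depthwise(self.temporal(x)))
        return self.classify(x.flatten(1))

model = EEGNetLike()
logits = model(torch.randn(4, 1, 128, 512))          # (4, 2) valence logits
```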