Thierry Pun’s research while affiliated with University of Geneva and other places


Publications (241)


Fig. 1: Impression Formation and Detection Diagram [5]
Fig. 2: Unimodal Impression Detection Performance
Fig. 3: The Impression Management System [5]
Fig. 4: Landmarks from OpenFace [10]. The green rectangle defines the face area. The vertical black bar separates the left from the right hemiface.
Fig. 5: Gaze area in human-human vs. human-agent interaction

Impression Detection and Management Using an Embodied Conversational Agent
  • Chapter
  • Full-text available

July 2020 · 195 Reads · 2 Citations

Lecture Notes in Computer Science

Chen Wang · [...]

Embodied Conversational Agents (ECAs) are a promising medium for human-computer interaction, since they are capable of engaging users in real-time face-to-face interaction [1, 2]. Users' formed impressions of an ECA (e.g. favour or dislike) can be reflected behaviourally [3, 4]. These impressions may affect the interaction and can even persist afterwards [5, 7]. Thus, when we build an ECA to impress users, it is important to detect how users feel about the ECA. The impression the ECA leaves can then be adjusted by controlling its non-verbal behaviour [7]. Motivated by the role of ECAs in interpersonal interaction and the state of the art in affect recognition, we investigated three research questions: 1) which modality (facial expressions, eye movements, or physiological signals) reveals the most about formed impressions; 2) whether an ECA can leave a better impression by adapting its behaviour to the impressions it detects; 3) whether impression formation differs between human-human and human-agent interaction. Our results first showed the value of using different modalities to detect impressions: an ANOVA test indicated that the facial-expression modality outperforms the physiological modality (M = 1.27, p = 0.02). Second, our results demonstrated the feasibility of an adaptive ECA: compared with randomly selected ECA behaviour, participants' ratings tended to be higher in the conditions where the ECA adapted its behaviour based on the detected impressions. Third, we found similar behaviour in human-human and human-agent interaction: people treated the ECA much as they would a human, spending more time observing the face area when forming an impression.
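As a rough illustration of the modality comparison described above (not the authors' code), a one-way ANOVA over per-participant detection scores could look like the sketch below; the score arrays are hypothetical placeholders:

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant impression-detection scores (e.g. F1)
# for each modality; real values would come from the trained detectors.
facial = np.array([0.71, 0.65, 0.80, 0.74, 0.69])
eye = np.array([0.62, 0.58, 0.66, 0.61, 0.64])
physio = np.array([0.55, 0.49, 0.60, 0.52, 0.57])

# One-way ANOVA across the three modalities.
f_stat, p_value = stats.f_oneway(facial, eye, physio)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")

# Pairwise follow-up: facial vs. physiological.
t_stat, p_pair = stats.ttest_rel(facial, physio)
print(f"facial vs. physio: t = {t_stat:.2f}, p = {p_pair:.3f}")
```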



Recognizing Induced Emotions of Movie Audiences From Multimodal Information

February 2019 · 367 Reads · 52 Citations

IEEE Transactions on Affective Computing

Recognizing the emotional reactions of movie audiences to affective movie content is a challenging task in affective computing. Previous research on induced emotion recognition has mainly focused on using audiovisual movie content. Nevertheless, the relationship between the perception of affective movie content (perceived emotions) and the emotions evoked in audiences (induced emotions) remains unexplored. In this work, we studied the relationship between the perceived and induced emotions of movie audiences. Moreover, we investigated multimodal modelling approaches to predict movie-induced emotions from movie content based features as well as the physiological and behavioral reactions of movie audiences. To carry out this analysis of induced and perceived emotions, we first extended an existing database for movie affect analysis by annotating perceived emotions in a crowd-sourced manner. We found that perceived and induced emotions are not always consistent with each other. In addition, we showed that perceived emotions, movie dialogues and aesthetic highlights are discriminative for movie induced emotion recognition, in addition to spectators' physiological and behavioral reactions. Our experiments also revealed that induced emotion recognition benefits from including temporal information and performing multimodal fusion. Finally, our work investigated in depth the gap between affective content analysis and induced emotion recognition by examining the relationships among aesthetic highlights, induced emotions and perceived emotions.
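A minimal sketch of decision-level multimodal fusion in the spirit described above, assuming one classifier per modality; the data, feature dimensions and classifier choice are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def late_fusion_predict(models, features, weights=None):
    """Average per-modality class probabilities (decision-level fusion)."""
    probs = [m.predict_proba(X) for m, X in zip(models, features)]
    fused = np.average(probs, axis=0, weights=weights)
    return fused.argmax(axis=1)

# Hypothetical data: 100 clips, three modalities (content, physiological,
# behavioral) with different feature dimensionalities, binary labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X_mods = [rng.normal(size=(100, d)) for d in (20, 8, 6)]

# One classifier per modality, fused at decision level.
models = [RandomForestClassifier(random_state=0).fit(X, y) for X in X_mods]
pred = late_fusion_predict(models, X_mods)
```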



Towards a Better Gold Standard: Denoising and Modelling Continuous Emotion Annotations Based on Feature Agglomeration and Outlier Regularisation

October 2018 · 710 Reads · 14 Citations

Emotions are often perceived by humans through a series of multimodal cues, such as verbal expressions, facial expressions and gestures. In order to recognise emotions automatically, reliable emotional labels are required to learn a mapping from human expressions to the corresponding emotions. Dimensional emotion models have become popular and are widely applied for annotating emotions continuously in the time domain. However, the statistical relationship between emotional dimensions is rarely studied. This paper provides a solution to automatic emotion recognition for the Audio/Visual Emotion Challenge (AVEC) 2018. The objective is to find a robust way to detect emotions using more reliable emotion annotations in the valence and arousal dimensions. The two main contributions of this paper are: 1) a new approach for generating more dependable emotional ratings for both arousal and valence from multiple annotators by extracting consistent annotation features; 2) an exploration of the valence-arousal distribution using outlier detection methods, which reveals a specific oblique elliptic shape. With the learned distribution, we are able to detect prediction outliers based on their local density deviations and correct them towards the learned distribution. The performance of the proposed method is evaluated on the RECOLA database, which contains audio, video and physiological recordings. Our results show that a moving average filter is sufficient to remove incidental errors in annotations, and that unsupervised dimensionality reduction approaches can be used to derive a gold standard from multiple annotations. Compared with the AVEC 2018 baseline model, our approach significantly improved the concordance correlation coefficient for arousal and valence prediction, to 0.821 and 0.589 respectively.
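A generic sketch of the evaluation metric and processing steps mentioned above (moving-average smoothing, the concordance correlation coefficient, and density-based outlier flagging); this illustrates the common definitions, not the authors' implementation:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def moving_average(trace, window=25):
    """Smooth a continuous annotation trace to remove incidental errors."""
    kernel = np.ones(window) / window
    return np.convolve(trace, kernel, mode="same")

def ccc(x, y):
    """Concordance correlation coefficient between predictions and gold standard."""
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

def flag_outliers(arousal, valence, n_neighbors=20):
    """Flag (arousal, valence) pairs with abnormally low local density,
    in the spirit of the outlier regularisation described above."""
    X = np.column_stack([arousal, valence])
    return LocalOutlierFactor(n_neighbors=n_neighbors).fit_predict(X) == -1
```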


Figure 1 • The collaborative game Portal 2® (left) and the window showing the feedback (right)
Figure 2 • Emotional patterns representing the 5 most intensely felt emotions in each experimental condition
Effet des antécédents émotionnels de contrôle et de valeur sur la résolution de problème dans un jeu vidéo collaboratif

July 2018 · 818 Reads · 3 Citations

Sciences et Technologies de l'Information et de la Communication pour l'Éducation et la Formation

Biased control and value feedback was used to influence emotional appraisal during a collaborative problem-solving game. We studied how this feedback modulated the intensity of the emotions experienced, as well as the relationships between emotions, perceived collaboration and group performance. The results show different patterns of correlation between emotions, socio-cognitive processes and performance, depending on perceptions of control and value.


Aesthetic Highlight Detection in Movies Based on Synchronization of Spectators’ Reactions

July 2018 · 49 Reads · 21 Citations

ACM Transactions on Multimedia Computing, Communications and Applications

Detection of aesthetic highlights is a challenge for understanding the affective processes taking place during movie watching. In this article, we study spectators' responses to movie aesthetic stimuli in a social context and seek to uncover the emotional component of aesthetic highlights in movies. Our assumption is that spectators' physiological and behavioral reactions become synchronized during these highlights because: (i) the aesthetic choices of filmmakers are made to elicit specific emotional reactions (e.g., special effects, empathy, and compassion toward a character) and (ii) watching a movie together causes spectators' affective reactions to be synchronized through emotional contagion. We compare different approaches to estimating synchronization among multiple spectators' signals, such as pairwise, group, and overall synchronization measures, to detect aesthetic highlights in movies. The results show that an unsupervised architecture relying on synchronization measures is able to capture different properties of spectators' synchronization and to detect aesthetic highlights from both spectators' electrodermal and acceleration signals. We find that pairwise synchronization measures perform most accurately, independently of highlight category and movie genre. Moreover, we observe that electrodermal signals have more discriminative power than acceleration signals for highlight detection.
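A minimal sketch of a pairwise synchronization measure of the kind compared above, assuming one electrodermal trace per spectator, all sampled at a common rate; the window and step sizes are arbitrary assumptions:

```python
import numpy as np
from itertools import combinations

def pairwise_sync(signals, win=256, step=128):
    """Mean windowed Pearson correlation over all spectator pairs.

    signals: array of shape (n_spectators, n_samples), e.g. electrodermal
    activity. Returns one score per window; peaks in this score are
    candidate aesthetic highlights.
    """
    n, T = signals.shape
    scores = []
    for start in range(0, T - win + 1, step):
        seg = signals[:, start:start + win]
        corrs = [np.corrcoef(seg[i], seg[j])[0, 1]
                 for i, j in combinations(range(n), 2)]
        scores.append(np.nanmean(corrs))
    return np.array(scores)
```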


A Comparative Survey of Methods for Remote Heart Rate Detection From Frontal Face Videos

May 2018 · 1,719 Reads · 94 Citations

Remotely measuring physiological activity can provide substantial benefits for both medical and affective computing applications. Recent research has proposed different methodologies for the unobtrusive detection of heart rate (HR) from recordings of the human face. These methods rely on subtle colour changes or motions of the face caused by cardiovascular activity, which are invisible to the human eye but can be captured by digital cameras. Approaches based on both signal processing and machine learning have been proposed; however, these methods were evaluated on different datasets, so there is no consensus on their relative performance. In this article, we describe and evaluate several methods from the literature, from 2008 to the present day, for the remote detection of HR from face recordings. The general HR processing pipeline is divided into three stages: face video processing, face blood volume pulse (BVP) signal extraction, and HR computation. The approaches presented in the paper are classified and grouped according to each stage. At each stage, the algorithms are analyzed and compared based on their performance on the public MAHNOB-HCI database; the results reported in this article are thus limited to that dataset. The results show that the extracted facial skin area contains the most BVP information, and that blind source separation and peak detection methods are the most robust to head motion when estimating HR.
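A heavily simplified sketch of the pipeline described above, reduced to its last two stages (BVP extraction and HR computation) from a mean green-channel trace of the face region; real methods add face tracking, skin segmentation and blind source separation. The sampling rate and filter settings are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr(green_trace, fs=30.0):
    """Estimate heart rate (BPM) from the mean green-channel value of the
    detected face region in each frame (a crude BVP proxy)."""
    # Band-pass to the plausible HR range, 0.7-4 Hz (42-240 BPM).
    b, a = butter(3, [0.7 / (fs / 2), 4.0 / (fs / 2)], btype="band")
    bvp = filtfilt(b, a, green_trace - np.mean(green_trace))
    # Dominant spectral peak within the band -> heart rate.
    spectrum = np.abs(np.fft.rfft(bvp)) ** 2
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```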




Citations (81)


... These low-level signals are processed using EyesWeb and other external tools, such as machine learning pretrained models (Dermouche and Pelachaud, 2019; Wang et al., 2019), to extract high-level features about the user, such as their level of engagement. ...

Reference:

Adaptation Mechanisms in Human–Agent Interaction: Effects on User’s Impressions and Engagement
Impression Detection and Management Using an Embodied Conversational Agent

Lecture Notes in Computer Science

... favour or dislike) could be reflected behaviourally [3,4]. These impressions may affect the interaction and could even remain afterwards [5,7]. Thus, when we build an ECA to impress users, it is important to detect how users feel about the ECA. ...

Your Body Reveals Your Impressions about Others: A Study on Multimodal Impression Detection
  • Citing Conference Paper
  • September 2019

... They employed an LSTM-based model to recognize induced sentiment from the viewers' physiological features, showcasing the effect of integrating multiple modalities, including external information like affective cues in movies. Muszyński et al. [28] further investigated the correlation between dialogue and aesthetic features in inducing sentiment in movies. They introduced an innovative multi-modal model for predicting induced sentiment. ...

Recognizing Induced Emotions of Movie Audiences From Multimodal Information
  • Citing Article
  • February 2019

IEEE Transactions on Affective Computing

... First, provide detailed information about the implementation of an EAT where these three factors coalesce. To meet this objective, I build on and extend previous work conducted in the Emotion Awareness Tool for Computer-Mediated Interactions (EATMINT) project (Cereghetti, Molinari, Chanel, Pun, & Bétrancourt, 2015; Chanel et al., 2016). More specifically, a prototype of an EAT named Dynamic Emotion Wheel (Fritz, Bétrancourt, Molinari, & Pun, 2015), designed during an internship in the project and as the subject of my Master thesis (Fritz, 2015), will serve as the basis for the implementation of a functioning proof of concept. ...

Sharing emotions during a computer-mediated collaborative task: a dual eye-tracking study
  • Citing Conference Paper
  • August 2015

... Understanding the target distribution can also help improve the reliability of annotations. Wang et al. [58] explore the distribution of emotion annotations using outlier detection methods and use these insights to correct outliers toward the learned distribution, reducing labeling noise and outperforming previous SOTA results. Escalante et al. [59] analyze different aspects of the target variable in the First Impression dataset, including intravideo and intervideo variance, and use these insights, together with studies of the correlations with sensitive traits, such as gender and ethnicity, to uncover existing biases in the dataset. ...

Towards a Better Gold Standard: Denoising and Modelling Continuous Emotion Annotations Based on Feature Agglomeration and Outlier Regularisation

... A search of the journal Sticef from 2010 to 2020 for the term "ethics" shows that it is used in some articles in reflections on the use of digital tools proposed for learning [5,11], in reflections on the research process [5], or in studies submitted to and approved by the institution's ethics committee [12]. However, these articles do not offer an unambiguous definition of the term ethics, nor do they propose guidance for integrating these reflections into the design process. ...

Effet des antécédents émotionnels de contrôle et de valeur sur la résolution de problème dans un jeu vidéo collaboratif

Sciences et Technologies de l'Information et de la Communication pour l'Éducation et la Formation

... We anticipated that the in-depth exploration of music's emotional aspects through analytical knowledge would lead to greater physiological concordance among learners, reflecting a collective emotional engagement with the music. Furthermore, drawing on the evidence that people watching the same emotionally expressive content exhibited similar physiological responses (Bracken et al., 2014; Golland et al., 2014; Muszynski et al., 2016, 2018), our study extends this concept to the domain of music education. We propose that the type of knowledge imparted (analytical or historical) may influence the degree of emotional synchronization in a group setting. ...

Aesthetic Highlight Detection in Movies Based on Synchronization of Spectators’ Reactions
  • Citing Article
  • July 2018

ACM Transactions on Multimedia Computing, Communications and Applications

... However, they cannot be used to monitor pet vital signs because they have a short detection range and are limited by the condition of the body surface of the animal. Similarly, hair covering the body surface also renders camera- or video-based approaches complex, limiting their application to animals (18, 19). Recently, radar, a contactless vital sign monitoring method, has received extensive interest and has been applied to various scenarios (20-22). ...

A Comparative Survey of Methods for Remote Heart Rate Detection From Frontal Face Videos

... The classification improved when including cross-subject features. In [247], the authors analyze whether the emotional reaction of one individual can be assessed by the emotional response of their partner in a dyad cooperation task, exploring physiological and speech data. The models were trained to predict emotional and non-emotional moments using a linear SVM and a random forest classifier. ...

Multiple users' emotion recognition: Improving performance by joint modeling of affective reactions
  • Citing Conference Paper
  • October 2017

... Induced emotion analysis, distinct from perceiving the emotion conveyed by content creators, pertains to analyzing the emotional reactions induced in content consumers [18,42]. Presently, there is growing interest in understanding the patterns of emotion induced by video [6], since this has a wide range of applications from various perspectives [41,38,30,3]. ...

Recognizing induced emotions of movie audiences: Are induced and perceived emotions the same?