Conference Paper · PDF Available

A Multimodal Annotation Schema for Non-Verbal Affective Analysis in the Health-Care Domain

Authors:

Abstract

The development of conversational agents with human interaction capabilities requires advanced affective state recognition that integrates non-verbal cues from the different modalities which together constitute what we perceive in human communication as an overall affective state. Each modality is often handled by a different subsystem that conveys only a partial interpretation of the whole and, as such, is evaluated only in terms of its partial view. To tackle this shortcoming, we investigate the generation of a unified multimodal annotation schema of non-verbal cues from the perspective of an interdisciplinary group of experts. We aim to obtain a common ground truth with a unique representation based on the Valence and Arousal space and a discrete non-linear scale of values. The proposed annotation schema is demonstrated on a corpus in the health-care domain but is scalable to other purposes. Preliminary results on inter-rater variability show a positive correlation of consensus level with high (absolute) values of Valence and Arousal, as well as with the number of annotators labeling a given video sequence.
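To make the schema concrete, the sketch below shows one possible way to snap continuous Valence/Arousal annotations onto a discrete non-linear scale and to score per-sequence annotator consensus. The scale breakpoints and the consensus measure are illustrative assumptions, not the values or metric used in the paper.

```python
import numpy as np

# Hypothetical discrete non-linear scale for Valence/Arousal in [-1, 1];
# the paper's actual breakpoints are not reproduced here.
SCALE = np.array([-1.0, -0.6, -0.3, -0.1, 0.0, 0.1, 0.3, 0.6, 1.0])

def snap_to_scale(value: float) -> float:
    """Map a continuous annotation onto the nearest discrete scale point."""
    return float(SCALE[np.argmin(np.abs(SCALE - value))])

def consensus(labels: list[float]) -> float:
    """Fraction of annotators that chose the modal discrete label."""
    snapped = [snap_to_scale(v) for v in labels]
    values, counts = np.unique(snapped, return_counts=True)
    return counts.max() / len(snapped)

# Example: five annotators rating arousal for one video sequence.
print(consensus([0.55, 0.62, 0.58, 0.31, 0.60]))  # -> 0.8
```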



Affiliations: UPF, UAU, UULM, CERTH
 !"!
 !#$% &
'( !



&( 
 !"
#
)( #$
$"%#
%
*+,*+,'+

-( %

.( 


&'
"
+( /#$
 !"!
'(!


 !"!
 !#$% -
'('(01
203
((
*+,*+,'+
"
1%%
)
*"
 + 
), -&./ .0/
 !"!
.
41
5
"1!
!

)12

*"
&


3"
&
$"
)"
!
123
#"
$
0
$
$"
)
4
#"

)12
*
*"
*
3"
&

*
$
 !$"
 !"!
&(
 !"
#
 !"!
 !#$% 6
&('(7%
5"&8
6 7
*+,*+,'+
 !"!
 !#$% 9
&(&(
:;$18%
0<%=>' ?
*+,*+,'+

*"
"
)
 !"!
)( #$
$"#
%
 !"!
 !#$% '*
)('(
#""&
632.9,,:7;<=
)163"&2.9,<,7;9=
0&"6$(2.9,<97;>=
@1215
$6$(2.9,<97;>=
?362.9,<>7;@=
!$A6*2.9,,B7;:=

*+,*+,'+
 !"!
 !#$% ''
)(&(2!3
15"46..
/7
#(C5
85%""&D"(
(
"&( 

5"""
5

#(C5
+"&" 
*+,*+,'+
 !"!
'&
)(&(2!!3
A
A
B
B


C%%
#
) 
;,EF,29:EF,2:EF<=
*+,*+,'+  !#$%
 !"!
-(%;
 !"!
 !#$% '-
-('(%%2!3
(;)&
(;.
=:
=%
%=7
="$
=
;&D&E
-)F
)*F
'9F
6F &F
 %2F3





*+,*+,'+
:;'*
&*F%
 !"!
 !#$% '.
-('(%%2!!3
(%$;'9
9F
).F
)*F
'9F
6F &F
:,






:;
7G&;
7G';
*+,*+,'+
H
%%
7
 !"!
.(


I

 !"!
'6
.('(2!3
A
A
B
B


C%%
#
/1
J*,K*(&.,K*(.,K'L
(:
)6"
"GH
*+,*+,'+  !#$%
 !"!
 !#$% '9
.(&(2!!3
H:;
 !0"68%%
7

&"
<6!"7
"6,.,7
*+,*+,'+
I"$"6Wienburg et al., 2008) ;J=


 !"!
 !#$% 'E
.()(2!3
%;<2%'3
*+,*+,'+
C C0
#"K
%

(
 !"!
 !#$% &*
.(-(I
M1H2%&3
*+,*+,'+
I62.9,<<7;L=
 !"!
 !#$% &'
.(.(2!!3
12%)3
*+,*+,'+
 !"!
+(/#$
 !"!
 !#$% &)
+(/#$
%
 
)"0"
+6<6(%
"""
2C3M2%C3
%
)"15"
%5."C..*

*+,*+,'+
 !"!
"$N
O%;,,$%8(,,
M$C+)
M
2"8M"2"
 !"!
 !#$% &.

J'L2$2P22&22
".<>69759N>O>,>.9,,:2
J&L2P.2 
!22"!A.N@O<,<.
9,<,2
J)L*24.2" #
$"2%".
>6<75:O<L.9,<92
J-L)2 1.2%   "$&$
222")*".<OB.9,<>2
J.L$2:.2.2&'$( $ 
"222$"1.BJ:OBJB.9,,B2
J+L2#O1.2 #222
3""".<::JO<::N.9,,J2
J6L2.'2.02232)" "   $*"
 #* +,2$..>,6<,7.<LLN <LN@.
9,<<2
*+,*+,'+
... • Multimodal communication analytics for processing verbal and nonverbal information, such as (a) automated speech recognition, to support the transformation of spoken language into text, using statistical speech models for both acoustic and language modeling; (b) language analysis, projecting the outputs of syntactic and semantic parsing to a DOLCE+DnS UltraLite compliant representation (Ballesteros, Bohnet, Mille, & Wanner, 2015); (c) analysis of nonverbal behaviour, such as emotions and gestures (Sukno et al., 2016). ...
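The excerpt above describes independent subsystems for speech, language, and nonverbal behaviour. A minimal sketch of how such per-modality outputs could be combined into a single Valence/Arousal estimate is given below; the data structure and the confidence-weighted fusion rule are assumptions for illustration, not the cited system's design.

```python
from dataclasses import dataclass

@dataclass
class ModalityEstimate:
    """Output of one modality subsystem (names are illustrative)."""
    valence: float      # in [-1, 1]
    arousal: float      # in [-1, 1]
    confidence: float   # in [0, 1]

def fuse(estimates: list[ModalityEstimate]) -> tuple[float, float]:
    """Confidence-weighted average of per-modality valence/arousal."""
    total = sum(e.confidence for e in estimates) or 1.0
    valence = sum(e.valence * e.confidence for e in estimates) / total
    arousal = sum(e.arousal * e.confidence for e in estimates) / total
    return valence, arousal

# Example: speech, language and facial-expression subsystems.
print(fuse([ModalityEstimate(0.2, 0.7, 0.9),
            ModalityEstimate(-0.1, 0.5, 0.4),
            ModalityEstimate(0.3, 0.8, 0.7)]))
```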
Article
Full-text available
Dialogue-based systems often consist of several components, such as communication analysis, dialogue management, domain reasoning, and language generation. In this paper, we present Converness, an ontology-driven, rule-based framework to facilitate domain reasoning for conversational awareness in multimodal dialogue-based agents. Converness uses Web Ontology Language 2 (OWL 2) ontologies to capture and combine the conversational modalities of the domain, for example, deictic gestures and spoken utterances, fuelling conversational topic understanding, and interpretation using description logics and rules. At the same time, defeasible rules are used to couple domain and user-centred knowledge to further assist the interaction with end users, facilitating advanced conflict resolution and personalised context disambiguation. We illustrate the capabilities of the framework through its integration into a multimodal dialogue-based agent that serves as an intelligent interface between users (elderly, caregivers, and health experts) and an ambient assistive living platform in real home settings.
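Converness couples OWL 2 ontologies with defeasible rules for conflict resolution and personalised disambiguation. The snippet below is a much simplified plain-Python analogue of priority-based rule defeat (not the Converness API or its OWL 2/description-logic machinery): among the rules whose conditions hold, the highest-priority conclusion wins.

```python
from typing import Callable, Optional

# Toy analogue of defeasible conflict resolution: each rule carries a
# priority, and the conclusion of the strongest applicable rule wins.
Rule = tuple[int, Callable[[dict], bool], str]

RULES: list[Rule] = [
    (1, lambda ctx: ctx.get("utterance") == "turn it on",
        "switch_on_default_device"),
    (2, lambda ctx: ctx.get("utterance") == "turn it on"
        and ctx.get("pointing_at") == "lamp",
        "switch_on_lamp"),
]

def resolve(context: dict) -> Optional[str]:
    """Return the action of the strongest applicable rule, if any."""
    applicable = [(prio, action) for prio, cond, action in RULES if cond(context)]
    return max(applicable)[1] if applicable else None

# The deictic gesture (pointing at the lamp) defeats the generic default.
print(resolve({"utterance": "turn it on", "pointing_at": "lamp"}))  # switch_on_lamp
```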
Article
For technology (like serious games) that aims to deliver interactive learning, it is important to address relevant mental experiences such as reflective thinking during problem solving. To facilitate research in this direction, we present the weDraw-1 Movement Dataset of body movement sensor data and reflective thinking labels for 26 children solving mathematical problems in unconstrained settings where the body (in full or in part) was required to explore these problems. Further, we provide a qualitative analysis of the behaviours that observers used to identify reflective thinking moments in these sessions. The body movement cues from our compilation informed features that led to an average F1 score of 0.73 for binary classification of problem-solving episodes by reflective thinking based on Long Short-Term Memory neural networks. We further obtained an average F1 score of 0.79 for end-to-end classification, i.e. based on raw sensor data. Finally, the algorithms resulted in an average F1 score of 0.64 for subsegments of these episodes as short as 4 seconds. Overall, our results show the possibility of detecting reflective thinking moments from the body movement behaviours of a child exploring mathematical concepts bodily, such as within serious game play.
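The dataset paper reports LSTM-based binary classification of reflective-thinking episodes from windows of body-movement sensor data. A minimal Keras sketch of such a classifier is shown below; the window length, feature count, and network size are placeholder assumptions, not the authors' configuration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical shapes: 4-second windows of body-movement features sampled
# at 50 Hz with 12 channels; labels are binary (reflective thinking or not).
WINDOW_STEPS, N_FEATURES = 200, 12

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW_STEPS, N_FEATURES)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Dummy data stands in for real sensor windows and observer labels.
X = np.random.randn(32, WINDOW_STEPS, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(32, 1))
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```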
Article
Full-text available
CARMA is a media annotation program that collects continuous ratings while displaying audio and video files. It is designed to be highly user-friendly and easily customizable. Based on Gottman and Levenson’s affect rating dial, CARMA enables researchers and study participants to provide moment-by-moment ratings of multimedia files using a computer mouse or keyboard. The rating scale can be configured on a number of parameters including the labels for its upper and lower bounds, its numerical range, and its visual representation. Annotations can be displayed alongside the multimedia file and saved for easy import into statistical analysis software. CARMA provides a tool for researchers in affective computing, human-computer interaction, and the social sciences who need to capture the unfolding of subjective experience and observable behavior over time.
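CARMA itself is a GUI annotation program; the sketch below is only a stripped-down analogue of its core idea, polling a rating source at a fixed rate and saving timestamped moment-by-moment values to CSV. The function names and sampling rate are illustrative.

```python
import csv
import time

def record_ratings(get_rating, duration_s=5.0, rate_hz=10, path="ratings.csv"):
    """Poll a rating source at a fixed rate and save timestamped values."""
    interval = 1.0 / rate_hz
    start = time.time()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time_s", "rating"])
        while (elapsed := time.time() - start) < duration_s:
            writer.writerow([round(elapsed, 3), get_rating()])
            time.sleep(interval)

# Example: a stand-in rating source; a real study would read a dial,
# mouse position, or keyboard state instead.
record_ratings(lambda: 0.0, duration_s=1.0)
```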
Conference Paper
Full-text available
We present in this paper a new multimodal corpus of spontaneous collaborative and affective interactions in French: RECOLA, which is being made available to the research community. Participants were recorded in dyads during a video conference while completing a task requiring collaboration. Different multimodal data, i.e., audio, video, ECG and EDA, were recorded continuously and synchronously. In total, 46 participants took part in the test, for which the first 5 minutes of interaction were kept to ease annotation. In addition to these recordings, 6 annotators measured emotion continuously on two dimensions, arousal and valence, as well as social behavior labels on five dimensions. The corpus allowed us to take self-report measures of users during task completion. Methodologies and issues related to affective corpus construction are briefly reviewed in this paper. We further detail how the corpus was constructed, i.e., participants, procedure and task, the multimodal recording setup, the annotation of data and some analysis of the quality of these annotations.
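With several annotators tracing arousal or valence continuously, a common first check of annotation quality is pairwise agreement between their traces. The sketch below computes the mean pairwise Pearson correlation over aligned annotator traces; it is a generic illustration, not the analysis performed for RECOLA.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(traces: np.ndarray) -> float:
    """Mean Pearson correlation over all annotator pairs.

    traces: array of shape (n_annotators, n_frames) holding one continuous
    dimension (e.g. arousal) per annotator, aligned in time.
    """
    pairs = combinations(range(traces.shape[0]), 2)
    return float(np.mean([np.corrcoef(traces[i], traces[j])[0, 1]
                          for i, j in pairs]))

# Toy example: 3 annotators tracking a shared signal plus individual noise.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 6, 100))
traces = np.stack([base + 0.2 * rng.standard_normal(100) for _ in range(3)])
print(mean_pairwise_correlation(traces))
```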
Conference Paper
Full-text available
Automatic detection and interpretation of social signals carried by voice, gestures, mimics, etc. will play a key role for next-generation interfaces as it paves the way towards a more intuitive and natural human-computer interaction. The paper at hand introduces Social Signal Interpretation (SSI), a framework for real-time recognition of social signals. SSI supports a large range of sensor devices, filter and feature algorithms, as well as machine learning and pattern recognition tools. It encourages developers to add new components using SSI's C++ API, but also addresses front-end users by offering an XML interface to build pipelines with a text editor. SSI is freely available under GPL at http://openssi.net.
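SSI is a C++ framework configured through XML pipelines; the snippet below is only a toy Python analogue of the sensor-filter-classifier pipeline idea it implements in real time, with stand-in filter and classifier functions.

```python
from typing import Callable, Iterable

def run_pipeline(source: Iterable[float],
                 filters: list[Callable[[float], float]],
                 classify: Callable[[float], str]) -> list[str]:
    """Push each sample through the filter chain, then classify it."""
    decisions = []
    for sample in source:
        for f in filters:
            sample = f(sample)
        decisions.append(classify(sample))
    return decisions

smooth = lambda x: 0.5 * x            # stand-in for a real smoothing filter
threshold = lambda x: "active" if x > 0.3 else "idle"
print(run_pipeline([0.1, 0.9, 0.7], [smooth], threshold))
# ['idle', 'active', 'active']
```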
Article
In the context of affective human behavior analysis, we use the term continuous input to refer to naturalistic settings where explicit or implicit input from the subject is continuously available, where in a human–human or human–computer interaction setting, the subject plays the role of a producer of the communicative behavior or the role of a recipient of the communicative behavior. As a result, the analysis and the response provided by the automatic system are also envisioned to be continuous over the course of time, within the boundaries of digital machine output. The term continuous affect analysis is used as analysis that is continuous in time as well as analysis that uses affect phenomenon represented in dimensional space. The former refers to acquiring and processing long unsegmented recordings for detection of an affective state or event (e.g., nod, laughter, pain), and the latter refers to prediction of an affect dimension (e.g., valence, arousal, power). In line with the Special Issue on Affect Analysis in Continuous Input, this survey paper aims to put the continuity aspect of affect under the spotlight by investigating the current trends and provide guidance towards possible future directions.
Article
Automatic affective expression recognition has attracted more and more attention from researchers in different disciplines, and it will significantly contribute to a new paradigm for human-computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance research in affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.
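A common way to integrate audio and visual cues, as discussed above, is late fusion of per-modality class posteriors. The sketch below shows a simple weighted-sum fusion; the class set, probabilities, and weight are illustrative assumptions rather than the method evaluated in the article.

```python
import numpy as np

def late_fusion(audio_probs: np.ndarray, video_probs: np.ndarray,
                audio_weight: float = 0.5) -> np.ndarray:
    """Weighted-sum late fusion of per-class posteriors from two modalities."""
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * video_probs
    return fused / fused.sum()

# Toy posteriors over three affect classes (e.g. negative/neutral/positive).
audio = np.array([0.2, 0.5, 0.3])
video = np.array([0.1, 0.3, 0.6])
fused = late_fusion(audio, video, audio_weight=0.4)
print(fused, fused.argmax())
```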