Emotion Elicitation and Capture among Real Couples in the Lab

Abstract

Couples’ relationships affect partners’ mental and physical well-being. Automatic recognition of couples’ emotions will not only help to better understand the interplay of emotions, intimate relationships, and health and well-being, but also provide crucial clinical insights into protective and risk factors of relationships, and can ultimately guide interventions. However, several works developing emotion recognition algorithms use data from actors in artificial dyadic interactions, and the resulting algorithms are unlikely to perform well on real couples. We are developing emotion recognition methods using data from real couples and, in this paper, we describe two studies we ran in which we collected emotion data from real couples: Dutch-speaking couples in Belgium and German-speaking couples in Switzerland. We discuss our approach to eliciting and capturing emotions and make five recommendations based on their relevance for developing well-performing emotion recognition systems for couples.
George Boateng
ETH Zürich
Zürich, Switzerland
gboateng@ethz.ch
Peter Kuppens
KU Leuven
Leuven, Belgium
peter.kuppens@kuleuven.be
Urte Scholz
University of Zürich
Zürich, Switzerland
urte.scholz@psychologie.uzh.ch
Laura Sels
Ghent University
Ghent, Belgium
laura.sels@ugent.be
Janina Lüscher
University of Zürich
Zürich, Switzerland
janina.luescher@psychologie.uzh.ch
Tobias Kowatsch
ETH Zürich, University of St. Gallen
Zürich, St. Gallen, Switzerland
tobias.kowatsch@unisg.ch
Paper presented at the 1st Momentary Emotion Elicitation & Capture (MEEC)
workshop, co-located with the ACM CHI Conference on Human Factors in
Computing Systems, Honolulu, Hawaii, USA, April 25th, 2020. This is an open-
access paper distributed under the terms of the Creative Commons Attribution
License (https://creativecommons.org/licenses/by/4.0/), which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
Author Keywords
Emotion; Couples; Multimodal Sensor Data; Smartphone;
Smartwatch
CCS Concepts
Applied computing → Psychology; Human-centered
computing → Ubiquitous and mobile computing systems
and tools;
Introduction
Extensive research shows that intimate relationships have
powerful effects on people’s mental and physical health
(see e.g. [23] for an overview). For instance, conflicts and
negative qualities of one’s intimate relationship are asso-
ciated prospectively with morbidity and mortality [16]. In-
creasingly, researchers are zooming in on the emotional
processes that take place in intimate relationships as
underlying mechanisms for this relationship-health link (e.g.,
[9]). However, assessing these dynamic emotional
processes is challenging.
In studies of intimate relationships, two methods predomi-
nate: self-reports and observer reports. Most often, a stan-
dard dyadic interaction paradigm is used, in which cou-
ples participate in an emotionally charged discussion that
is videotaped [22]. Next, couples can watch these videos
and report on the emotions that they have experienced dur-
ing the interaction (resulting in self-reported emotion); or
observers use a coding scheme to rate the interaction on
specific emotional behaviors (e.g., the SPAFF [7]). Both
methods have their own advantages and limitations and
provide unique information on the emotional processes in
couples. The power of observational data is that it goes
beyond people’s own awareness and is not subject to
reporting biases. However, its greatest limitation is the
resources required for coding. First, a coding scheme has
to be developed, which is a whole process in itself [12].
Next, multiple observers have to be trained in a system-
atic manner to obtain sufficient inter-rater agreement. When
the actual coding can start, this process is slow and costly,
and multiple coders have to code the same videos so that
inter-rater reliability can be established.
Automatic emotion recognition holds important promise in
meeting these limitations and significantly advancing the
field. Hence, it is important to develop a system for auto-
matic recognition of couples’ emotions using information
such as speech, facial expressions, gestures etc. Works
that develop emotion recognition systems using speech
data collected from individuals are not adequate for our
purpose as such works do not capture the complexity of
dyadic conversations such as turn-taking in couples’ con-
versations. As a result, works that focus on couple dyads
are most relevant.
Several emotion-recognition works using data from cou-
ple dyads involve data collected from actors in artificial
dyadic interactions. Examples of these datasets are the
IEMOCAP dataset [5], USC CreativeIT dataset [19], and
MSP-IMPROV dataset [6]. To elicit emotions, actors are
either asked to use a script or they are given hypothetical
situations to act out so as to make the acting seem natural
and more like a real couple. To capture ground truth,
these datasets tend to be annotated later by external raters
using dimensional and/or categorical labels, either
moment-by-moment or as global emotion labels of whole
recordings. Annotation by external raters poses several
challenges, which are highlighted in [20]: dealing with
inter-rater agreement, the subjectivity of each rater, how to
combine the annotations for moment-by-moment ratings, and
the laborious nature of these annotations. Additionally, and
importantly, such ratings reflect the assessment of external
raters rather than the emotions perceived by the couples
themselves, which is what needs to be captured.
Furthermore, it has been shown that algorithms trained on
naturalistic data perform worse than those trained on acted
data [8] and it is likely that algorithms developed from data
collected from actors will not perform well on real people
given that actors tend to express emotions with greater
intensity than real couples do in naturalistic contexts.
It is hence important to develop emotion recognition
methods using data from real couples, along with emotion
ratings provided by the couples themselves.
Towards that end, it is important to adequately collect ground
truth information and sensor data to develop a system for
emotion recognition among couples. We are developing
such a system and, in this paper, we describe our approach
to elicit and capture emotions among real couples in two lab
studies: one conducted in Belgium with couples speaking
Dutch and the other in Switzerland with couples speaking
German. We then discuss these studies and make five rec-
ommendations for future data collection among couples in
the lab to improve automatic emotion recognition. For work
focusing on data collected from couples in everyday life,
see our paper (under review) [2].
Methods
We used data from two lab studies with real couples, in
which the sessions were videotaped and couples provided
ratings either of the whole session or retroactively on a
moment-by-moment basis while watching the video.
Study 1: Dyadic Interaction Study
A Dyadic Interaction lab study was conducted in Leuven,
Belgium with 101 Dutch-speaking couples. These couples
were asked to have a 10-minute conversation about a nega-
tive topic (a characteristic of their partner that annoys them
the most) and a positive topic (a characteristic of their part-
ner that they value the most) [29]. During both conversa-
tions, couples were asked to wrap up the conversation after
8 minutes. For the negative topic, they were also asked to
end on good terms. After each conversation, each partner
completed self-reports on various categorical emotion la-
bels such as anger, sadness, anxiety, relaxation, happiness,
etc. on a 7-point Likert scale ranging from strongly disagree
(1) to strongly agree (7). Also, they completed the Affect
Grid questionnaire [27] which captures the valence and
arousal dimensions of Russell’s circumplex model of
emotions [25]. Each partner also rated their perception of
their partner’s emotion using the Affect Grid. Additionally,
each partner watched the video recording of the conversa-
tion separately on a computer and rated his or her emotion
on a moment-by-moment basis by continuously adjusting
a joystick to the left (very negative) and the right (very pos-
itive), so that it closely matched their feelings, resulting in
valence scores on a continuous scale from -1 to 1 [11, 24].
Study 2: DyMand Study
We are currently running a Dyadic Management of Di-
abetes (DyMand) lab study in Zurich, Switzerland with
German-speaking couples in which one partner has type
2 diabetes with data from eight (8) couples collected so far
[17]. In this lab study, the couple is asked to discuss an
illness management-related concern that is causing them
considerable distress for a 10-minute period. The session
is videotaped and, additionally, each partner wears a
smartwatch that collects various sensor data: audio, heart rate,
accelerometer, gyroscope, and ambient light. After the ses-
sion, each partner completes a self-report on a smartphone
about their emotions using the Affective Slider [1] which
assesses the valence and arousal dimensions of their emo-
tions over the last 10 min of the discussion. Also, the smart-
phone takes a 3-second video of their facial expression
while they complete the self-report.
Discussion and Recommendations
Based on these two studies, we discuss and recommend
approaches to collect sensor and ground truth data from
couples to aid in developing well-performing systems for
emotion recognition among couples.
Elicitation of Emotions
In these studies, we elicited emotions in the couples by
asking them to discuss various relationship-relevant top-
ics (Study 1), or a distressing illness management concern
(Study 2). In comparison to various elicitation approaches
such as watching a video or listening to music, this
approach leverages a context that mimics a real-world one:
partners having a conversation. Hence, algorithms
developed using data from this context, such as verbal and
nonverbal vocalizations, could then also be implemented in
ubiquitous systems such as smartphones and smartwatches
for couple emotion recognition in everyday life. We
recommend the use of similar elicitation approaches for couple
emotion recognition works.
Self-Report Data Collection
In these studies, we captured emotions using a range of ap-
proaches which can generally be grouped into two: global
rating (one value or label for the whole conversation) and
continuous rating (different values for different parts of the
conversation) (only in Study 1).
The global ratings consisted of 7-point Likert scales for
categorical emotions such as angry, relaxed, happy, sad, and
the Affect Grid in Study 1, which were completed using
electronic questionnaires. We collected valence and arousal
values using the Affective Slider on a smartphone for Study
2. Global ratings are important to capture (1) a partner’s
perception of his/her emotion (self-perceived) and (2) his/her
perception of his/her partner’s emotion (partner-perceived)
as was done in Study 1. The partner-perceived rating is useful
because, when compared against the self-perceived rating, it
could be used to compute baseline values for metrics such as
accuracy (for classification tasks) and correlation coefficient
(for regression tasks) in machine learning experiments.
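The baseline idea above can be illustrated with a minimal sketch. It assumes global valence ratings on a 0-1 scale, a midpoint split into positive and negative classes, and made-up example values; none of these details come from the studies themselves.

```python
from math import sqrt


def accuracy(predicted, actual):
    """Share of positions where the two label sequences agree."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def to_label(valence, midpoint=0.5):
    """Binarize a 0-1 valence rating (the midpoint split is an assumption)."""
    return "positive" if valence >= midpoint else "negative"


# Hypothetical ratings, one pair per participant (not data from the studies).
self_perceived = [0.8, 0.3, 0.6, 0.2, 0.9]
partner_perceived = [0.7, 0.4, 0.4, 0.1, 0.8]

# Treat the partner's judgment as the "prediction" of the self-report.
baseline_accuracy = accuracy(
    [to_label(v) for v in partner_perceived],
    [to_label(v) for v in self_perceived],
)
baseline_r = pearson_r(partner_perceived, self_perceived)
```

A recognition model evaluated on the same self-reports could then be compared against `baseline_accuracy` (classification) or `baseline_r` (regression).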
The continuous emotion rating was done only in Study 1.
Each partner separately adjusted a joystick continuously
to the left or right while watching a video of their
conversation in computer-based software (the rated valence
values were displayed in real time). This continuous emotion
rating matters because it gives a granular assessment of
emotions, which is needed for developing an emotion
recognition system that shows how the emotion of each partner
is changing on a second-by-second or minute-by-minute basis.
Also, the mean value could be used to get an estimate
of the global emotion rating. Additionally, the continuous
rating could be useful for the accurate recognition of the
global rating. The peak-end rule says that the extremes and
end of an emotional experience influence a person’s overall
judgment of that experience [10], and prior work has explored
this rule using Study 1’s data [29]. On that basis, using data
from the extremes and/or end of the 10-minute conversation
might produce better recognition performance for the global
emotion rating of the whole conversation.
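As an illustration of how a continuous rating could be summarized along these lines, the sketch below computes the mean, the most extreme positive and negative moments, and the mean of an end window. The sampling rate, the window length, and the example trace are assumptions for illustration only.

```python
def peak_end_summary(valence, sample_rate_hz=1.0, end_seconds=60.0):
    """Summarize a continuous valence trace with values in [-1, 1].

    sample_rate_hz and end_seconds are illustrative parameters; the
    source does not state which values would be appropriate.
    """
    if not valence:
        raise ValueError("valence trace must not be empty")
    # Number of samples in the end window, clipped to the trace length.
    end_len = max(1, min(len(valence), int(end_seconds * sample_rate_hz)))
    end_window = valence[-end_len:]
    peak_positive = max(valence)
    peak_negative = min(valence)
    return {
        "mean": sum(valence) / len(valence),
        "peak_positive": peak_positive,
        "peak_positive_index": valence.index(peak_positive),
        "peak_negative": peak_negative,
        "peak_negative_index": valence.index(peak_negative),
        "end_mean": sum(end_window) / len(end_window),
    }


# Hypothetical one-sample-per-second trace (not data from Study 1).
trace = [0.0, -0.2, -0.8, -0.4, 0.1, 0.5, 0.6, 0.4]
summary = peak_end_summary(trace, sample_rate_hz=1.0, end_seconds=2.0)
```

The peak indices mark where in the recording the most extreme moments occurred, so the matching audio segments could be selected for feature extraction.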
We did not collect self-reports about the personality of each
partner though it might be useful. There are individual dif-
ferences in the experience and expression of emotions with
a concrete example shown in how the relation between
arousal and valence varies across individuals [13].
Preliminary evidence suggests that the valence and arousal of
individuals’ emotional expressions relate to the five-factor
model of personality [14]. Hence, individuals’ personality may
affect how they express their emotions, and collecting
personality information with an instrument such as the Big Five
Inventory [30] and using it as input to an emotion recognition
algorithm could potentially improve its performance.
Based on this discussion, we recommend collecting
self-perceived and partner-perceived (1) global emotion
ratings with smartphone-based valence and arousal
instruments such as the Affective Slider and (2) continuous
emotion ratings for valence and arousal using, for example, a
smartphone-based app. Categorical labels could also be
collected if they are not additionally burdensome or
redundant. We also recommend that personality self-reports
be collected. These will help in developing and evaluating
robust emotion recognition systems.
Sensor Data Collection
In Study 1, we collected only audio and video data whereas
in Study 2, we additionally use a smartwatch-based system
we developed, DyMand [3], to collect multimodal sensor
data: audio, heart rate, accelerometer, gyroscope, and
ambient light. The additional data collected from the smart-
watch could provide more context for better recognition
such as the heart rate providing physiological measures
and the accelerometer and gyroscope providing information
about hand gestures. Previous works have shown that mul-
timodal approaches to emotion recognition perform better
than unimodal approaches [21]. Given that an additional
device like a commercial smartwatch is not burdensome to
wear, we hence recommend the collection of such multi-
modal data.
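One simple way to combine such streams is to cut them into aligned fixed-length windows and concatenate per-window features. The sketch below does this for heart rate and accelerometer data only. The sampling rates, window length, chosen features, and example values are all assumptions, and the source does not say how the modalities would be combined in practice.

```python
from math import sqrt
from statistics import mean


def window_features(heart_rate_bpm, accel_xyz, hr_hz, accel_hz, window_s):
    """Build one feature vector per time window from two sensor streams.

    Each vector holds [mean heart rate, mean acceleration magnitude,
    max acceleration magnitude]. Trailing samples that do not fill a
    complete window in both streams are dropped.
    """
    hr_per_window = int(hr_hz * window_s)
    accel_per_window = int(accel_hz * window_s)
    if hr_per_window < 1 or accel_per_window < 1:
        raise ValueError("window too short for the given sampling rates")
    n_windows = min(
        len(heart_rate_bpm) // hr_per_window,
        len(accel_xyz) // accel_per_window,
    )
    features = []
    for w in range(n_windows):
        hr = heart_rate_bpm[w * hr_per_window:(w + 1) * hr_per_window]
        acc = accel_xyz[w * accel_per_window:(w + 1) * accel_per_window]
        magnitudes = [sqrt(x * x + y * y + z * z) for x, y, z in acc]
        features.append([mean(hr), mean(magnitudes), max(magnitudes)])
    return features


# Hypothetical readings: heart rate at 1 Hz, accelerometer at 2 Hz.
heart_rate = [70, 72, 80, 82]
accel = [(0, 0, 1)] * 4 + [(3, 4, 0)] * 4
fused = window_features(heart_rate, accel, hr_hz=1, accel_hz=2, window_s=2)
```

Each row of `fused` could then be passed to a classifier alongside audio features extracted over the same window.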
Cross-Cultural Studies
The universality of emotions has been interrogated and
questioned [26, 15]. There is evidence suggesting that
culture affects how people experience and express
emotions, for example, through facial expressions, gestures,
physiological reactions, and verbal and nonverbal vocalizations [18,
28]. Hence, algorithms developed using data from one cul-
tural context might not work well in others, or worse, contain
various biases. Collecting cross-cultural data will be useful
in developing algorithms that work across various cultures
and reduce bias in the algorithms. We collected data from
different cultures, albeit only within Europe so far:
Dutch-speaking couples in Belgium and German-speaking
couples in Switzerland. We are developing and evaluating our
emotion recognition systems using cross-cultural data. We
hence recommend collecting data from couples in different
cultures to develop robust algorithms.
Development of Software Tools
Data collected from real couples can be annotated by them
as described previously and, as a result, there is no need
for manual annotation by external raters, which is
time-consuming and laborious. However, the data need to be
processed before they are useful for developing emotion
recognition systems. This processing involves some challenges,
some of which, such as turn-taking, are unique to dyadic
interactions like couples’ conversations. Audio is an important
data source for emotion recognition because various key
information can be extracted from it, such as vocal expression
(how things are said), nonverbal vocalizations (e.g., sighs,
laughs) and verbal vocalizations (what is said, which might
give more context for recognition). Tools that perform automatic
processing of audio data would improve the development of
emotion recognition systems for couples. Hence, it is important
for various software tools to be developed that can easily be
used by other researchers.
There is a need for open-source tools for voice activity
detection [4] and diarization [31] that are robust, that is,
that perform well when used with all kinds of audio. The voice
activity detection tool is needed to automatically annotate
parts of the audio that contain vocalizations so that silent or
noisy segments can be discarded. Additionally, the tool could be
further refined to annotate specific nonverbal vocalizations
like sighs, laughter, chuckles, etc., which might be indicative
of specific emotions in various parts of the audio, thereby
improving recognition performance. The diarization tool is
needed to automatically annotate which parts of the audio
correspond to each speaker. It is important to segment au-
dio recordings into parts that correspond to each speaker
to aid in developing a well-performing emotion recognition
system.
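To make the voice activity detection step concrete, here is a deliberately simple energy-threshold sketch. It is a stand-in for illustration and not the method of VADLite [4] or of any diarization tool. The frame length, threshold, and example signal are assumptions, and a threshold of this kind would not separate speech from loud noise.

```python
def detect_active_segments(samples, sample_rate_hz, frame_len, energy_threshold):
    """Return (start_s, end_s) spans whose frame energy exceeds a threshold.

    samples: audio amplitudes, assumed to be scaled to [-1, 1].
    frame_len: number of samples per analysis frame.
    """
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    n_frames = len(samples) // frame_len
    segments = []
    start = None  # start time of the segment currently being built
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        frame_start_s = i * frame_len / sample_rate_hz
        if energy > energy_threshold and start is None:
            start = frame_start_s
        elif energy <= energy_threshold and start is not None:
            segments.append((start, frame_start_s))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the last full frame.
        segments.append((start, n_frames * frame_len / sample_rate_hz))
    return segments


# Hypothetical signal at 100 Hz: silence, activity, silence, activity.
signal = [0.0] * 20 + [0.5] * 30 + [0.0] * 10 + [0.5] * 10
segments = detect_active_segments(
    signal, sample_rate_hz=100, frame_len=10, energy_threshold=0.01
)
```

A diarization step would then assign each detected segment to one of the two partners, which this sketch does not attempt.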
Also, there is a need for open-source tools for automatic
transcription of non-English languages (which are lack-
ing) because using the transcriptions could provide more
context and improve recognition performance. Doing the
annotation and transcription manually for a few hours of au-
dio might not be a problem. However, doing so for data in
the tens of thousands of hours is not scalable. Approaches
such as using Amazon Mechanical Turk may work for acted
data but they cannot work for real couples’ data because
those data are confidential. We recommend that efforts be put
into developing these tools within the affective computing
community, both to avoid duplicated individual efforts and
because inaccurate annotations would result in poor data
input for the emotion recognition algorithms.
Conclusion
We are developing emotion recognition methods using data
from real couples and, in this work, we describe two
studies we ran with real couples: Dutch-speaking couples in
Belgium and German-speaking couples in Switzerland. We
discuss our approach to eliciting and capturing emotions
and make the following five recommendations based on their
relevance for developing well-performing emotion recognition
systems for couples: 1) Elicit emotions by asking
couples to discuss a topic from their relationship, 2) Collect
global and continuous emotion self-report and personal-
ity data using mobile systems like smartphones, 3) Collect
multimodal sensor data using devices like smartwatches,
4) Collect data from different cultures and 5) Develop open-
source voice activity detection, diarization, and transcription
software tools within the affective computing community.
Acknowledgements
We are grateful to Prabhakaran Santhanam and Dominik
Rügger for helping with the development of the mobile software
tools used in running the second study. Study 2 is co-funded by
the Swiss National Science Foundation (CR12I1_166348).
REFERENCES
[1] Alberto Betella and Paul FMJ Verschure. 2016. The
affective slider: A digital self-assessment scale for the
measurement of human emotions. PloS one 11, 2
(2016), e0148037.
[2] George Boateng, Janina Lüscher, Urte Scholz, and
Tobias Kowatsch. 2020. Emotion Capture among Real
Couples in Everyday Life. In Momentary Emotion
Elicitation and Capture workshop. CHI 2020 (Under
Review).
[3] George Boateng, Prabhakaran Santhanam, Janina
Lüscher, Urte Scholz, and Tobias Kowatsch. 2019a.
Poster: DyMand - An Open-Source Mobile and
Wearable System for Assessing Couples’ Dyadic
Management of Chronic Diseases. In The 25th Annual
International Conference on Mobile Computing and
Networking. 1-3.
[4] George Boateng, Prabhakaran Santhanam, Janina
Lüscher, Urte Scholz, and Tobias Kowatsch. 2019b.
VADLite: an open-source lightweight system for
real-time voice activity detection on smartwatches. In
Adjunct Proceedings of the 2019 ACM International
Joint Conference on Pervasive and Ubiquitous
Computing and Proceedings of the 2019 ACM
International Symposium on Wearable Computers.
902-906.
[5] Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe
Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N
Chang, Sungbok Lee, and Shrikanth S Narayanan.
2008. IEMOCAP: Interactive emotional dyadic motion
capture database. Language resources and evaluation
42, 4 (2008), 335.
[6] Carlos Busso, Srinivas Parthasarathy, Alec Burmania,
Mohammed AbdelWahab, Najmeh Sadoughi, and
Emily Mower Provost. 2016. MSP-IMPROV: An acted
corpus of dyadic interactions to study emotion
perception. IEEE Transactions on Affective Computing
8, 1 (2016), 67-80.
[7] James A Coan and John M Gottman. 2007. The
specific affect coding system (SPAFF). Handbook of
emotion elicitation and assessment (2007), 267-285.
[8] Sidney K D’Mello and Jacqueline Kory. 2015. A review
and meta-analysis of multimodal affect detection
systems. ACM Computing Surveys (CSUR) 47, 3
(2015), 1-36.
[9] Allison K Farrell, Ledina Imami, Sarah CE Stanton,
and Richard B Slatcher. 2018. Affective processes as
mediators of links between close relationships and
physical health. Social and Personality Psychology
Compass 12, 7 (2018), e12408.
[10] Barbara L Fredrickson. 2000. Extracting meaning from
past affective experiences: The importance of peaks,
ends, and specific emotions. Cognition & Emotion 14,
4 (2000), 577-606.
[11] John M Gottman and Robert W Levenson. 1985. A
valid procedure for obtaining self-report of affect in
marital interaction. Journal of consulting and clinical
psychology 53, 2 (1985), 151.
[12] Patricia K Kerig and Donald H Baucom. 2004. Couple
observational coding systems. Taylor & Francis.
[13] Peter Kuppens, Francis Tuerlinckx, James A Russell,
and Lisa Feldman Barrett. 2013. The relation between
valence and arousal in subjective experience.
Psychological bulletin 139, 4 (2013), 917.
[14] Peter Kuppens, Francis Tuerlinckx, Michelle Yik, Peter
Koval, Joachim Coosemans, Kevin J Zeng, and
James A Russell. 2017. The relation between valence
and arousal in subjective experience varies with
personality and culture. Journal of personality 85, 4
(2017), 530-542.
[15] Nangyeon Lim. 2016. Cultural differences in emotion:
differences in emotional arousal level between the
East and the West. Integrative medicine research 5, 2
(2016), 105-109.
[16] Timothy J Loving and Richard B Slatcher. 2013.
Romantic relationships and health. The Oxford
handbook of close relationships (2013), 617-637.
[17] Janina Lüscher, Tobias Kowatsch, George Boateng,
Prabhakaran Santhanam, Guy Bodenmann, and Urte
Scholz. 2019. Social Support and Common Dyadic
Coping in Couples’ Dyadic Management of Type II
Diabetes: Protocol for an Ambulatory Assessment
Application. JMIR research protocols 8, 10 (2019),
e13685.
[18] David Matsumoto and Paul Ekman. 1989.
American-Japanese cultural differences in intensity
ratings of facial expressions of emotion. Motivation and
Emotion 13, 2 (1989), 143-157.
[19] Angeliki Metallinou, Chi-Chun Lee, Carlos Busso,
Sharon Carnicke, Shrikanth Narayanan, and others.
2010. The USC CreativeIT database: A multimodal
database of theatrical improvisation. Multimodal
Corpora: Advances in Capturing, Coding and
Analyzing Multimodality (2010), 55.
[20] Angeliki Metallinou and Shrikanth Narayanan. 2013.
Annotation and processing of continuous emotional
attributes: Challenges and opportunities. In 2013 10th
IEEE international conference and workshops on
automatic face and gesture recognition (FG). IEEE,
1-8.
[21] Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir
Hussain. 2017. A review of affective computing: From
unimodal analysis to multimodal fusion. Information
Fusion 37 (2017), 98-125.
[22] Nicole A Roberts, Jeanne L Tsai, and James A Coan.
2007. Emotion elicitation using dyadic interaction
tasks. Handbook of emotion elicitation and
assessment (2007), 106-123.
[23] Theodore F Robles, Richard B Slatcher, Joseph M
Trombello, and Meghan M McGinn. 2014. Marital
quality and health: A meta-analytic review.
Psychological bulletin 140, 1 (2014), 140.
[24] Anna Marie Ruef and Robert W Levenson. 2007.
Continuous measurement of emotion. Handbook of
emotion elicitation and assessment (2007), 286-297.
[25] James A Russell. 1980. A circumplex model of affect.
Journal of personality and social psychology 39, 6
(1980), 1161.
[26] James A Russell. 1994. Is there universal recognition
of emotion from facial expression? A review of the
cross-cultural studies. Psychological bulletin 115, 1
(1994), 102.
[27] James A Russell, Anna Weiss, and Gerald A
Mendelsohn. 1989. Affect grid: a single-item scale of
pleasure and arousal. Journal of personality and social
psychology 57, 3 (1989), 493.
[28] K. R Scherer, H Wallbott, D Matsumoto, and K
Tsutomu. 1988. Emotional experience in cultural
context: A comparison between Europe, Japan and
the United States. Faces of emotion: recent research
(1988), 98-115.
[29] Laura Sels, Eva Ceulemans, and Peter Kuppens.
2019. All’s well that ends well? A test of the peak-end
rule in couples’ conflict discussions. European Journal
of Social Psychology 49, 4 (2019), 794-806.
[30] Christopher J Soto and Oliver P John. 2017. The next
Big Five Inventory (BFI-2): Developing and assessing
a hierarchical model with 15 facets to enhance
bandwidth, fidelity, and predictive power. Journal of
personality and social psychology 113, 1 (2017), 117.
[31] Eva Vozáriková and Jozef Juhár. 2015. Comparison of
Diarization Tools for Building Speaker Database.
Advances in Electrical and Electronic Engineering 13
(11 2015), 314-319. DOI:
http://dx.doi.org/10.15598/aeee.v13i4.1468
RESULTS Results indicate an overall positive evaluation of MAX with respect to its reach (49.5% (49 out of 99) of recruited and eligible patient-family member teams participated), a strong patient-CA working alliance, and a high acceptance by all relevant stakeholders. Moreover, MAX led to improved cognitive and behavioral skills and an intervention completion rate of 75.5%. Family members supported the patients in 269 out of 275 (97.8%) coaching sessions. Most of the conversational turns (99.5%) were conducted between patients and the CA as opposed to between patient and healthcare professional, thus indicating the scalability of MAX. In addition, it took healthcare professionals less than four minutes to assess the inhalation technique and three days to deliver that feedback to the patients. Several suggestions for improvement were made. CONCLUSIONS For the first time, this work provides evidence that CAs, designed as mediating social actors involving healthcare professionals, patients and family members, are not only accepted in such a “team player” role, but also show potential to improve health-relevant outcomes in chronic disease management.
Conference Paper
Illness management among married adults is mainly shared with their spouses and it involves social support. Social support among couples has been shown to affect emotional well-being positively or negatively and result in healthier habits among diabetes patients. Hence, through automatic emotion recognition, we could have an assessment of the emotional well-being of couples which could inform the development and triggering of interventions to help couples better manage chronic diseases. We are developing an emotion recognition system to recognize the emotions of real couples in everyday life and in this paper, we describe our approach to collecting sensor and self-report emotion data among Swiss-based German-speaking couples in everyday life. We also discuss various aspects of the study such as our novel approach of triggering data collection based on detecting that the partners are close and speaking, the self-reports and multimodal data as well as privacy concerns with our method.
Conference Paper
Married adults share illness management with spouses and it involves social support and common dyadic coping (CDC). Social support and CDC have an impact on health behavior and well-being or emotions in couples' dyadic management of diabetes in daily life. Hence, understanding dyadic interactions in-situ in chronic disease management could inform behavioral interventions to help the dyadic management of chronic diseases. It is however not clear how well social support and CDC can be assessed in daily life among couples who are managing chronic diseases. In this ongoing work, we describe the development of DyMand, a novel open-source mobile and wearable system for ambulatory assessment of couples' dyadic management of chronic diseases. Our first prototype is used in the context of diabetes mellitus Type II. Additionally, we briefly describe our experience deploying the prototype in two pre-pilot tests with five subjects and our plans for future deployments.
Conference Paper
Smartwatches provide a unique opportunity to collect more speech data because they are always with the user and also have a more exposed microphone compared to smartphones. Speech data could be used to infer various indicators of mental well-being such as emotions, stress, and social activity. Hence, real-time voice activity detection (VAD) on smartwatches could enable the development of applications for mental health monitoring. In this work, we present VADLite, an open-source, lightweight system that performs real-time VAD on smartwatches. It extracts mel-frequency cepstral coefficients and classifies speech versus non-speech audio samples using a linear Support Vector Machine. The real-time implementation is done on the Wear OS Polar M600 smartwatch. An offline and online evaluation of VADLite using real-world data showed better performance than WebRTC's open-source VAD system. VADLite can be easily integrated into Wear OS projects that need a lightweight VAD module running on a smartwatch.
Article
Background: Type II diabetes mellitus (T2DM) is a common chronic disease. To manage blood glucose levels, patients need to follow medical recommendations for healthy eating, physical activity, and medication adherence in their everyday life. Illness management is mainly shared with partners and involves social support and common dyadic coping (CDC). Social support and CDC have been identified as having implications for people’s health behavior and well-being. Visible support, however, may also be negatively related to people’s well-being. Thus, the concept of invisible support was introduced. It is unknown which of these concepts (ie, visible support, invisible support, and CDC) displays the most beneficial associations with health behavior and well-being when considered together in the context of illness management in couple’s everyday life. Therefore, a novel ambulatory assessment application for the open-source behavioral intervention platform MobileCoach (AAMC) was developed. It uses objective sensor data in combination with self-reports in couple’s everyday life. Objective: The aim of this paper is to describe the design of the Dyadic Management of Diabetes (DyMand) study, funded by the Swiss National Science Foundation (CR12I1_166348/1). The study was approved by the cantonal ethics committee of the Canton of Zurich, Switzerland (Req-2017_00430). Methods: This study follows an intensive longitudinal design with 2 phases of data collection. The first phase is a naturalistic observation phase of couples’ conversations in combination with experience sampling in their daily lives, with plans to follow 180 T2DM patients and their partners using sensor data from smartwatches, mobile phones, and accelerometers for 7 consecutive days. The second phase is an observational study in the laboratory, where couples discuss topics related to their diabetes management. 
The second phase complements the first phase by focusing on the assessment of a full discussion about diabetes-related concerns. Participants are heterosexual couples with 1 partner having a diagnosis of T2DM. Results: The AAMC was designed and built until the end of 2018 and internally tested in March 2019. In May 2019, the enrollment of the pilot phase began. The data collection of the DyMand study will begin in September 2019, and analysis and presentation of results will be available in 2021. Conclusions: For further research and practice, it is crucial to identify the impact of social support and CDC on couples’ dyadic management of T2DM and their well-being in daily life. Using AAMC will make a key contribution with regard to objective operationalizations of visible and invisible support, CDC, physical activity, and well-being. Findings will provide a sound basis for theory- and evidence-based development of dyadic interventions to change health behavior in the context of couple’s dyadic illness management. Challenges to this multimodal sensor approach and its feasibility aspects are discussed. International Registered Report Identifier (IRRID): PRR1-10.2196/13685
Article
Whether emotion is universal or social is a recurrent issue in the history of emotion study among psychologists. Some researchers view emotion as a universal construct and hold that a large part of emotional experience is biologically based. However, emotion is not only biologically determined, but also influenced by the environment. Therefore, some aspects of emotion differ across cultures. One important aspect of emotion that differs across cultures is emotional arousal level. All affective states are systematically represented as two bipolar dimensions, valence and arousal. The arousal levels of actual and ideal emotions have consistently been found to differ cross-culturally. In Western or individualist cultures, high arousal emotions are valued and promoted more than low arousal emotions. Also, Westerners experience high arousal emotions more than low arousal emotions. By contrast, in Eastern or collectivist cultures, low arousal emotions are valued more than high arousal emotions. Moreover, people in the East actually experience and prefer to experience low arousal emotions more than high arousal emotions. The mechanisms of these cross-cultural differences and their implications are also discussed.
Article
Despite its importance for well-being, surprisingly little is known about what determines how couples feel after a conflict. Based on the peak-end rule, we examined whether partners’ post-conflict affect was mainly predicted by their most aversive or pleasant emotional experience (peaks) during the conflict, or by the emotional tone at the end of the interaction. 101 couples engaged in a conflict interaction and afterwards evaluated their momentary affect during the interaction. Post-conflict affect (in terms of positive and negative feelings, and perceived partner responsiveness) was assessed immediately after the conflict, after a subsequent positive discussion, and upon returning to daily life (here, rumination about the relationship was assessed as well). Our results showed that the negative and positive peaks, but not the end emotion, predicted immediate and partly extended post-conflict affect in individuals. This finding has clinical implications for the remediation of couple conflict.
Article
Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to form the building blocks for readers, to better understand this challenging and exciting research field.
Article
Objective: While in general arousal increases with positive or negative valence (a so-called V-shape relation), there are large differences among individuals in how these two fundamental dimensions of affect are related in people's experience. In two studies, we examined two possible sources of this variation: personality and culture. Method: In Study 1, participants recalled a recent event that was characterised by high or low valence or arousal and reported on their feelings, and reported on their personality in terms of the Five-Factor Model. In Study 2, participants from Canada, China/Hong Kong, Japan, Korea, and Spain reported on their feelings in a thin slice of time and on their personality. Results: In Study 1, we replicated the V-shape as characterising the relation between valence and arousal, and identified personality correlates of experiencing particular valence-arousal combinations. In Study 2, we documented how the V-shaped relation varied as a function of western versus eastern cultural background and again personality. Conclusion: The results showed that the steepness of the V-shape relation between valence and arousal increases with extraversion within cultures, and with a west-east distinction between cultures. Implications for the personality-emotion link and research on cultural differences in affect are discussed.
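One simple way to make the "steepness" of a V-shape concrete is to regress arousal on the absolute value of valence and read off the slope. The sketch below does that with ordinary least squares. This is an illustrative assumption only: the abstract does not state how steepness was estimated, and the function name `fit_v_shape` and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_v_shape(valence: Sequence[float], arousal: Sequence[float]) -> Tuple[float, float]:
    """Fit arousal = intercept + slope * |valence| by ordinary least squares.

    Assumes valence is centred on a neutral point of 0. A larger slope
    means a steeper V. Returns (intercept, slope).
    """
    if len(valence) != len(arousal) or len(valence) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    # Fold negative valence onto the positive side so both arms of the V align.
    x = [abs(v) for v in valence]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(arousal) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("|valence| has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, arousal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical ratings lying exactly on arousal = 1 + 2 * |valence|.
intercept, slope = fit_v_shape([-2, -1, 0, 1, 2], [5, 3, 1, 3, 5])
```

Comparing slopes fitted separately per person or per group would then give a rough picture of who shows a steeper V, under the assumptions stated above.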