An Analysis of PCA-based Vocal Entrainment Measures in Married Couples’
Affective Spoken Interactions
Chi-Chun Lee1, Athanasios Katsamanis1, Matthew P. Black1,
Brian R. Baucom2, Panayiotis G. Georgiou1, Shrikanth S. Narayanan1,2
1Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA
2Department of Psychology, University of Southern California, Los Angeles, CA, USA
http://sail.usc.edu
Abstract
Entrainment has played a crucial role in analyzing marital couples'
interactions. In this work, we introduce a novel technique for
quantifying vocal entrainment based on Principal Component Analysis
(PCA). The entrainment measure, as defined in this work, is the
amount of preserved variability of one interlocutor's speaking
characteristics when projected onto the representing space of the
other's speaking characteristics. Our analysis of real couples'
interactions shows that when a spouse is rated as having positive
emotion, he/she has a higher value of vocal entrainment than when
rated as having negative emotion. We further performed various
statistical analyses on the strength and the directionality of vocal
entrainment under different affective interaction conditions to bring
quantitative insights into the entrainment phenomenon. These
analyses, along with a baseline prediction model, demonstrate the
validity and utility of the proposed PCA-based vocal entrainment
measure.
Index Terms: vocal entrainment, couples therapy, behavioral
signal processing, principal component analysis
1. Introduction
In a dyadic spontaneous spoken interaction, the interlocutors exert
mutual influence on each other's behaviors. This mutual influence
guides the dynamic flow of the interaction. It is in this context
that the term interaction synchrony, a.k.a. entrainment, is used to
describe the phenomenon of naturally occurring coordination between
interacting individuals' behaviors, both in timing and form. There
has been research attempting to quantify specific entrainment
behaviors, such as voice activity rhythm [1] and gestures [2], and it
has shown that quantitative methods are essential for analyzing
interpersonal interaction dynamics in fine detail. Entrainment in
conversation describes an important aspect of human interaction
dynamics, since it is believed that variations in the pattern of the
entrainment phenomenon can offer insights into the behaviors of the
interacting individuals; this is especially critical in understanding
interaction patterns when the underlying behavior is deemed atypical
or distressed. This has inspired the investigation of new
computational approaches, referred to as behavioral signal processing
(BSP), for problems in mental health such as couples therapy,
addiction behavior, depression, and autism spectrum disorder
diagnosis/analysis. The aim of BSP is to automatically analyze
abstract human behaviors/states from low-level signal measurements,
such as audio and video recordings of interactions. In this work, we
attempt to quantify vocal entrainment in the spoken interactions of
married couples engaged in affective problem-solving sessions during
marital therapy using such signal processing techniques. A major
motivation for this quantitative study of vocal entrainment comes
from various psychological studies that have stated the importance of
the entrainment phenomenon in understanding the nature of couples'
interactions [3].
Across a variety of research domains, e.g., econometrics,
neuroscience, and studies of physically coupled systems, a long list
of synchronization measures [4] has been utilized to quantify
interdependence between time series and associated variables. These
measures often lack straightforward methods to handle complex
interaction scenarios like human-human conversations, where the
analysis window length (e.g., the length of each speaking turn) per
channel (e.g., a speaker in the conversation) varies across time and
speakers; the signals associated with human conversations can also be
very abstract and complex. The two variables in the time series
(corresponding to the interlocutors in the dyad) do not occur
simultaneously because of the inherent turn-taking structure of human
conversations. These phenomena often violate the underlying
assumptions when applying classical synchronization measures to
signals of interest. Furthermore, the majority of these measures are
symmetric and do not provide information on the direction of
synchronization.
In order to improve upon our previous work [5] on quantifying vocal
entrainment, we incorporate an expanded list of vocal features and
derive a new quantitative vocal entrainment measure based on
Principal Component Analysis (PCA). In this work, we propose to
quantify vocal entrainment as the amount of variability preserved
when representing a speaker's (say, SP1) vocal characteristics in the
vocal characteristics space of another speaker (say, SP2). The vocal
characteristics space is constructed using PCA with acoustic cues.
Intuitively, the larger the amount of variability preserved, the
higher the vocal entrainment level. This method can address both of
the aforementioned concerns because it projects vocal features onto
the transformed vocal characteristics subspace for any variable
length of speech features. Furthermore, for a given speaker pair (say
SP1, SP2) in an interaction, this method can generate two directions
of vocal entrainment when we look at any single speaker, say SP1: one
corresponds to how much SP1 is entraining toward SP2, and the other
corresponds to how much SP1 is getting entrained from SP2.
Various psychology research studies [6, 7] and our own previous work
[5] indicate the general existence of a higher level of entrainment
when a spouse is rated as having positive affect compared to when
rated as having negative affect. While the relationship between the
entrainment phenomenon and emotion
can be complex [8], we rely on this general trend to investigate the
use of the proposed PCA-based vocal entrainment measures.
Copyright © 2011 ISCA. INTERSPEECH 2011, 28–31 August 2011, Florence, Italy.
Our analysis on the directionality of entrainment further
indicates that when a spouse is rated as having positive affect,
he/she shows statistically significantly more vocal entrainment
toward his/her interacting partner, but does not elicit more
entrainment from his/her interacting partner. Finally, we use a
support vector machine (SVM) to design a baseline prediction model
for classifying the session-level code of high positive vs. high
negative affect for each spouse using only the vocal entrainment
measures.
The paper is organized as follows: we describe the database and
research methodology in Section 2. The experimental setup and results
are in Section 3, and conclusions are in Section 4.
2. Research Methodology
2.1. Database
The data we are using was collected as part of the largest
longitudinal, randomized controlled trial of psychotherapy for
severely and stably distressed couples [9]. The database consists of
audio-visual recordings (a single-channel far-field microphone and
split-screen video) and observational coding of the behaviors of
these real married couples. Multiple trained evaluators were
instructed to code the behaviors of each spouse using two standard
manual coding systems, the Social Support Interaction Rating System
(SSIRS) and the Couples Interaction Rating System (CIRS), resulting
in 33 session-level codes for each spouse per interaction. There are
a total of 569 sessions (117 unique couples) of couples engaging in
problem-solving interactions in which an issue in their relationship
was raised and discussed. Since manual transcripts are available, the
audio data was automatically segmented into pseudo-turns (with
speaker identification: husband, wife, unknown) and aligned to the
word transcripts using the SailAlign software [10]. These
pseudo-turns are treated as speaking turns in this work because they
correspond to the speech of the same speaker before the other speaker
takes over the floor. The audio quality varies considerably from
session to session; therefore, we use only a subset of 372 of the 569
sessions, those that meet the criteria of 5 dB SNR and 55% speaker
segmentation after this automatic process. Details of the database
can be found in previous work [11].
The focus of this work is to quantitatively examine the vocal
entrainment of married couples in sessions where either spouse was
rated with an extreme affective state (positive or negative). The
emotional rating is the code “Global Positive Affect” or “Global
Negative Affect” (based on the SSIRS) for each spouse at the session
level. From the 372 sessions, we focus on those in which a spouse was
rated in the top 20% of positive or negative emotion, and denote them
as high positive emotion and high negative emotion in this work. This
selection of extreme affective states results in a total of 280
sessions with 81 unique couples, of which 140 sessions correspond to
high positive emotion and another 140 sessions correspond to high
negative emotion.
2.2. PCA-based Vocal Entrainment Measures
The core idea behind this quantification of vocal entrainment is to
construct a basis set representing the speaking-characteristics space
of an interlocutor per speaking turn using PCA. The entrainment level
is essentially defined as a measure of similarity when projecting the
other interlocutor's speaking characteristics onto this constructed
space; in this case, the metric is the amount of preserved variance
of vocal features from one interlocutor when projected onto the other
interlocutor's space.
Figure 1: Example of Computing Two Directions of Vocal Entrainment
for Turn Hi
A schematic example of how to compute the two directions (toward:
veTO, from: veFR) of vocal entrainment for an interlocutor's (here,
the husband's) speech turn, Hi, in a married couple's interaction is
shown in Figure 1. The steps listed below are used to compute the
husband's veTO at turn Hi:
1. Extract appropriate vocal features, X1, to represent the husband's speaking characteristics at turn Hi.
2. Perform PCA on the z-normalized X1, such that Y1^T = D1 X1^T.
3. Predefine a variance level (v1 = 0.95) to select the L-subset of basis vectors, D1L.
4. Project the z-normalized vocal features, X2, extracted from the wife's speech at turn Wi, using D1L.
5. Compute the vocal entrainment measure as the ratio of the represented variance of X2 in the D1L basis to the predefined variance level in step 3.
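The steps above can be sketched in code. The following is a minimal illustration, not the authors' implementation; the function name, the use of NumPy, and details such as performing PCA via an eigendecomposition of the sample covariance are our own assumptions.

```python
import numpy as np

def vocal_entrainment(X1, X2, var_level=0.95):
    # Illustrative sketch of the PCA-based measure (not the authors' code).
    # X1: (n1, d) vocal features of the target speaker's turn (e.g., Hi)
    # X2: (n2, d) vocal features of the partner's turn (e.g., Wi)
    # Steps 1-2: z-normalize X1 and perform PCA via the eigendecomposition
    # of its sample covariance matrix.
    Z1 = (X1 - X1.mean(axis=0)) / X1.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z1, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 3: keep the smallest L components covering >= var_level variance.
    cum = np.cumsum(eigvals) / eigvals.sum()
    L = int(np.searchsorted(cum, var_level)) + 1
    D1L = eigvecs[:, :L]                         # (d, L) basis

    # Step 4: project the z-normalized X2 onto the D1L basis.
    Z2 = (X2 - X2.mean(axis=0)) / X2.std(axis=0)
    Y2 = Z2 @ D1L

    # Step 5: ratio of the variance of X2 represented in the basis
    # to the predefined variance level.
    preserved = Y2.var(axis=0, ddof=1).sum() / Z2.var(axis=0, ddof=1).sum()
    return preserved / var_level
```

Interchanging the roles of X1 and X2 in this sketch yields the other direction of entrainment, as described in the text.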
We can compute the other direction of entrainment, veFR, by
interchanging X1 with X2. There are two major motivations behind
these PCA-based vocal entrainment measures. The first is the
elimination of concerns associated with imposing heuristics in the
computation of conventional synchronization measures due to the
turn-taking structure of human conversation and the variable length
of speaking turns (resulting in a different number of vocal feature
vectors per speaking turn). These two factors can raise concerns
about the reliability of classical measures. However, with the
PCA-based measures, because the representation resides in another,
transformed space, the issues of non-simultaneously occurring time
series and variable-length analysis chunks are both lessened. The
second is the ability to introduce the directionality of entrainment
at each speaking turn per speaker. As we can see from Figure 1, there
can be two directions of vocal entrainment for a given spouse at each
of his/her speaking turns. This directionality can be important for
understanding the details of the entrainment phenomenon.
2.3. Representative Vocal Feature Set
The method described in Section 2.2 relies on an appropriate set of
acoustic features to represent speaking characteristics. In order to
capture the dynamics of the speaking characteristics, the PCA is done
on a speaking turn where the vocal features are computed at the word
level. Two different categories of vocal features are used in this
work: prosodic features and spectral features. The details of the raw
acoustic feature extraction from the audio files, with the necessary
preprocessing and speaker normalization, are described in previous
work [11]. The following is the final set of acoustic features,
calculated per word (resulting from the automatic alignment), used to
represent the speaking characteristics:
Prosodic Features (Pitch x4): third-order polynomial fit of the pitch contour per word.
Prosodic Features (Energy x2): mean and variance of the energy per word.
Prosodic Features (Word Duration x1): the word duration.
Spectral Features (MFCC 2x15): mean and variance of the 15-dimensional MFCCs per word.
This list, combined with the first-order delta features, generates a
74-dimensional (37x2) vocal feature vector per word. Depending on the
length of a speaking turn, this results in a variable-length sequence
of 74-dimensional vocal feature vectors. PCA is performed on merged
turns (speaking turns merged such that each merged turn has at least
74 samples) in order to generate a unique set of basis vectors.
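As an illustration of how such a per-word vector might be assembled, the sketch below is our own construction, not the authors' extraction pipeline; the per-word pitch, energy, and MFCC arrays are assumed to come from a separate acoustic front end.

```python
import numpy as np

def word_feature_vector(pitch, energy, mfcc, duration):
    # Hypothetical assembly of the 37-dim per-word feature vector:
    # 4 pitch polynomial coefficients + energy mean/variance +
    # word duration + mean/variance of 15-dim MFCCs (30 values).
    t = np.linspace(0.0, 1.0, len(pitch))
    pitch_poly = np.polyfit(t, pitch, deg=3)       # third-order fit: 4 coefficients
    return np.concatenate([
        pitch_poly,                                # 4
        [np.mean(energy), np.var(energy)],         # 2
        [duration],                                # 1
        mfcc.mean(axis=0), mfcc.var(axis=0),       # 15 + 15
    ])

def turn_feature_matrix(word_vectors):
    # Stack per-word vectors and append first-order deltas -> 74 dims.
    F = np.vstack(word_vectors)                    # (n_words, 37)
    delta = np.diff(F, axis=0, prepend=F[:1])      # simple first difference
    return np.hstack([F, delta])                   # (n_words, 74)
```

The simple first difference used for the deltas here is an assumption; any standard delta-coefficient scheme would fit the same 74-dimensional layout.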
3. Experiment Setups & Results
Three different experiments were set up to analyze different aspects
of this PCA-based vocal entrainment measure.
Experiment I: To investigate whether the proposed PCA-based vocal
entrainment measures offer a reasonable quantification of vocal
entrainment, using two different hypothesis tests.
Experiment II: To analyze the direction of the PCA-based vocal
entrainment under different conditions of affective married couples'
interactions.
Experiment III: To discriminate affective state using the PCA-based
vocal entrainment measure as features with a Support Vector Machine.
3.1. Experiment I
We used two different approaches to verify that the PCA-based measure
is indeed a viable quantitative measure of vocal entrainment. First,
we rely on the general understanding that when couples are engaged in
an interpersonal interaction with a more positive emotion, the
entrainment level is expected to be higher than with a negative
emotion. Second, we show that the entrainment measure computed
between interacting couples is statistically higher than the
entrainment measure computed between random pairings of speakers who
were not conversing with each other.
3.1.1. Hypothesis Testing Setup & Results
The first hypothesis test was to verify the PCA-based vocal
entrainment approach by using Student's t-test (α = 0.05) to examine
whether the value is larger when a spouse was rated with high
positive emotion than with high negative emotion. The distribution of
the PCA-based entrainment measures was approximately normal. Table 1
shows the results of the hypothesis test.
Table 1: Entrainment Levels: Higher in Positive Emotion vs. Negative Emotion.
Entrainment Type | High Positive | High Negative | p-value
Toward (veTO)    | 0.8276        | 0.8198        | 0.0103
From (veFR)      | 0.8307        | 0.8256        | 0.0699
Table 1 shows that when a spouse was rated with a positive affective
state, the associated PCA-based entrainment measures are higher for
both directions (toward and from), though only the toward direction
passed the (α = 0.05) significance level. This result provides
evidence that the PCA-based entrainment measure describes the
entrainment phenomenon as it is generally understood in marital
communication.
Another hypothesis test was conducted using Student's t-test (α =
0.05) to examine whether this PCA-based entrainment, computed over
the sequence of turn-taking for actually interacting couples, has a
larger value than when computed for random pairs of speaking turns.
The intuition is that if this method captures the notion of coherence
in dialogs, the measure should have a higher value than when computed
on two randomly selected turns (between two people who were not
engaged in direct interaction). Instead of examining both directions
separately, the average of the values was computed across all 372
sessions (not restricted to only positive vs. negative sessions).
Random entrainment values were computed with 10,000 random draws,
with replacement, of pairs of turns from non-interacting couples.
Table 2 shows the statistical testing result.
Table 2: Entrainment Levels: Higher in Pairs of Sequence in Turn-Taking vs. Random Pairs of Turns.
                    | Pairs of Turns | Random Pairs | p-value
Avg. of Entrainment | 0.8266         | 0.8231       | 0.018
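The random-pairing validation scheme described above can be sketched as follows. The sketch is ours, not the paper's code: the dyad-level entrainment function is assumed to be passed in as `entrain_fn`, and SciPy's one-sided two-sample t-test is used in place of whatever test implementation the authors employed.

```python
import numpy as np
from scipy import stats

def validate_against_random_pairs(entrain_fn, real_vals, turns_a, turns_b,
                                  n_draws=10000, seed=0):
    # real_vals: per-session average entrainment for actually interacting couples.
    # turns_a, turns_b: pools of turn feature matrices from non-interacting speakers.
    rng = np.random.default_rng(seed)
    random_vals = [
        entrain_fn(turns_a[rng.integers(len(turns_a))],
                   turns_b[rng.integers(len(turns_b))])
        for _ in range(n_draws)          # draws with replacement
    ]
    # One-sided test: real interactions are expected to show higher entrainment.
    t, p = stats.ttest_ind(real_vals, random_vals, alternative='greater')
    return t, p
```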
Table 2 provides additional corroborating statistical evidence that
this PCA-based method of computing entrainment indeed captures a
notion of vocal synchronization, because the value is greater overall
when computed across turn sequences of interacting couples than
across turn pairs of non-interacting “couples”. These two
hypothesis-testing experiments provide grounding evidence that the
signal-derived PCA-based vocal entrainment measure is a viable method
for quantifying interpersonal synchronization.
3.2. Experiment II
In Experiment II, we extend our statistical analysis to examine the
strength of vocal entrainment in each direction (toward and from)
under different conditions, termed here the interaction atmosphere.
For our problem context, we define three types of interaction
atmosphere: (1) both spouses were rated as having high positive
emotion; (2) only one spouse was rated with high positive emotion or
with high negative emotion; (3) both spouses were rated as having
high negative emotion. The following is the list of statistical
tests, performed with Student's t-test (α = 0.05).
Test 1: Comparison of entrainment measures for type 1 vs. type 3
interactions: the alternative hypothesis states that the entrainment
values are higher in type 1.
Test 2: Comparison of entrainment measures for type 2 interactions:
the alternative hypothesis states that entrainment values are higher
when the spouse was rated as high positive vs. rated as high
negative.
Test 3: Comparison of entrainment measures for type 1 vs. type 2
interactions: the alternative hypothesis states that when both
spouses were rated as high positive, entrainment values are higher
than when only one spouse was rated as high positive.
Test 4: Comparison of entrainment measures for type 3 vs. type 2
interactions: the alternative hypothesis states that when both
spouses were rated high negative, entrainment values are lower than
when only one spouse was rated as high negative.
The summary of the statistical testing results of Experiment II is in
Table 3. Several notable points can be made from the results in Table
3. First, the vocal entrainment measures are higher (in both
directions) when both spouses were rated as having high positive
emotion (Test 1), which is expected as suggested by the psychology
literature. Second, with this quantification of vocal entrainment,
the results suggest that when a spouse was rated with high positive
emotion, he/she shows higher values of entrainment toward his/her
interacting partner compared to when he/she was rated as high
negative (Test 2). This implies that when a person is in a more
positive emotional state, his/her
Table 3: Hypothesis Testing Summary (Various Interaction Atmosphere Types).
Test # (Entrainment Type) | Mean of H0 | Mean of Ha | p-value
Test 1 (toward) | 0.8196 | 0.8289 | 0.0314
Test 1 (from)   | 0.8196 | 0.8289 | 0.0314
Test 2 (toward) | 0.8189 | 0.8265 | 0.050
Test 2 (from)   | 0.8311 | 0.8321 | 0.3831
Test 3 (toward) | 0.8265 | 0.8289 | 0.3126
Test 3 (from)   | 0.8289 | 0.8321 | 0.7741
Test 4 (toward) | 0.8189 | 0.8196 | 0.5635
Test 4 (from)   | 0.8311 | 0.8196 | 0.009
vocal characteristics become more similar to his/her interacting
partner's, possibly to ease the tension of the interaction or provide
support. However, the results indicate that his/her interacting
partner may not have displayed such entrainment toward him/her. The
Test 3 results suggest that there is no difference in the level of
vocal entrainment when both spouses were rated with positive emotion
compared to when only one spouse was rated with positive emotion.
Lastly, the results suggest that when both spouses were rated as high
negative, they receive less vocal entrainment from their interacting
partner compared to when only one spouse was rated as high negative
(Test 4). This outcome is also intuitive because when both spouses
are negative, they would be less willing to entrain toward one
another (less likely to provide emotional support to each other).
Through this series of statistical tests, it is encouraging to
observe that this method can be a viable approach for detailed
analysis of entrainment in relation to psychologists' affective
ratings of these distressed couples, with the potential for many more
tests of entrainment under various interaction conditions.
3.3. Experiment III
The goal of this experiment is to study the predictive ability of the
vocal entrainment measure in recognizing a spouse's session-level
affective codes. We performed baseline binary classification using a
Support Vector Machine (with a radial basis function kernel) to
differentiate high positive vs. high negative affective states using
this vocal entrainment measure. We focus on only one direction of the
entrainment (toward) for each spouse, since in Table 3 it exhibits a
statistically significant difference between the high positive and
high negative affective states. Nine different statistical
functionals were computed per session (mean, variance, range,
maximum, minimum, 25% quantile, 75% quantile, interquartile range,
median). Evaluation was done using leave-one-couple-out
cross-validation, and we obtained a recognition rate of 51.79%. A
more detailed classification setup for recognizing affective state
using a multiple-instance learning framework further improves the
recognition rate to 53.93% with salient vocal entrainment measures
[12].
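A minimal sketch of this classification setup, using scikit-learn, could look as follows. This is our own illustration, not the authors' code: the functional set matches the nine listed above, while the function names and the data layout (one list of per-turn entrainment values per session, with couple identifiers for the grouped cross-validation) are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def session_functionals(ve_toward):
    # Nine statistical functionals of the per-turn 'toward' entrainment values.
    v = np.asarray(ve_toward)
    q25, q75 = np.percentile(v, [25, 75])
    return np.array([v.mean(), v.var(), v.max() - v.min(), v.max(), v.min(),
                     q25, q75, q75 - q25, np.median(v)])

def classify_affect(sessions, labels, couple_ids):
    # RBF-kernel SVM evaluated with leave-one-couple-out cross-validation.
    X = np.vstack([session_functionals(s) for s in sessions])
    scores = cross_val_score(SVC(kernel='rbf'), X, np.asarray(labels),
                             groups=couple_ids, cv=LeaveOneGroupOut())
    return scores.mean()
```

Grouping the folds by couple, rather than by session, keeps both sessions of a held-out couple out of the training data, which matches the leave-one-couple-out evaluation described above.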
4. Conclusions & Future Work
The entrainment phenomenon is an integral aspect of analyzing
couples' interactions. Computational measures of vocal entrainment
can provide a quantitative characterization accompanying qualitative
descriptions of this natural human communication phenomenon. In this
work, we propose a PCA-based vocal entrainment measure. It relies on
the idea that, to effectively capture this subtle similarity between
an interacting dyad, we first construct a space (via PCA)
representing the speaking characteristics of each interlocutor with a
set of common acoustic features; the entrainment level is then
computed as the preserved variability of the other speaker
represented in the transformed feature space of the original speaker.
The analysis presented in Section 3 shows that this is indeed a
viable approach to quantify vocal entrainment, and various
statistical analyses using real couple interaction data have shown
differences in the strength and directionality of vocal entrainment
when a spouse is rated with high positive compared to high negative
affect.
Future work includes investigating better representations of
speaking characteristics using various acoustic cues, since a
suitable representation of speaking style is a crucial aspect of this
PCA-based entrainment method. Another research direction involves
utilizing more sophisticated subspace construction methods to
overcome inherent problems of PCA, such as its sensitivity to
outliers. A further direction is to construct different
representations that effectively capture nonverbal behaviors. Since
entrainment can provide insights for research on human-human
communication, we would like to extend this quantification scheme in
the hope of offering psychology experts another useful objective tool
for the analysis of married couples' communication.
5. Acknowledgments
This research was supported in part by funds from the National
Science Foundation, the Department of Defense, and the National
Institutes of Health.
6. References
[1] A. R. McGarva and R. M. Warner, “Attraction and social coordination: Mutual entrainment of vocal activity rhythms,” Journal of Psycholinguistic Research, vol. 32, no. 3, pp. 335–354, 2003.
[2] M. J. Richardson, K. L. Marsh, and R. C. Schmidt, “Effects of visual and verbal interaction on unintentional interpersonal coordination,” Journal of Experimental Psychology: Human Perception and Performance, vol. 31, no. 1, pp. 62–79, 2005.
[3] K. Eldridge and B. Baucom, “Couples and consequences of the demand-withdraw interaction pattern,” in Positive Pathways for Couples and Families: Meeting the Challenges of Relationships. Wiley-Blackwell, in press.
[4] J. Dauwels, F. Vialatte, and A. Cichocki, “Diagnosis of Alzheimer's disease from EEG signals: Where are we standing?” Current Alzheimer Research (invited paper), 2011.
[5] C.-C. Lee, M. P. Black, A. Katsamanis, A. C. Lammert, B. R.
Baucom, A. Christensen, P. G. Georgiou, and S. S. Narayanan,
“Quantification of prosodic entrainment in affective spontaneous
spoken interactions of married couples,” in Proceedings of Inter-
speech, 2010.
[6] M. Kimura and I. Daibo, “Interactional synchrony in conversations about emotional episodes: A measurement by ‘the between-participants pseudosynchrony experimental paradigm’,” Journal of Nonverbal Behavior, vol. 30, pp. 115–126, 2006.
[7] L. L. Verhofstadt, A. Buysse, W. Ickes, M. Davis, and I. Devoldre, “Support provision in marriage: The role of emotional similarity and empathic accuracy,” Emotion, vol. 8, no. 6, pp. 792–802, 2008.
[8] J. M. Gottman, “The roles of conflict engagement, escalation,
and avoidance in marital interaction: A longitudinal view of five
types of couples,” Journal of Consulting and Clinical Psychology,
vol. 61, no. 1, pp. 6–15, 1993.
[9] A. Christensen, D. Atkins, S. Berns, J. Wheeler, D. H. Baucom,
and L. Simpson, “Traditional versus integrative behavioral cou-
ple therapy for significantly and chronically distressed married
couples,” J. of Consulting and Clinical Psychology, vol. 72, pp.
176–191, 2004.
[10] A. Katsamanis, M. P. Black, P. G. Georgiou, L. Goldstein, and
S. S. Narayanan, “SailAlign: Robust long speech-text alignment,”
in Very-Large-Scale Phonetics Workshop, Jan. 2011.
[11] M. P. Black, A. Katsamanis, C.-C. Lee, A. C. Lammert, B. R.
Baucom, A. Christensen, P. G. Georgiou, and S. S. Narayanan,
“Automatic classification of married couples’ behavior using au-
dio features,” in Proceedings of Interspeech, 2010.
[12] C.-C. Lee, A. Katsamanis, M. P. Black, B. R. Baucom, P. G. Geor-
giou, and S. S. Narayanan, “Affective state recognition in married
couples’ interactions using pca-based vocal entrainment measures
with multiple instance learning,” in Submitted to ACII, 2011.
3104
... To encode the structure of small graphs/groups (only about 4 nodes/speakers), we propose to initialize the node and graph embeddings by applying a set of graph algorithms where each encodes a distinctive property of the graph. As the supervision, we propose to employ the domain-specific task of predicting real versus randomly permuted conversations, which has been utilized in the entrainment domain to verify the validity of measures (De Looze et al. 2014;Lee et al. 2011;Jain et al. 2012;Rahimi and Litman 2018). Experimental evaluations demonstrate that the group entrainment embedding improves performance for the downstream task of predicting group outcomes compared to the state-of-the-art methods. ...
... So, randomly generated conversations should not show strong entrainment relations. Distinguishing between real and randomly permuted fake conversations is a validation task in the entrainment literature (De Looze et al. 2014;Lee et al. 2011;Jain et al. 2012;Rahimi et al. 2017). We thus propose to use the permuted version of each conversation to build the corresponding fake graphs and use the nodes of these fake graphs to build C (g). ...
... After learning the entrainment embedding, similar to prior works (De Looze et al. 2014;Lee et al. 2011;Jain et al. 2012;Rahimi et al. 2017;Doyle, Yurovsky, and Frank 2016;Lee et al. 2011) which have evaluated entrainment extrinsically in terms of predicting outcomes, we evaluate its utility at the downstream task of predicting Favorable and Conflict team outcomes. We use support vector machines with RBF kernel and perform leave-one-out cross validation. ...
Article
Entrainment is the propensity of speakers to begin behaving like one another in conversation. While most entrainment studies have focused on dyadic interactions, researchers have also started to investigate multi-party conversations. In these studies, multi-party entrainment has typically been estimated by averaging the pairs' entrainment values or by averaging individuals' entrainment to the group. While such multi-party measures utilize the strength of dyadic entrainment, they have not yet exploited different aspects of the dynamics of entrainment relations in multi-party groups. In this paper, utilizing an existing pairwise asymmetric entrainment measure, we propose a novel graph-based vector representation of multi-party entrainment that incorporates both strength and dynamics of pairwise entrainment relations. The proposed kernel approach and weakly-supervised representation learning method show promising results at the downstream task of predicting team outcomes. Also, examining the embedding, we found interesting information about the dynamics of the entrainment relations. For example, teams with more influential members have more process conflict.
... Entrainment is the tendency of speakers to begin behaving like one another in conversation. The development of methods for automatically quantifying entrainment in text and speech data is an active research area, as entrainment has been shown to correlate with outcomes such as success measures and social variables for a variety of phenomena, e.g., acoustic-prosodic, lexical, and syntactic (Nenkova et al., 2008;Reitter and Moore, 2007;Mitchell et al., 2012;Levitan et al., 2012;Lee et al., 2011;Stoyanchev and Stent, 2009;Lopes et al., 2013;Lubold and Pon-Barry, 2014;Moon et al., 2014;Sinha and Cassell, 2015;Lubold et al., 2015). One of the main measures of entrainment is convergence which is the main focus of this paper. ...
... We explore the effect of our method, participation weighting, and simple averaging when calculating group convergence from dyads. We conclude that our proposed weighted convergence measure performs significantly better on multiple benchmark prediction and regression tasks that have been used to evaluate convergence in prior studies (De Looze et al., 2014;Lee et al., 2011;Jain et al., 2012;Rahimi et al., 2017a;Lee et al., 2011). ...
Conference Paper
Full-text available
This paper proposes a new weighting method for extending a dyad-level measure of convergence to multi-party dialogues by considering group dynamics instead of simply averaging. Experiments indicate the usefulness of the proposed weighted measure and also show that in general a proper weighting of the dyad-level measures performs better than non-weighted averaging in multiple tasks.
... Here are the algorithms used by various works: SVM [5,6,8,11,13,38,39,52,58,61,94,95,101,102], linear discriminant analysis (LDA) [6,101], Markov models [21,56,57,70,101], multiple instance learning (diversity density [39,59,60], diversity density SVM [38,52]), maximum likelihood [20,21,37,70], sequential probability ratio test [60], logistic regression [8], perceptron [101], Gaussian mixture model (GMM) [102], deep neural networks [22,61,62,96], LSTM [93][94][95][96], GRU [20,63], random forest [11,26], CNN [63]. ...
... For the acoustic modality, some examples include prosodic entrainment measures computed with the following similarity measures between the sequential turns of partner A and partner B when there is a turn change: (1) the squared correlation coefficient, (2) mutual information, and (3) the mean of spectral coherence over pitch and energy [56]. Another approach leverages principal component analysis (PCA) to compute both prosodic and spectral entrainment while providing information about the directionality of the entrainment [57,58]. For the visual modality, the Kullback-Leibler (KL) divergence of features extracted from the head motion of the partners was used as the similarity measure for synchrony [102]. ...
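The PCA-based approach mentioned above can be sketched along the lines of the paper's five steps: z-normalize one speaker's turn-level vocal features, take the principal components that explain a preset variance level (0.95), project the other speaker's z-normalized features onto that subspace, and report the ratio of preserved variance to the preset level. The function name and the eigendecomposition route below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pca_entrainment(X1, X2, var_level=0.95):
    """Toy sketch of a PCA-based vocal entrainment measure.

    X1, X2: (n_frames, n_features) vocal-feature matrices for the two
    speakers' adjacent turns. Returns the fraction of X2's variance
    preserved in X1's principal subspace, normalized by var_level.
    """
    # Step 1-2: z-normalize each feature column, then PCA on speaker 1
    # via eigendecomposition of the covariance matrix.
    Z1 = (X1 - X1.mean(0)) / X1.std(0)
    Z2 = (X2 - X2.mean(0)) / X2.std(0)
    w, V = np.linalg.eigh(np.cov(Z1, rowvar=False))
    order = np.argsort(w)[::-1]           # eigh returns ascending order
    w, V = w[order], V[:, order]
    # Step 3: keep the smallest L components whose cumulative variance
    # fraction reaches var_level.
    L = int(np.searchsorted(np.cumsum(w) / w.sum(), var_level)) + 1
    D1L = V[:, :L]
    # Steps 4-5: project speaker 2's features onto that basis and
    # compute the preserved-variance ratio.
    preserved = np.var(Z2 @ D1L, axis=0).sum() / np.var(Z2, axis=0).sum()
    return preserved / var_level
```

Projecting a turn onto its own subspace yields a value of at least 1.0 by construction, so lower values indicate a speaker whose variability is poorly captured by the partner's representation; swapping `X1` and `X2` gives the other direction of entrainment.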
Preprint
Full-text available
Couples' relationships affect the physical health and emotional well-being of partners. Automatically recognizing each partner's emotions could give a better understanding of their individual emotional well-being, enable interventions and provide clinical benefits. In the paper, we summarize and synthesize works that have focused on developing and evaluating systems to automatically recognize the emotions of each partner based on couples' interaction or conversation contexts. We identified 28 articles from IEEE, ACM, Web of Science, and Google Scholar that were published between 2010 and 2021. We detail the datasets, features, algorithms, evaluation, and results of each work as well as present main themes. We also discuss current challenges, research gaps and propose future research directions. In summary, most works have used audio data collected from the lab with annotations done by external experts and used supervised machine learning approaches for binary classification of positive and negative affect. Performance results leave room for improvement with significant research gaps such as no recognition using data from daily life. This survey will enable new researchers to get an overview of this field and eventually enable the development of emotion recognition systems to inform interventions to improve the emotional well-being of couples.
... Some studies have reported that attunement of vocal parameters between therapist and patient occurs during effective therapy (Rocco et al., 2017; Spivack, 1996) and can facilitate the communication of certain therapeutic contents (Wieder & Wiltshire, 2020). In addition, vocal entrainment is higher in situations with positive affect (Lee et al., 2011). Another study found a negative impact of vocal synchrony on depression severity (Reich et al., 2014). ...
Article
Research indicates an effect of nonverbal synchrony on the therapeutic relationship and patients' symptom severity within psychotherapy. However, vocal synchrony research is still rare and inconsistent. This study investigates the relationship between vocal synchrony and outcome/attachment dimensions, controlling for therapeutic alliance and movement synchrony. Our sample consisted of 64 patients with social anxiety disorder. Symptom severity was assessed with the Liebowitz Social Anxiety Scale and the Inventory of Interpersonal Problems, whereas attachment was assessed with the Experiences in Close Relationships Questionnaire at the beginning and end of therapy. Therapeutic alliance was measured with the Helping Alliance Questionnaire II. We determined vocal synchrony of the median and range of the fundamental frequency (f0) by correlating f0 values of manually segmented speaker turns. Movement synchrony was assessed via motion energy and time-series analyses. Patient-led and therapist-led synchrony were differentiated. Statistical analyses were performed using linear mixed effects models. Vocal synchrony had a negative impact on outcome. Higher vocal synchrony led to higher symptom severity (if the patient led synchrony, at the end of therapy) as well as attachment anxiety, avoidance, and interpersonal problems at the end of therapy. Predicting attachment anxiety, the effect of therapist-led vocal synchrony went beyond the effect of therapeutic alliance and movement synchrony. High vocal synchrony may arise due to a lack of autonomy in social anxiety disorder patients or might reflect attempts to repair alliance ruptures. The results indicate that vocal synchrony and movement synchrony have different effects on treatment outcome. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
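The turn-level f0 correlation idea used in the study above can be illustrated in a few lines. The per-turn f0 medians below are invented for illustration; the actual study used manually segmented turns and mixed effects models on top of such correlations.

```python
import numpy as np

# Hypothetical per-turn median f0 values (Hz), aligned so that index i
# pairs one speaker's turn with the other speaker's reply.
patient_f0 = np.array([210.0, 215.0, 208.0, 220.0, 212.0, 218.0])
therapist_f0 = np.array([118.0, 121.0, 116.0, 124.0, 119.0, 122.0])

# One simple synchrony index: Pearson correlation of the two turn series.
r = np.corrcoef(patient_f0, therapist_f0)[0, 1]
print(f"turn-level f0 synchrony: r = {r:.2f}")
```

Here the two series rise and fall together, so the correlation is strongly positive; uncoordinated turn series would yield a value near zero.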
... In contrast, the current task does not use absolute duration features and has different task goals, and thus requires different methodology. Finally, this analysis complements other studies of human behavior [11,12,13]; like some such studies [14], we plan to examine cognitive processes of children with autism using collected RAN data, now that methods to establish normal patterns have begun to be explored. ...
... Some leveraged interaction dynamics among the partners (e.g., entrainment, i.e., synchrony between partners) [2,28,29] and salient instances [16,17,26] to perform recognition. These works tend to use emotion labels from external raters rather than the couples themselves and hence do not reflect the subjective emotions of the couples. ...
Conference Paper
Full-text available
Extensive couples' literature shows that how couples feel after a conflict is predicted by certain emotional aspects of that conversation. Understanding the emotions of couples leads to a better understanding of partners' mental well-being and consequently their relationships. Hence, automatic emotion recognition among couples could potentially guide interventions to help couples improve their emotional well-being and their relationships. It has been shown that people's global emotional judgment after an experience is strongly influenced by the emotional extremes and ending of that experience, known as the peak-end rule. In this work, we leveraged this theory and used machine learning to investigate which audio segments can be used to best predict the end-of-conversation emotions of couples. We used speech data collected from 101 Dutch-speaking couples in Belgium who engaged in 10-minute long conversations in the lab. We extracted acoustic features from (1) the audio segments with the most extreme positive and negative ratings, and (2) the ending of the audio. We used transfer learning in which we extracted these acoustic features with a pre-trained convolutional neural network (YAMNet). We then used these features to train machine learning models (support vector machines) to predict the end-of-conversation valence ratings (positive vs. negative) of each partner. The results of this work could inform how to best recognize the emotions of couples after conversation sessions and, eventually, lead to a better understanding of couples' relationships either in therapy or in everyday life.
... Such methods could provide automatic and objective tools to study interactive abilities in several psychiatric conditions, such as depression and autism. Although few studies are currently available in this specific field, they appear to be very promising: couple therapy (Lee et al., 2011), success in psychotherapy (Ramseyer and Tschacher, 2011), mother-infant interaction (Cohn, 2010). Another great potential lies in the opportunity to build robots or virtual agents with interactive abilities (Gratch et al., 2007;Al Moubayed et al., 2009;Prepin and Pelachaud, 2011;Boucenna et al., 2014). ...
... Humans entrain to each other in multiple aspects of speech, including acoustic, phonetic, lexical, and syntactic features [13,20,19]. Entrainment has been found to be associated with a variety of conversational qualities and social behaviors, e.g., liking, social attractiveness, positive affect, approval seeking, dialogue success, and task success [20,22,10,2]. Acoustic and lexical entrainment has been implemented in Spoken Dialogue Systems (SDS) in several studies which have shown improvement in rapport, naturalness, and overall performance of the system [15,18,16,12]. Indeed, implementing an entraining SDS is important to improve these systems' performance and quality, measured by user perceptions. ...
Chapter
Linguistic entrainment, the tendency of interlocutors to become similar to each other during spoken interaction, is an important characteristic of human speech. Implementing linguistic entrainment in spoken dialogue systems helps to improve the naturalness of the conversation, likability of the agents, and dialogue and task success. The first step toward implementation of such systems is to design proper measures to quantify entrainment. Multi-party entrainment and multi-party spoken dialogue systems have received less attention compared to dyads. In this study, we analyze an existing approach of extending pair measures to team-level entrainment measures, which is based on simple averaging of pairs. We argue that although simple averaging is a good starting point to measure team entrainment, it has several weaknesses in terms of capturing team-specific behaviors specifically related to convergence.
Article
Full-text available
Social robots such as learning companions, therapeutic assistants, and tour guides are dependent on the challenging task of establishing a rapport with their users. People rarely communicate with just words alone; facial expressions, gaze, gesture, and prosodic cues like tone of voice and speaking rate combine to help individuals express their words and convey emotion. One way that individuals communicate a sense of connection with one another is entrainment, where interaction partners adapt their way of speaking, facial expressions, or gestures to each other; entrainment has been linked to trust, liking, and task success and is thought to be a vital phenomenon in how people build rapport. In this work, we introduce a social robot that combines multiple channels of rapport-building behavior, including forms of social dialog and prosodic entrainment. We explore how social dialog and entrainment contribute to both self-reported and behavioral rapport responses. We find prosodic adaptation enhances perceptions of social dialog, and that social dialog and entrainment combined build rapport. Individual differences indicated by gender mediate these social responses; an individual’s underlying rapport state, as indicated by their verbal rapport behavior, is exhibited and triggered differently depending on gender. These results have important repercussions for assessing and modeling a user’s social responses and designing adaptive social agents.
Article
The tendency of conversation partners to adjust to each other to become similar, known as entrainment, has been studied for many years. Several studies have linked differences in this behavior to gender, but with inconsistent results. We analyze individual differences in two forms of local, acoustic-prosodic entrainment in two large corpora between English and Chinese native speakers conversing in English. The few previous studies of the effect of non-nativeness on entrainment that exist were based on much smaller numbers of speakers and focused on perceptual rather than acoustic measures. We find considerable variation in both degree and valence of entrainment behavior across speakers with some consistent trends, such as synchronous behavior being mostly positive in direction and somewhat more prevalent than convergence. However, we do not find entrainment to vary significantly based on gender, native language, or their combination. Instead, we propose as a hypothesis for further study, that gender mediates more complex interactions between sociocultural norms, conversation context, and other factors.
Conference Paper
Full-text available
Long speech-text alignment can facilitate large-scale study of rich spoken language resources that have recently become widely accessible, e.g., collections of audio books or multimedia documents. For such resources, the conventional Viterbi-based forced alignment may often prove inadequate, mainly due to mismatched audio and text and/or noisy audio. In this paper, we present SailAlign, an open-source software toolkit for robust long speech-text alignment that circumvents these restrictions. It implements an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. SailAlign is evaluated on artificially created long chunks of the TIMIT database. Audio is artificially contaminated with babble noise, and the corresponding transcriptions are corrupted at various levels. We present the corresponding word boundary detection results. Finally, we demonstrate the potential use of the software for the exploitation of audio books for the study of read speech.
Article
Full-text available
Interactional synchrony refers to the coordination of movements between individuals in both timing and form during interpersonal communication. Most previous studies in Western culture used a coding methodology and concluded that interactional synchrony occurred for positive episodes but not for negative episodes (e.g., Charny, E. J. (1966). Psychosomatic Medicine, 28, 305–315). In this study, we examined interactional synchrony using a between-participants pseudosynchrony experimental paradigm (Bernieri, F. J., & Rosenthal, R. (1991). In R. S. Feldman & B. Rime (Eds.), Fundamentals of nonverbal behavior (pp. 401–432). New York: Cambridge University Press). Sixty Japanese female university students viewed interaction clips and judged the level of perceived synchrony. The results show that interactional synchrony was perceived in negative episodes as well as in positive episodes. The degree of perceived synchrony was higher in positive episodes than in negative episodes.
Conference Paper
Full-text available
Recently there has been an increase in efforts in Behavioral Signal Processing (BSP), which aims to bring quantitative analysis using signal processing techniques to the domain of observational coding. Currently, observational coding in fields such as psychology is based on subjective expert coding of abstract human interaction dynamics. In this work, we use a Multiple Instance Learning (MIL) framework, a saliency-based prediction model, with a signal-driven vocal entrainment measure as the feature to predict the affective state of a spouse in problem-solving interactions. We generate 18 MIL classifiers to capture the variable-length saliency of vocal entrainment, and use a cross-validation scheme with maximum accuracy and mutual information as the metric to select the best performing classifier for each testing couple. This method obtains a recognition accuracy of 53.93%, a 2.14% (4.13% relative) improvement over a baseline model using a Support Vector Machine. Furthermore, this MIL-based framework has potential for identifying meaningful regions of interest for further detailed analysis of married couples' interactions.
Conference Paper
Full-text available
Interaction synchrony among interlocutors happens naturally as people gradually adapt their speaking style to promote efficient communication. In this work, we quantify one aspect of interaction synchrony, prosodic entrainment (specifically pitch and energy), in married couples' problem-solving interactions using speech-signal-derived measures. Statistical tests demonstrate that some of these measures capture useful information: they show higher values in interactions with couples having high positive attitude compared to high negative attitude. Further, by using quantized entrainment measures employed with statistical symbol sequence matching in a maximum likelihood framework, we obtained 76% accuracy in predicting positive affect vs. negative affect.
Article
Full-text available
This paper reviews recent progress in the diagnosis of Alzheimer's disease (AD) from electroencephalograms (EEG). Three major effects of AD on EEG have been observed: slowing of the EEG, reduced complexity of the EEG signals, and perturbations in EEG synchrony. In recent years, a variety of sophisticated computational approaches has been proposed to detect those subtle perturbations in the EEG of AD patients. The paper first describes methods that try to detect slowing of the EEG. Next the paper deals with several measures for EEG complexity, and explains how those measures have been used to study fluctuations in EEG complexity in AD patients. Then various measures of EEG synchrony are considered in the context of AD diagnosis. Also the issue of EEG pre-processing is briefly addressed. Before one can analyze EEG, it is necessary to remove artifacts due to, for example, head and eye movement or interference from electronic equipment. Pre-processing of EEG has in recent years received much attention. In this paper, several state-of-the-art pre-processing techniques are outlined, for example, based on blind source separation and other non-linear filtering paradigms. In addition, the paper outlines opportunities and limitations of computational approaches for diagnosing AD based on EEG. Finally, future challenges and open problems are discussed.
Article
Full-text available
The goal of this investigation was to identify microlevel processes in the support provider that may foster or inhibit the provision of spousal support. Specifically, the authors focused on (a) how emotional similarity between the support provider and support seeker and (b) how empathic accuracy of the support provider relate to support provision in marriage. In a laboratory experiment, 30 couples were randomly assigned to 1 of 2 conditions (support provider: man vs. woman) of a factorial design. The couples provided questionnaire data and participated in a social support interaction designed to assess behaviors when offering and soliciting social support. A video-review task was used to assess emotional similarity and empathic accuracy during the support interaction. As expected, greater similarity between the support provider's and support seeker's emotional responses, as well as more accurate insights into the support-seeking spouse's thoughts and feelings were found to be predictive of more skilful support (i.e., higher levels of emotional and instrumental support and lower levels of negative types of support).
Article
Seventy-three couples were studied at 2 time points 4 years apart. A typology of 5 groups of couples is proposed on the basis of observational data of Time 1 resolution of conflict, specific affects, and affect sequences. Over the 4 years, the groups of couples differed significantly in serious considerations of divorce and in the frequency of divorce. There were 3 groups of stable couples: validators, volatiles, and avoiders, who could be distinguished from each other on problem-solving behavior, specific affects, and persuasion attempts. There were 2 groups of unstable couples: hostile and hostile/detached, who could be distinguished from each other on problem-solving behavior and on specific negative and positive affects. A balance theory of marriage is proposed, which explores the idea that 3 distinct adaptations exist for having a stable marriage.