Speech Adaptation and Physiological Responses:
A Study on f0 and Skin Temperature
Tom Offrede1,2, Christine Mooshammer1, Alessandro D’Ausilio2,3, Susanne Fuchs4
1Humboldt-Universität zu Berlin, Germany
2Italian Institute of Technology, Italy
3University of Ferrara, Italy
4Leibniz-Centre General Linguistics (ZAS), Germany
offredet@hu-berlin.de, christine.mooshammer@hu-berlin.de, alessandro.dausilio@iit.it,
fuchs@leibniz-zas.de
Abstract
This study investigates f0 adaptation, skin temperature change, and the relationship between the two. While a growing number of studies have demonstrated that emotional reactions in humans lead to changes in their facial skin temperature, none of them have studied temperature change in conversational contexts. Here, we tested whether a conversation’s degree of intimacy influences emotion such that it affects facial temperature and f0 adaptation, the latter in terms of both entrainment to the interlocutor and f0 change due to the conversation topic. We also ask whether temperature change and f0 adaptation are related. In our data set of 38 participants in a between-subjects design, few speakers aligned on f0 to their partner, with no identifiable patterns. Regardless of their interlocutor, however, the speakers’ f0 median and standard deviation tended to decrease when they spoke about more personal topics. This adds to previous literature describing emotional speech prosody. The participants’ nose temperature was modulated by social emotion, but there was no relationship between temperature change and f0 adaptation. This suggests that, although the participants’ nose temperature was sensitive to the social dynamic, the emotional reactions driving thermal change do not seem to be the same as those leading to prosodic adaptation.
Index Terms: speech adaptation, physiology, temperature change, entrainment
1. Introduction
The last few decades have seen an increase in studies showing the effect of emotional reactions on the skin temperature of the face. This happens because, as humans produce micro-expressions, there is an increase in blood flow towards the activated muscles [1]. Researchers have even tried to map specific emotions to thermal changes in certain regions of the face [2].
These studies have generally used highly controlled experimental settings. For instance, the participants in [3] were exposed to images on a computer screen meant to elicit specific emotions; their temperature in different regions of interest (ROIs) was then compared to that at a baseline level. The researchers determined that the participants’ nose became warmer after they saw a positive-valence, high-arousal picture and, in another context, that it became colder during experiences of empathy (of both joy and pain). Interestingly, joy and laughter have been associated with colder nose temperatures in children [4].
Some work has also used more naturalistic settings. Hahn and colleagues [5] measured individuals’ facial temperature before and while they had physical contact with an experimenter. They observed that the nose, mouth, and eyes became warmer during touch, and that touches on the face and chest elicited larger thermal changes than touches on the palm and outer arm. The authors attributed these changes to sexual arousal. In addition, [6] found that temperature increased in the nose, forehead, and cheeks, and decreased in the eyes, after participants received a compliment from a confederate.
While these studies are informative, none have investigated thermal change during a naturalistic interaction in which both interlocutors are naïve participants. The present study aims to fill this gap by investigating thermal and speech data stemming from a highly natural conversational context.
In such a conversational context, speech production is expected to be highly variable, since prosody often carries social and emotional information. Some researchers have described which acoustic features of speech are associated with different emotions and attitudes. For example, when comparing f0 contours of emotional and neutral utterances, Paeschke and colleagues [7] demonstrated that in happy utterances, the average f0 is higher overall and f0 rises earlier in the sentence than in a more neutral contour. In sad and bored speech, the contour is much more linear, with f0 changing little. Liscombe and colleagues [8] found a positive association between f0 mean and anxiousness, confidence, and happiness. They also observed a negative association between average f0 and sadness and boredom. Other work using different methods has often reached similar findings [9, 10].
In addition to emotion effects, speech production is also socially sensitive. The Audience Design Theory [11] asserts that individuals modify their speaking styles according to who is listening to them. It also argues that speakers associate speech styles with certain interlocutors, and interlocutors with specific topics. Hence, when talking about said topics, a person might use the speaking style associated with the (non-present) interlocutor. Indeed, there is some evidence that different topics may elicit distinct speaking styles [12].
Another way that social relationships affect speech is through entrainment behavior. Language entrainment is the repetition of a linguistic item that was produced earlier by one’s interlocutor. It has been demonstrated to occur on multiple levels, such as lexical [14] and syntactic [13]. On the phonetic level (e.g., various f0 measures), a speaker’s speech production may become more similar to that of their interlocutor throughout their interaction [15]. Importantly, this effect is often subtle and highly variable [16, 17, 18].
Multiple theories offer explanations as to why linguistic entrainment happens. One of the most noteworthy is the Communication Accommodation Theory (CAT [19]), which argues that entrainment (or accommodation, in CAT’s terms) is a communicative tool that can help manage social distance. It can involve convergence (increase in similarity to the speech of an interlocutor), divergence (decrease in similarity), and even maintenance (absence of change in relation to the interlocutor). Many factors would play a role in this behavior, such as past history between the interlocutors, intergroup dynamics, and the interlocutors’ perception of each other.
Given the many social and emotional factors that influence speech and temperature change, we explored whether the same social-emotional mechanisms influence both thermal change and speech (specifically f0) adaptation. Concretely, we tested whether two different conversational contexts, a personal and an impersonal one, would have an effect on f0 adaptation and temperature change. We asked the following questions.
1. Do speakers adapt f0 in different contexts?
   (a) In the form of entrainment to their interlocutor
   (b) In the form of f0 change depending on the conversation topic
2. Do speakers change their facial temperature differently in different contexts?
3. Are f0 adaptation and temperature change related?
2. Method
2.1. Participants & design
We recorded the speech and facial temperature of 19 same-gender dyads of unacquainted speakers (34 female, 4 male) during a naturalistic interaction in their L1, Italian. Their ages ranged from 18 to 32 (M = 21.71, SD = 2.34).
The experiment was organized in two sections. First, the participants had a semi-structured conversation following a list of questions. In a between-speaker setting, half of the dyads received a list of impersonal questions (the Impersonal condition; e.g., What kind of phone do you have?), while the other half discussed questions adapted from the Relationship Closeness Induction Task [20] (the Personal condition). In this task, the partners discuss questions that start out less personal (e.g., How old are you?) and build up in intimacy (e.g., What is your happiest early childhood memory?). This task has been demonstrated to increase feelings of closeness between strangers [20]. In order to avoid overly emotional reactions from the participants, we replaced the more negative-valence questions in [20] with positive-valence ones that also scored high in intimacy according to [21].
The participants were instructed to follow the questions in the presented order and to use them as prompts for a natural conversation, with both speakers talking about all questions. The task lasted 10 minutes in both conditions, and dyads varied in how many questions they discussed. The second part of the experiment consisted of the Diapix task [22], but this study only analyzes data from the experiment’s first section.
The participants from both conditions also answered the post-interaction questionnaire from [20] to measure how they perceived their partner. The four crucial items, rated on a 1–9 Likert scale, referred to (a) how close and (b) how similar the participant felt to their partner, (c) how much they liked them, and (d) to what extent they felt like they could be friends in the future. Linear regression models revealed that participants in the Personal condition, as compared to the Impersonal condition, reported feeling closer to their partners by about 1.51 points (F(1,36) = 7.17, p = 0.01) and reported liking their partners more by about 1.12 points (F(1,36) = 6.47, p = 0.01).
Figure 1: An example of the manual ROI annotation of the thermal image. Due to data protection, the figure does not represent research participants, but one experimenter and a collaborator.
2.2. Setup & apparatus
The participants were seated at a table next to each other, each facing a computer screen and a unidirectional microphone (AKG 1000s). A divider was placed between the speakers, preventing them from seeing each other. A FLIR thermal camera (model SC7600) was placed around 2 meters in front of the speakers, capturing both faces in one frame (see Figure 1).
2.3. Data processing
2.3.1. Speech data
The speakers’ turns (beginning and end) were manually annotated using Praat [23]. Then, with its default parameters, we used Praat to automatically annotate voice activity such that the shortest possible period of silence (pause) between two intervals with vocal activity was 150 ms. Periods of vocal activity were called interpausal units (IPUs). Next, using autocorrelation and Praat’s default settings, the f0 median and standard deviation (SD) of each IPU were calculated. Since both speakers’ voices were sometimes captured by a single microphone, all periods containing overlapping speech between partners were excluded from the analysis.
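The segmentation into IPUs can be pictured with a short sketch. This is only an illustration under stated assumptions, not the Praat procedure used in the study: it assumes that voiced stretches are already available as (start, end) pairs in seconds, and the function names `merge_into_ipus` and `drop_overlapping` are hypothetical. Dropping a whole IPU when it overlaps with the partner's speech is one simple reading of the exclusion step.

```python
MIN_PAUSE_S = 0.150  # shortest silence that separates two IPUs (from the text)


def merge_into_ipus(voiced, min_pause=MIN_PAUSE_S):
    """Merge voiced (start, end) intervals separated by less than min_pause."""
    ipus = []
    for start, end in sorted(voiced):
        if ipus and start - ipus[-1][1] < min_pause:
            # Gap is too short to count as a pause: extend the current IPU.
            ipus[-1] = (ipus[-1][0], max(ipus[-1][1], end))
        else:
            ipus.append((start, end))
    return ipus


def drop_overlapping(ipus, partner_ipus):
    """Keep only IPUs that do not overlap in time with any partner IPU."""
    return [
        (s, e)
        for s, e in ipus
        if not any(s < pe and ps < e for ps, pe in partner_ipus)
    ]


# Invented example intervals (seconds), for illustration only.
speaker_ipus = merge_into_ipus([(0.0, 0.4), (0.5, 0.9), (1.2, 1.5)])
clean_ipus = drop_overlapping(speaker_ipus, [(0.8, 1.0)])
```

The f0 median and SD would then be computed per remaining IPU with a pitch tracker, a step this sketch leaves out.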
2.3.2. Temperature data
The thermal data was annotated manually using Altair software (CEDIP Infrared Systems), which enables processing of thermal recordings. Approximately every minute, one frame was chosen in which both speakers were in a position that gave good visibility of the four ROIs (forehead, eyes, nose, and cheeks). Lines were manually drawn on each face around the ROIs, yielding the temperature (in °C) of each pixel in each ROI. Then, in R (version 4.2.3) [24], we obtained the mean temperature of the 10% hottest pixels in each ROI at each time point [1]. Selecting only the 10% hottest pixels ensures that artifacts, such as potential strands of hair in front of the skin, are excluded from the calculation of the mean temperature. Figure 1 shows an example of a frame of the thermal recordings.
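The hottest-pixel summary can be illustrated with a small sketch. The study computed it in R; the Python version below is only an equivalent illustration, and the function name `hottest_fraction_mean`, the rounding rule, and the minimum of one pixel are assumptions that the text does not specify.

```python
def hottest_fraction_mean(pixel_temps, fraction=0.10):
    """Mean of the hottest `fraction` of the pixel temperatures in one ROI."""
    if not pixel_temps:
        raise ValueError("ROI contains no pixels")
    # Number of pixels to keep; rounding and the floor of 1 are assumptions.
    k = max(1, round(len(pixel_temps) * fraction))
    hottest = sorted(pixel_temps, reverse=True)[:k]
    return sum(hottest) / k


# Invented ROI with 20 pixels: 18 cooler pixels (e.g. hair) and 2 skin pixels.
roi = [30.0] * 18 + [35.0, 36.0]
roi_temperature = hottest_fraction_mean(roi)
```

Because only the top of the distribution enters the mean, cooler pixels caused by occlusions do not pull the ROI temperature down.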
3. Results
3.1. Do speakers adapt f0 in different contexts?
3.1.1. Entrainment to interlocutor
Speakers’ entrainment to their partner was measured following the method for measuring local convergence in [15]. The f0 value of the first IPU of each turn of a speaker is subtracted from the f0 of the last IPU of the previous turn by their interlocutor. Then, a Pearson correlation is run between time and the absolute difference in f0 between these adjacent IPUs. A significant negative correlation indicates convergence to one’s partner, and a significant positive correlation indicates divergence. To ensure that small p-values are not due to sample size, the same procedure is repeated using the values from 10 non-adjacent IPUs. Actual entrainment is assumed when no more than one of the correlations using randomly ordered data is significant. For details, see [15].
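The core of this measure, the correlation between time and the absolute f0 difference at turn exchanges, can be sketched as follows. This is a simplified illustration and not the implementation from [15]: the input layout and the function names are hypothetical, the example values are invented, and the significance test and the control with non-adjacent IPUs are omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def local_convergence_r(turn_exchanges):
    """Correlate time with |f0 difference| between adjacent IPUs.

    turn_exchanges: list of (time_s, partner_last_ipu_f0, own_first_ipu_f0).
    A negative r means the difference shrinks over time (convergence).
    """
    times = [t for t, _, _ in turn_exchanges]
    abs_diffs = [abs(partner - own) for _, partner, own in turn_exchanges]
    return pearson_r(times, abs_diffs)


# Invented f0 values (Hz): the speaker drifts towards the partner's 200 Hz.
exchanges = [(10, 200, 180), (20, 200, 185), (30, 200, 190), (40, 200, 195)]
r = local_convergence_r(exchanges)
```

In this toy example the absolute difference falls steadily, so r is negative, which corresponds to convergence in the terms used above.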
This measurement revealed that only four speakers converged to or diverged from their partner: two on f0 median and two on f0 SD. Table 1 shows the detailed results. Three of these speakers interacted in the Impersonal condition, and one in the Personal condition. Speakers A and B were in different dyads; speakers C and D, however, belonged to the same dyad. Interestingly, although both converged to each other, they did so on different f0 measures. Since there were few effects, no relationship to temperature change could be investigated.
Table 1: The four speakers that showed entrainment behavior to their partner.
Speaker   Condition    Behavior      Feature
A         Personal     Divergence    f0 median
B         Impersonal   Divergence    f0 SD
C         Impersonal   Convergence   f0 median
D         Impersonal   Convergence   f0 SD
3.1.2. F0 change & conversation topic
In order to measure participants’ f0 adaptation to the conversation topic (i.e., the intimacy of the conversation, as explained above), we fit mixed-effects linear regression models with the lmerTest R package [25], where the dependent variable was the z-scored f0 values and the fixed effects were the interaction between time and condition (Personal vs. Impersonal). Since the intimacy of the questions did not change in the Impersonal condition but increased in the Personal condition, we assume that time indicates the progression from less to more personal topics. These regression models also included random intercepts for speaker, random slopes for time per speaker, and the interaction between the two. We also compared the AIC of these models with that of the corresponding models without condition as an interacting effect. The more complex model (i.e., including condition as an effect) was considered a better fit only if its AIC was smaller than that of the simpler model by at least 2 [26].
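Two small steps of this procedure, the z-scoring of f0 and the AIC decision rule, can be written out as a sketch. The mixed-effects models themselves were fit in R with lmerTest and are not reproduced here. The function names are hypothetical, z-scoring per speaker with the sample standard deviation is an assumption, and the AIC values in the example are invented.

```python
import statistics


def z_score(values):
    """Standardize values to mean 0 and SD 1 (sample SD is an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def prefer_complex_model(aic_simple, aic_complex, min_gain=2.0):
    """Apply the decision rule from the text.

    The model with the extra predictor is preferred only if its AIC is
    lower than that of the simpler model by at least `min_gain` points.
    """
    return (aic_simple - aic_complex) >= min_gain


# Invented f0 medians (Hz) of one speaker's IPUs and invented AIC values.
f0_z = z_score([100.0, 110.0, 120.0])
keep_condition = prefer_complex_model(aic_simple=1000.0, aic_complex=997.5)
```

With these invented numbers the AIC difference is 2.5, so the model including condition would be retained.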
Both f0 median and f0 SD tended to decrease across time in the Personal, but not Impersonal, condition. Figure 2 visualizes this result for f0 median, and f0 SD followed the same pattern. F0 median decreased significantly in the Personal condition (B = −0.38, SE = 0.12, t = −3.23, p < 0.01), but not in the Impersonal condition (B = −0.09, SE = 0.13, t = −0.74, p = 0.47). Similarly, f0 SD decreased significantly in the Personal condition (B = −0.4, SE = 0.08, t = −4.97, p < 0.001), but not in the Impersonal condition (B = 0.1, SE = 0.1, t = 1.09, p = 0.28).
Figure 2: The difference between conditions in f0 median change across time. The change in f0 SD followed a similar pattern.
We hypothesized that the f0 decrease could be due to modulations in speech intensity: the speakers might lower their voice to talk about more intimate topics, thus reducing f0. Although we did observe a reduction in intensity over the course of the task, it happened equally in both conditions. This suggests that the f0 decrease was indeed due to the conversational context.
3.2. Do speakers change their facial temperature?
To test whether there was a relationship between temperature change and social emotion, we fit a model with temperature as the dependent variable and time as the predictor for each ROI and each participant. We thus determined, for each ROI, whether each participant’s temperature increased, decreased, or did not change significantly. We then fit multinomial logistic regression models with the nnet R package [27]. Here, the temperature change effect was the dependent variable (increase, decrease, no change), and the experimental condition (Personal vs. Impersonal) was the predictor. These models did not reveal any significant effects: the participants’ facial temperature change did not depend on the condition they were in.
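The first step, turning each participant's temperature series into one of three categories, can be sketched as follows. The text does not state the model type or the significance threshold, so a simple least-squares slope and alpha = 0.05 are assumptions here, the function names are hypothetical, and the p-value is taken as given rather than computed. The multinomial models fit with nnet are not reproduced.

```python
def ols_slope(times, temps):
    """Least-squares slope of temperature over time (simple linear fit)."""
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(temps) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, temps))
    return sxy / sxx


def classify_change(slope, p_value, alpha=0.05):
    """Map a slope and its p-value to the three outcome categories.

    alpha = 0.05 is an assumed threshold; the paper does not report one.
    """
    if p_value >= alpha:
        return "no change"
    return "increase" if slope > 0 else "decrease"


# Invented nose temperatures (°C) sampled once per minute.
slope = ols_slope([0, 1, 2, 3], [34.0, 34.2, 34.4, 34.6])
label = classify_change(slope, p_value=0.01)  # p-value invented for the demo
```

The resulting label per participant and ROI is the categorical outcome that the multinomial models described above take as their dependent variable.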
We then investigated whether the participants’ ratings of how they perceived their partner would give more in-depth insight. Again, we fit multinomial logistic regression models with the temperature change effect as the dependent variable, but with the participants’ questionnaire ratings as the predictors (closeness, similarity, degree of liking, and likelihood of future friendship; see Section 2.1).
As can be seen in Figure 3, the participants’ nose temperature rose more often when they expressed a higher likelihood that they would become friends with their partner (B = 0.63, SE = 0.26, p = 0.04). There were no significant effects for any other ROIs or questionnaire items. It is noteworthy, though, that additional effects emerged in the section of the experiment not reported here (i.e., during the Diapix task). These data will be discussed elsewhere.
Figure 3: Frequency in the data set of nose temperature increases, decreases, and non-significant changes, in relation to the participant’s self-reported likelihood that they would become friends with their partner.
3.3. Are f0 adaptation and temperature change related?
To analyze the effect of temperature on f0 change, we fit a similar model in which the dependent variable was the z-scored f0 values and the predictor was the interaction between time and temperature change. Temperature change was the difference in temperature (in °C) between the beginning and end of the conversational task. We fit this model for each type of f0 measure (median and SD) and for each ROI (forehead, nose, eyes, cheeks). We also compared these models to their equivalents without temperature change as a predictor; the more complex model was preferred if its AIC was smaller than that of its counterpart by at least 2. Since many tests of significance were performed, we applied a Bonferroni correction. After AIC comparison and Bonferroni correction, no effects could be established between temperature change in any ROI and f0 change during the experiment section under analysis.
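The temperature-change predictor and the Bonferroni step can be illustrated with a short sketch. The direction of the difference (end minus beginning) is an assumption, as is the set of tests included in the correction; the function names are hypothetical and the p-values are invented placeholders, not results from the study.

```python
def temperature_change(temps):
    """Difference between the last and first measurement of one ROI (°C).

    Taking end minus beginning is an assumption about the sign convention.
    """
    return temps[-1] - temps[0]


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# One model per f0 measure and ROI, as in the text: 2 x 4 = 8 tests.
measures = ["median", "SD"]
rois = ["forehead", "nose", "eyes", "cheeks"]
# Invented placeholder p-values, for illustration only.
raw = {(m, roi): 0.04 for m in measures for roi in rois}
adjusted = bonferroni(raw)
any_significant = any(p < 0.05 for p in adjusted.values())
```

With eight tests, an uncorrected p-value of 0.04 is scaled well above 0.05, which shows how a nominally significant effect can disappear after correction.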
4. Discussion
In our data set, only four speakers (out of 38) showed entrainment behavior to their partner. Among those that did, it was not possible to establish a pattern between entrainment type (convergence vs. divergence), experimental condition (Personal vs. Impersonal), and the feature entrained on (f0 median vs. SD). Previous work has indeed observed that phonetic entrainment is often a subtle effect with large interindividual variability [16, 17, 18]. Even the method chosen to measure convergence has an impact on whether or not an effect emerges [28]. In the present study, we could not investigate any possible relationships between entrainment and thermal change since there were few entrainment effects.
Although the speakers’ f0 was not much affected by that of their interlocutor, it did seem to be influenced by the conversation topic. The participants talking about increasingly personal topics (i.e., those in the Personal condition) tended to decrease both their f0 median and SD. Such a reduction has been most often associated with feelings of sadness [7, 8, 9, 10], and often also of boredom and fear [7, 8, 10]. This pattern is thus curious in this data set, since the personal questions were not sad. In fact, the original negative-valence questions from the Relationship Closeness Induction Task [20] were changed so as to be more positive, so sadness, fear, and boredom are unlikely to be the predominant emotions here.
We speculate that speaking about more intimate topics with a stranger may have led to lower and more monotone speech. Following Bell’s [11] Audience Design Theory, one could claim that f0 was modulated by the conversation topic, which here varied in intimacy. Bell [11] argues that individuals associate types of topics with specific addressees. Since speakers also link certain speaking styles to given interlocutors, they might extend the association and use such speaking styles when talking about said topics, even in the absence of the addressees. Thus, it would seem that more personal topics tend to be spoken about with lower and less variable f0 (as was preliminarily observed in [16]). Alternatively, lower f0 production may not have been directly due to the intimacy of the conversation, but to the speakers’ feelings about their partner. Indeed, as indicated above, speakers in the Personal condition liked their partner more and felt closer to them than speakers in the Impersonal condition. If this explanation were accurate, we would expect f0 median and SD to remain lower in the Personal condition throughout the rest of the interaction. Although the data stemming from the second half of the experiment are not reported here (see Method section), no such trend was observed, which suggests that the difference in f0 production was due to the conversational contexts.
Next, we asked whether the participants’ facial temperature change was susceptible to the degree of intimacy of their conversation. Their temperature did not change differently in the Personal vs. Impersonal condition. However, their nose did tend to become warmer more often when the speaker indicated a higher likelihood that they would become friends with their partner.
One could speculate about which specific emotions might be at play here. For example, [5] have argued that thermal increases in the nose are linked to sexual arousal, which could be interpreted in this case more generally as a type of platonic “social” arousal. In fact, following the interpretation of Salazar-López and colleagues [3], the temperature increase could be related more generally to higher emotional arousal. Interestingly, though, the frequency of thermal decreases in the nose also tended to be higher when the participant reported a higher likelihood of future friendship, although this effect did not reach statistical significance (see Figure 3). This effect would be in line with [4], who reported lower nose temperatures during experiences of joy, or with [3], who associated this pattern with feelings of empathy. These nuanced findings highlight the complex relationship between emotion and skin temperature. It is not this study’s goal to attribute temperature changes to specific emotions, but rather to investigate possible thermal patterns in naturalistic social interactions.
Finally, we also tested whether there was a link between temperature change and f0 adaptation. Such a relationship could not be established. This suggests that the emotions driving thermal modulation are not the same as those leading to speech change, at least during the section of the experiment analyzed here. Whether this type of relationship emerged later in the experiment, when the participants had already established what they felt about each other, will be explored elsewhere.
5. Conclusion
This study has shown that speakers modulate their f0 production in accordance with the conversation topic, and that their facial temperature is sensitive to how they feel towards their conversation partner. Both findings add to previous literature describing emotional speech and emotion-related thermal changes. However, no direct relationship between f0 and temperature changes was found. These findings on thermal imaging and speech in social interaction are only the first in this domain and open new avenues for future research.
6. Acknowledgements
The authors would like to thank Marco Bilato for his help with data collection and Melina Pfundstein, Lara Burchardt, and Phil Hoole for their help with data annotation. COBRA is a European project funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 859588.
7. References
[1] D. T. Robinson, J. Clay-Warner, C. D. Moore, T. Everett, A. Watts, T. N. Tucker, and C. Thai, “Toward an unobtrusive measure of emotion during interaction: Thermal imaging techniques,” in Biosociology and neurosociology. Emerald Group Publishing Limited, 2012, vol. 29, pp. 225–266.
[2] S. Ioannou, V. Gallese, and A. Merla, “Thermal infrared imaging in psychophysiology: potentialities and limits,” Psychophysiology, vol. 51, no. 10, pp. 951–963, 2014.
[3] E. Salazar-López, E. Domínguez, V. J. Ramos, J. De la Fuente, A. Meins, O. Iborra, G. Gálvez, M. Rodríguez-Artacho, and E. Gómez-Milán, “The mental and subjective skin: Emotion, empathy, feelings and thermography,” Consciousness and cognition, vol. 34, pp. 149–162, 2015.
[4] R. Nakanishi and K. Imai-Matsumura, “Facial skin temperature decreases in infants with joyful expression,” Infant Behavior and Development, vol. 31, no. 1, pp. 137–144, 2008.
[5] A. C. Hahn, R. D. Whitehead, M. Albrecht, C. E. Lefevre, and D. I. Perrett, “Hot or not? Thermal reactions to social contact,” Biology letters, vol. 8, no. 5, pp. 864–867, 2012.
[6] S. Ioannou, P. H. Morris, M. Baker, V. Reddy, and V. Gallese, “Seeing a blush on the visible and invisible spectrum: a functional thermal infrared imaging study,” Frontiers in Human Neuroscience, vol. 11, p. 525, 2017.
[7] A. Paeschke, M. Kienast, W. F. Sendlmeier et al., “F0-contours in emotional speech,” in Proc. 14th Int. Congress of Phonetic Sciences, vol. 2, 1999, pp. 929–932.
[8] J. Liscombe, J. Venditti, and J. B. Hirschberg, “Classifying subject ratings of emotional speech using acoustic features,” in Proc. Eurospeech, 2003, pp. 725–728.
[9] A. Paeschke and W. F. Sendlmeier, “Prosodic characteristics of emotional speech: Measurements of fundamental frequency movements,” in ISCA tutorial and research workshop (ITRW) on speech and emotion, 2000.
[10] J. Tao, Y. Kang, and A. Li, “Prosody conversion from neutral speech to emotional speech,” IEEE transactions on Audio, Speech, and Language processing, vol. 14, no. 4, pp. 1145–1154, 2006.
[11] A. Bell, “Language style as audience design,” Language in society, vol. 13, no. 2, pp. 145–204, 1984.
[12] T. Devlin, P. French, and C. Llamas, “Vowel change across time, space, and conversational topic: the use of localized features in former mining communities,” Language Variation and Change, vol. 31, no. 3, pp. 303–328, 2019.
[13] H. P. Branigan, M. J. Pickering, and A. A. Cleland, “Syntactic co-ordination in dialogue,” Cognition, vol. 75, no. 2, pp. B13–B25, 2000.
[14] A. Tobar-Henríquez, H. Rabagliati, and H. P. Branigan, “Lexical entrainment reflects a stable individual trait: Implications for individual differences in language processing,” Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 46, no. 6, p. 1091, 2020.
[15] R. Levitan and J. Hirschberg, “Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions,” in Proceedings of Interspeech 2011, 2011.
[16] T. Offrede, C. Mishra, G. Skantze, S. Fuchs, and C. Mooshammer, “Do humans converge phonetically when talking to a robot?” in Proceedings of the 20th International Congress of Phonetic Sciences. Guarant International, 2023, pp. 3507–3511.
[17] J. S. Pardo, A. Urmanche, S. Wilman, and J. Wiener, “Phonetic convergence across multiple measures and model talkers,” Attention, Perception, & Psychophysics, vol. 79, pp. 637–659, 2017.
[18] A. Weise, S. I. Levitan, J. Hirschberg, and R. Levitan, “Individual differences in acoustic-prosodic entrainment in spoken dialogue,” Speech Communication, vol. 115, pp. 78–87, 2019.
[19] C. Gallois, T. Ogay, and H. Giles, “Communication accommodation theory: A look back and a look ahead,” in Theorizing about intercultural communication. Thousand Oaks: Sage, 2005, pp. 121–148.
[20] C. Sedikides, W. K. Campbell, G. D. Reader, and A. J. Elliot, “The relationship closeness induction task,” Representative Research in Social Psychology, vol. 23, pp. 1–4, 1999.
[21] C. Mishra, T. Offrede, S. Fuchs, C. Mooshammer, and G. Skantze, “Does a robot’s gaze aversion affect human gaze aversion?” Frontiers in Robotics and AI, vol. 10, 2023.
[22] R. Baker and V. Hazan, “DiapixUK: task materials for the elicitation of multiple spontaneous speech dialogs,” Behavior research methods, vol. 43, pp. 761–770, 2011.
[23] P. Boersma and D. Weenink, “Praat: doing phonetics by computer [computer program],” http://www.praat.org/, 2022, version 6.2, retrieved August 23, 2022.
[24] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2023. [Online]. Available: https://www.R-project.org/
[25] A. Kuznetsova, P. B. Brockhoff, and R. H. B. Christensen, “lmerTest package: Tests in linear mixed effects models,” Journal of Statistical Software, vol. 82, no. 13, pp. 1–26, 2017.
[26] M. R. Symonds and A. Moussalli, “A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion,” Behavioral ecology and sociobiology, vol. 65, pp. 13–21, 2011.
[27] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S, 4th ed. New York: Springer, 2002, ISBN 0-387-95457-0. [Online]. Available: https://www.stats.ox.ac.uk/pub/MASS4/
[28] J. Kruyt, D. de Jong, A. D’Ausilio, and Š. Beňuš, “Measuring prosodic entrainment in conversation: A review and comparison of different methods,” Journal of Speech, Language, and Hearing Research, vol. 66, no. 11, pp. 4280–4314, 2023.
Article
Full-text available
The tendency of conversation partners to adjust to each other to become similar, known as entrainment, has been studied for many years. Several studies have linked differences in this behavior to gender, but with inconsistent results. We analyze individual differences in two forms of local, acoustic-prosodic entrainment in two large corpora between English and Chinese native speakers conversing in English. The few previous studies of the effect of non-nativeness on entrainment that exist were based on much smaller numbers of speakers and focused on perceptual rather than acoustic measures. We find considerable variation in both degree and valence of entrainment behavior across speakers with some consistent trends, such as synchronous behavior being mostly positive in direction and somewhat more prevalent than convergence. However, we do not find entrainment to vary significantly based on gender, native language, or their combination. Instead, we propose as a hypothesis for further study, that gender mediates more complex interactions between sociocultural norms, conversation context, and other factors.
Article
Full-text available
Language use is intrinsically variable, such that the words we use vary widely across speakers and communicative situations. For instance, we can call the same entity refrigerator or fridge. However, attempts to understand individual differences in how we process language have made surprisingly little progress, perhaps because most psycholinguistic instruments are better-suited to experimental comparisons than differential analyses. In particular, investigations of individual differences require instruments that have high test-retest reliability, such that they consistently distinguish between individuals across measurement sessions. Here, we established the reliability of an instrument measuring lexical entrainment, or the tendency to use a name that a partner has used before (e.g., using refrigerator after a partner used refrigerator), which is a key phenomenon for the psycholinguistics of dialogue. Online participants completed two sessions of a picture matching-and-naming task, using different pictures and different (scripted) partners in each session. Entrainment was measured as the proportion of trials on which participants followed their partner in using a low-frequency name, and we assessed reliability by comparing entrainment scores across sessions. The estimated reliability was substantial, both when sessions were separated by minutes and when sessions were a week apart. These results suggest that our instrument is well-suited for differential analyses, opening new avenues for understanding language variability. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Article
Full-text available
So far blushing has been examined in the context of a negative rather than a positive reinforcement where visual displays of a blush were based on subjective measures. The current study used infrared imaging to measure thermal patterns of the face while with the use of a video camera quantified on the visible spectrum alterations in skin color related to a compliment. To elicit a blush a three-phase dialog was adopted ending or starting with a compliment on a female sample (N = 22). When the dialog ended with a compliment results showed a linear increase in temperature for the cheek, and forehead whereas for the peri-orbital region a linear decrease was observed. The compliment phase marked the highest temperature on the chin independent of whether or not the experiment started with a compliment contrary to other facial regions, which did not show a significant change when the experiment started with a compliment. Analyses on the visible spectrum showed that skin pigmentation was getting deep red in the compliment condition compared to the serious and social dialog conditions for both the forehead and the cheeks. No significant association was observed between temperature values and erythrocyte displays on the forehead and cheek. Heat is the physiological product of an arousing social scenario, however, preconceived notions about blushing propensity seem to drive erythrocyte displays and not necessarily conscious awareness of somatic sensations.
Article
Full-text available
This study consolidates findings on phonetic convergence in a large-scale examination of the impacts of talker sex, word frequency, and model talkers on multiple measures of convergence. A survey of nearly three dozen published reports revealed that most shadowing studies used very few model talkers and did not assess whether phonetic convergence varied across same- and mixed-sex pairings. Furthermore, some studies have reported effects of talker sex or word frequency on phonetic convergence, but others have failed to replicate these effects or have reported opposing patterns. In the present study, a set of 92 talkers (47 female) shadowed either same-sex or opposite-sex models (12 talkers, six female). Phonetic convergence was assessed in a holistic AXB perceptual-similarity task and in acoustic measures of duration, F0, F1, F2, and the F1 × F2 vowel space. Across these measures, convergence was subtle, variable, and inconsistent. There were no reliable main effects of talker sex or word frequency on any measures. However, female shadowers were more susceptible to lexical properties than were males, and model talkers elicited varying degrees of phonetic convergence. Mixed-effects regression models confirmed the complex relationships between acoustic and holistic perceptual measures of phonetic convergence. In order to draw broad conclusions about phonetic convergence, studies should employ multiple models and shadowers (both male and female), balanced multisyllabic items, and holistic measures. As a potential mechanism for sound change, phonetic convergence reflects complexities in speech perception and production that warrant elaboration of the underspecified components of current accounts.
Article
This study focuses on speakers who continue to use forms that are recessive in a community, and the phonological and conversational contexts in which recessive forms persist. Use of a local, recessive form is explored across males from four ex-mining communities in Northeast England. Older speakers, who lived in the area when the mines were open, frequently produce the localized variant of the mouth vowel, especially in speech produced during conversation about the locally resonant topic of mining, and, most frequently, in communities closest to the location with which the form is associated. Conversely, speakers born since the loss of mining and with little connection to the industry hardly produce the local form in any community or conversational topic. Exploring conversational topic provides evidence for the connections between shifting social contexts and sound change, specifically that speakers retain otherwise recessive features in speech concerning topics which are locally resonant to them.