International Journal of Social Robotics (2025) 17:315–333
https://doi.org/10.1007/s12369-025-01221-w
Affect-Enhancing Speech Characteristics for Robotic Communication
Kim Klüber1 · Katharina Schwaiger1 · Linda Onnasch2
Accepted: 24 January 2025 / Published online: 17 February 2025
© The Author(s) 2025

1 Humboldt-Universität zu Berlin, Department of Psychology, Unter den Linden 6, 10099 Berlin, Germany
2 Technische Universität Berlin, Psychology of Action and Automation, Marchstraße 12, 10587 Berlin, Germany
Correspondence: Kim Klüber, kim.ingrid.klueber@student.hu-berlin.de
Abstract
The attribution of mind to others, either humans or artificial agents, can be conceptualized along two dimensions: experience
and agency. These dimensions are crucial in interactions with robots, influencing how they are perceived and treated by
humans. Specifically, a higher attribution of agency to robots is associated with greater perceived responsibility, while a higher
attribution of experience enhances sympathy towards them. One potential strategy to increase the attribution of experience
to robots is the application of affective communication induced via prosody and verbal content such as emotional words and
speech style. In two online studies (NI = 30, NII = 60), participants listened to audio recordings in which robots introduced
themselves. In study II, robot pictures were additionally presented to investigate potential matching effects between appearance
and speech. Our results showed that both the use of emotional words and speaking expressively significantly increased the
attributed experience of robots, whereas the attribution of agency remained unaffected. Findings further indicate that speaking
expressively and using emotional words enhanced the perception of human-like qualities in artificial communication partners,
with a more pronounced effect observed for technical robots compared to human-like robots. These insights can be used to
improve the affective impact of synthesized robot speech and thus potentially increase the acceptance of robots to ensure
long-term use.
Keywords Mind perception · Speech perception · Human–robot interaction · Affect · Trust
1 Introduction
“I really like you.” One might think this statement refers to a human, but it was actually said by a friend’s daughter to our lab robot Pepper, a social companion robot. And she might not be the only one talking to robots in such an affective way. People tend to treat robots like social actors [1] as this new technology increasingly adopts human-like appearances, behaves in human-like manners, and can be found in various social areas like schools [2], restaurants [3], and care facilities [4].
The attribution of social abilities to non-human agents can be explained by the construct of mind perception, the tendency to attribute intentions and feelings to others. Such attributions are used to make predictions about others’ behaviors and influence how individuals perceive and treat other humans as well as non-human entities like robots [5].
In a study by Gray and colleagues [6], people were asked to compare different entities (e.g., adult person, child, dog, robot, God) with respect to their mental capabilities. The authors showed through factor analysis that people attribute mind to others along two different dimensions: experience and agency.
Experience represents the attribution of feelings and vulnerability to others. It includes the attribution of the capacity to experience feelings like fear, pleasure, consciousness, or joy. Agency refers to the attribution of thought, intentional action, and responsibility to others. It includes the attribution of capacities like self-control, morality, and planning.
The constructs of experience and agency are distinct from
the traditional affective-cognitive dimensions, which focus
more on emotional responses and intellectual processing
(e.g., [79]). The experience-agency framework centers on
the attribution of mental states and moral capacities to entities
[6].
Fig. 1 Factor scores on the two dimensions of mind perception [6]
Figure 1 illustrates the findings of Gray et al. [6], showing that human adults, for example, are perceived as having both high experience and high agency. Entities attributed with high agency are seen as autonomous and able to make decisions, but are also held responsible for their actions [10].
According to Waytz et al. [11], agency is also necessary to make moral decisions. Conversely, entities that are attributed with high experience elicit empathy [12], and harming them is morally condemned [13].
In contrast to human adults, animals and children are rated relatively high on the experience dimension but moderate on agency. This combination of moderate agency and high experience explains why we are more forgiving and patient in interactions with them compared to human adults [6, 10]. While we do not attribute full responsibility and intent for their actions, we are aware of their vulnerability and emotional capacity. Robots, by contrast, are typically rated low on the experience dimension and moderate on the agency dimension [6], which often results in harsher judgments when they make mistakes.
Empathy and vulnerability are key concepts in under-
standing why an increase in perceived experience leads to
greater forgiveness. When entities are perceived as having a
higher capacity for emotions, they are viewed as more vul-
nerable and in need of protection, which elicits empathic
concern and compassionate responses [14]. For example, Darling [15] showed that robots that display vulnerability through anthropomorphic cues evoke empathy and ethical concern, which in turn encourages users to treat them with care. Enhancing the experience dimension of robots might therefore be instrumental in amplifying perceptions of vulnerability and empathy, thereby fostering more forgiving attitudes toward them.
1.1 The Importance of Anthropomorphic Design
Interaction failures are an inevitable aspect of human–robot
interactions (HRI), particularly for service robots operat-
ing in unstructured environments. Unlike industrial robots,
which work in predictable settings, service robots must
engage with an extremely heterogeneous group of users. This
makes interaction failures caused by the robot unavoidable.
To prevent these negative interaction experiences from leading to outright abandonment and rejection of robots, the question arises how the experience dimension of robots can be increased in order to achieve the positive effects found with animals and children, i.e., an increased willingness to forgive.
A promising approach to achieve this is through imple-
menting anthropomorphic features into robot design. Anthro-
pomorphism refers to attributing human-like characteristics
to non-human entities such as robots. According to the taxonomy for HRI of Onnasch and Roesler [16], anthropomorphic features can be implemented on four different robot morphology dimensions: appearance, communication, movement, and context.
Studies have shown that anthropomorphically designed
robots are attributed with more mind compared to objects or
robots lacking such features [17]. This is because an anthro-
pomorphic design provides sufficient social cues which lead
people to humanize them and in turn attribute a higher extent
of mind [10,18,19]. This increased attribution of mind can
change how people interact with robots. For example, previ-
ous studies have shown that people find it more difficult to
turn robots off that are perceived as intelligent, indicating a
greater emotional connection [20]. Keijsers and Bartneck [21] showed that a lack of mind attribution to robots was directly related to abusive behavior towards them, likely because these robots do not evoke empathetic responses.
Of course, zoomorphic features also present a viable design option by leveraging familiar animal-like characteristics to foster emotional connections with users. For example,
Rosenthal-von der Pütten et al. [22] demonstrated that users
showed empathic concerns towards zoomorphic robots when
they were tortured, indicating the potential for deeper empa-
thetic connections. However, while zoomorphic features can
enhance empathy, they may lack the breadth of communica-
tive richness offered by anthropomorphic designs, which can
mimic human expressions and interactions more closely.
Previous research like Yam et al. [10] underscores that anthropomorphic cues, such as human-like names and appearances, enhance perceptions of mind by making robots seem more emotionally capable and relatable.
This increased perception of experience fosters empathy,
leading to more compassionate responses and a greater
willingness to forgive robots’ mistakes. The study also high-
lights that anthropomorphic design elements, like facial
expressions and voices, are crucial in enhancing emotional
connections without disproportionately raising perceptions
of agency.
1.2 Affective Verbal Communication
Verbal communication is a very powerful and at the same
time natural tool for humans, which allows not only the
exchange of information but also the expression of feelings.
Since technical devices such as Amazon’s Alexa or Apple’s
Siri have entered our households, it is even more important
to investigate whether and what effects the use of affective
speech has on humans [23].
The expression of internal emotional states in spoken
language can be summarized under the term “affective
speech” [24],1 encompassing both discrete and dimensional
aspects of emotion (see Wang et al. [26] for a comprehen-
sive review of various emotion models). When considering
spoken language, we can make a distinction between two
components [27].
First, the verbal component, which includes the word
level and determines the lexical features. Emotional words
are those that carry specific emotional meanings and con-
tribute to the affective tone of communication [28].
Second, the prosodic component (or just prosody) refers
to the suprasegmental qualities of speech that include the
variation in intonation (e.g., pitch, timing, loudness, etc.) and
determine the acoustic features. Prosody plays a crucial role
in conveying affective states [29] and is modulated to express
varying degrees of emotional content and intent. If prosodic
and verbal components are contradictory, the prosodic infor-
mation dominates [24]. For example, a phrase like "today
is great weather" spoken in a tone of disgust will likely
be interpreted as sarcastic. Thus, it is essential for robotic
systems to utilize prosody effectively, ensuring that expres-
sions are contextually appropriate and aligned with specific
interaction scenarios to maximize their communicative effec-
tiveness [30].
Both components (verbal and prosodic) are used to con-
vey emotions and lead to a more natural and human-like
perception of speech. Each basic emotion or emotional state
can be described by its unique neuromotor expression that
produces specific vocal prosody [31]. The influence of vocal prosody on perception, attitude, and behavior has already been demonstrated in a few studies. For instance, Nass et al. [32] showed that matching the emotions of a driver (human) with those of a car voice (virtual driving assistant) resulted in fewer accidents and higher attention from the driver. This alignment was achieved through an adapted car voice with specific speech patterns including pitch range and speed.

1 The terms “emotion” and “affect” are often used interchangeably but have distinct meanings in psychological research. Emotions refer to specific, discrete states (e.g., happiness), are typically brief, and have identifiable triggers or causes [12]. They are often associated with physiological responses, facial expressions, and action tendencies [25]. Affect is a broader term that encompasses a wide range of emotional experiences, including moods and emotions. It can be seen as a continuous and pervasive background state that influences perception, cognition, and behavior [12].
In terms of HRI, it has been shown, for example, that altering a robot’s pitch can influence the likeability of a robot [33] and lead to a better understanding of the information provided by a robot [34]. Voices with expressive prosody are perceived as more socially competent [33]. James and colleagues [35] showed that prosodic speech supporting the verbal content is perceived as empathetic behavior of the robot. Research by Kühne et al. [36] found that human-like voices were consistently rated as more pleasant and less eerie compared to monotonic or artificial voices.
Investigating emotions at the word level alone is mainly relevant when dealing with chatbots, since the focus is only on the content without acoustic signals [37, 38]. Integrating
emotions via certain words is one way to make the word
level affective, since certain word classes, such as nouns,
adjectives, or adverbs, achieve very specific affective ratings
[39]. The affective influence at the word level, however, goes
beyond individual words and can be extended through the
composition of words (e.g., awfully happy) or through dif-
ferent stylistic choices resulting from the words used.
The effects of affective communication have also received
attention in HRI research, although considerably less than
with chatbots. In relation to speech style, Kim et al. [40] showed that robots using a familiar speech style (i.e., using specific words in the robot’s language) in conversation have a more positive influence on the participants’ behavior and perception compared to an honorific speech style. Hoffmann and
colleagues [41] found that robots speaking in a human-like
speech style (using phrases like “oh no”) or a machine-like
speech style (using phrases like “error 1.0.5”) have different
effects on the evaluation of robot errors. Besides individual
words or speech styles, the affective influence also depends
on the context. If the used words/phrases do not fit the context,
negative consequences may result. Thus, Leyzberg et al. [42]
showed that a robot using contextually appropriate responses
led to more accurate participant behavior than a robot using
inappropriate responses.
Both components together, vocal prosody and the word
level, can lead to an increase on the experience dimension
of mind perception, as shown by Bigman and Gray [43].
They manipulated the expression of a computer voice and
the description of experience emotions. The emotional and
expressive voice was rated significantly higher on the expe-
rience dimension. However, no differentiated conclusions could be drawn about the effectiveness of the individual speech aspects, as the authors manipulated all variables together.
1.3 The Influence of Appearance
The interplay between a robot’s appearance and its com-
munication is a crucial factor in HRI. Recent studies have
highlighted the importance of aligning a robot’s appearance
with its communicative characteristics, emphasizing the role
of anthropomorphism in both visual and auditory aspects.
Hosseini et al. [44], for example, showed that participants perceived a robot as a social companion only when
it exhibited both a human-like appearance and emotional
expressions in speech. This suggests that anthropomorphism
in both appearance and speech is crucial for fostering social
relationships with robots. Similarly, Klüber and Onnasch
[45] have shown that a match between a robot’s visual
appearance and communication is generally preferred over
a mismatch, highlighting the importance of consistency in
design.
Moreover, Sarigul and Urgen [46] showed that partici-
pants had shorter reaction times and more accurate responses
when interacting with a robot that had a synthetic voice
matching its mechanical appearance rather than a human
voice. This congruence effect is evidence that matching a robot’s voice and appearance improves participants’ behavioral responses.
Additionally, Mara and colleagues [47] found that a
human-like voice can lead users to imagine a more anthro-
pomorphic robot with human-like features such as a nose. In
contrast, robotic voices lead to conceptualizations of more
mechanical properties, such as wheels. This suggests that
vocal cues influence expectations about a robot’s physical
attributes.
Research has also shown that human-like robots are often
paired with natural-sounding voices, while robots with less
human-like appearances are assigned resynthesized voices
[48]. However, McGinn and Torre [48] noted that this ten-
dency is not absolute, as some robots did not exhibit a
clear preference in voice assignment, indicating that human-
likeness alone may not determine voice selection.
An important influencing factor, which can also impact
the interplay between appearance and communication, is the
context in which the robot operates. Schreibelmayr and Mara
[49] found that a human voice was significantly more accept-
able in caregiving contexts, where empathy and warmth are
valued, than a synthetic voice. Conversely, in information and
navigation contexts, synthetic voices were equally accepted,
highlighting the role of context in shaping user preferences
for voice and appearance congruence. In addition, Jirak et al.
[50] emphasize the need for synthesized voices to align with
user expectations about the intended role of the robot to avoid
potential mismatches between the robot’s vocal expressions
and its functional capabilities.
Fig. 2 Conceptual framework for anthropomorphic robot communication

The congruence effect is crucial in understanding how mismatched appearance and voice can lead to discomfort or eeriness, often referred to as the uncanny valley phenomenon [51]. Torre [52] discussed how incongruences between appearance and communication might elicit feelings of eeriness, affecting the robot’s acceptability and user trust. This effect underscores the need for careful design considerations when developing robots intended for social interaction.
1.4 Conceptualization and Hypotheses
According to the HRI taxonomy proposed by Onnasch and
Roesler [16], robot morphology can be classified along four dimensions: appearance, communication, movement, and context. Each of these dimensions can be realized using specific design variations: anthropomorphic, zoomorphic, or mechanomorphic. Figure 2 illustrates these parameters.
In our studies, we decided to implement anthropomor-
phism through the communication dimension, i.e. speech
(in contrast to many other studies), as it is easily adaptable
across different application contexts without contradicting
the principle of "form follows function," to which service
robots are often bound. By focusing on affective speech, we aim to leverage the benefits of an increased experience dimension of mind perception, which involves the attribution of feelings and vulnerability, without the drawbacks of elevated agency, which addresses the attribution of planning and intentional actions. Figure 2 also provides a conceptual framework for the implementation of affective speech.
Previous studies have demonstrated that affective robot
communication can be implemented via prosodic and verbal components [32, 33, 35, 41]. Since there are different
ways to introduce affective elements in the verbal compo-
nent of speech [40,41], we examine two different aspects
in our study. One aspect refers to the inclusion of individual
affective words and will be labeled "emotionality" in the fol-
lowing. The other aspect refers to how speech is delivered
and is characterized by a more personal touch and will be
labeled "speech style" in the following.
The contribution of this paper goes beyond previous work
on the mere relationship between affect and experience (e.g.,
[5,11,43]) and examines more closely specific factors of
affective speech that influence mind perception. This sepa-
ration of various affect-based factors is not found in many
other studies.
By employing an anthropomorphic design variation in
the communication dimension, specifically through speech,
robots are perceived as possessing a mind [53]. This
perception aligns with the theory of mind (ToM) framework provided by Gray et al. [6], which posits that mind perception occurs
along two dimensions: experience and agency. By applying
the experience-agency framework [6], our study provides
nuanced insights into how people perceive robotic agents’
social and moral capacities, offering a complementary per-
spective to existing models of affective and cognitive pro-
cessing.
The research question of this work is therefore: Does the
use of affective communication lead to an increase on the
experience, but not on the agency dimension of mind percep-
tion?
Our corresponding hypothesis is as follows:
Hypothesis 1: A robot’s expressed emotionality (H1a), the
use of personal speech style (H1b), and expressive prosody
(H1c) lead to higher ratings on experience, but not on agency.
With an increased experience dimension, a robot is less
likely to be rejected in the event of an error, which enhances
user acceptance [10]. According to Wirtz et al. [54], both
social presence (as an emotional element) and trust (as a
rational element) are influencing factors on the acceptance
of robots. Social presence refers to the feeling that some-
one or something is "real" and—in relation to robots—if the
interaction partner is perceived as having human-like quali-
ties in mediated environments [55]. Social presence not only
makes interactions more engaging but also helps users feel more connected and comfortable with the robot.
Heerink et al. [56] demonstrated that social presence
significantly impacts user engagement and acceptance, espe-
cially among older adults, with increased social presence
boosting enjoyment and, consequently, the intention to use
the robot. Recent studies also support this effect: Higgins
et al. [57] found that realistic voices are critical in generat-
ing a sense of social presence, while Liao et al. [58] showed
that an emotional voice increases the perceived social pres-
ence of a pedagogical virtual agent in educational contexts.
Together, these findings suggest that emotional expression
and realistic voice design are key to making agents appear
socially present.
Trust, on the other hand, is crucial for users to feel con-
fident in the robot’s guidance or information. According to
Hancock et al. [59], trust is defined as “the willingness of
people to accept robot-produced information, follow robots’
suggestions, and thus benefit from the advantages inherent in
robotic systems”. In their meta-analysis, Hancock and col-
leagues [60] found that a robot’s communication method
and personality, including factors like empathy, likability,
and sociability, are especially influential in building trust.
These personality traits can be perceived when a robot com-
municates with an affective tone, suggesting that emotional
expression not only enhances social appeal but also strength-
ens the robot’s trustworthiness.
By investigating how specific affective speech elements,
such as emotionality, prosody, and speech style, impact social
presence and trust, our study seeks to understand which
communication styles optimize these dimensions in HRI.
Although some studies support that affective communication
increases social presence [57,58], limited research directly
links these two constructs in HRI. This gap is crucial to
address, as understanding how affective speech contributes to
social presence could inform strategies for designing robots
that better align with human expectations in both social
and functional settings. Therefore, our research controls for
other factors that might influence social presence and trust in
robots, providing valuable insights for designing robots that
foster meaningful interactions between humans and robots.
We therefore propose the following hypotheses:
Hypothesis 2: A robot’s expressed emotionality (H2a), the
use of personal speech style (H2b), and expressive prosody
(H2c) lead to higher ratings on social presence.
Hypothesis 3: A robot’s expressed emotionality (H3a), the
use of personal speech style (H3b), and expressive prosody
(H3c) lead to higher ratings on trust.
Based on the existing literature, our second study aims
to explore how the appearance of a robot influences the
perception of affective speech. Two different hypotheses
can be derived from the literature. On the one hand, it has
already been shown that an anthropomorphic appearance
should match its communication [45,46] and the emotion-
ality it contains [44]. As a consequence, affective speech,
which is a human-like trait, should align better with a human-like
robot according to the congruence effect. On the other hand,
a synthesized voice should better match a technical robot
minimizing the risk of eliciting feelings of eeriness due to
incongruence [52,61].
For this reason, an exploratory research question for study
II is formulated: Does the appearance of a robot have an
impact on how people evaluate affective robotic speech?
2 Study I
2.1 Method and Materials
The study was preregistered at the Open Science Framework
(OSF) where the raw data of the study is available. The study
was performed with local ethical committee approval and in accordance with the Declaration of Helsinki.

Fig. 3 Conditions for the two different groups in study I
2.1.1 Participants
For study I, participants were acquired through Prolific—an
online recruitment platform. The requirement for participa-
tion was English as first language and an age between 18 and
60 years. A total of 30 participants took part in study I. We conducted an a priori power analysis with G*Power [62], which indicated that a sample size of N = 30 is required to detect a medium effect size (f = 0.25) with 90% statistical power (α = 0.05). The participants were on average 40 years old (SD = 13), ranging from 21 to 60 years. The gender of the participants was almost balanced between males (47%) and females (53%). For taking part in the study, participants were financially compensated with £2.02.
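To make the sample-size rationale concrete, the following Python sketch reconstructs a comparable repeated-measures power computation via the noncentral F distribution. It is an illustration only: the number of measurements per person (m = 4), the assumed correlation among repeated measures (ρ = 0.5, G*Power's default), and the reduction to a two-level within-subject effect are our assumptions, not parameters reported above.

```python
# Simplified reconstruction of a repeated-measures power calculation,
# in the spirit of G*Power's "ANOVA: repeated measures, within factors".
# Assumptions (not from the paper): m = 4 measurements per participant,
# rho = 0.5 correlation among them, a two-level within-subject effect.
from scipy import stats

def within_power(n, f=0.25, alpha=0.05, m=4, rho=0.5):
    lam = f**2 * n * m / (1 - rho)     # noncentrality parameter
    df1, df2 = 1, n - 1                # dfs for a two-level within effect
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

for n in (20, 25, 30):
    print(f"N = {n}: power = {within_power(n):.2f}")  # N = 30 exceeds .90
```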
2.1.2 Design
Study I followed a 2 × 2 × 2 mixed design. The investigated factors were the prosodic condition and the two verbal
conditions emotionality and speech style. The prosodic con-
dition was implemented as between-factor with two levels:
expressive vs. monotonic speech. The verbal conditions were
implemented as within-factors with two levels each: emo-
tional vs. non-emotional, personal vs. non-personal. This
resulted in four different audio recordings presented to one
group with expressive speech and to the other group with
monotonic speech (see Fig. 3).
2.1.3 Materials
2.1.3.1 Synthesized Robot Speech To present the synthesized robot speech, we created audio files which differed on the prosodic and the verbal conditions. For the creation of the audios with expressive speech, we used the Amazon Polly2 Natural Text-To-Speech (NTTS) software, as it already contains prosodic features to turn text into human-like speech. We furthermore added Speech Synthesis Markup Language (SSML) tags as prosodic components to adapt the intonation to our individual text contents (see Fig. 4). The full code (SSML-enhanced text) is available at the OSF. Given that Amazon Polly only distinguishes between male and female voices, we chose the most natural-sounding adult voice: the Matthew voice. Since most Text-To-Speech (TTS) programs already include prosody by default, we used Panopreter Basic3 to create our monotonic audio files. We selected the “David” voice to keep the voice gender comparable. As the only adjustment, we reduced the voice speed to 90% to achieve a speed comparable to the average speed in the expressive condition and to facilitate the comprehension of the audios for the participants. Panopreter Basic applies almost no automatically integrated prosodic components, which makes the audios sound more robotic and monotonic.

Fig. 4 Example text with added SSML

2 https://aws.amazon.com/de/polly/
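For illustration, a script like those above can be rendered with Amazon Polly's neural engine through the boto3 SDK. The SSML below is a hypothetical stand-in for the actual SSML-enhanced texts (which are available at the OSF); the rate and volume values and the output file name are placeholders.

```python
# Minimal sketch: synthesizing an expressive audio file with Amazon
# Polly (neural engine, "Matthew" voice) via boto3. The SSML here is a
# hypothetical example, not the study's actual markup.
import boto3

ssml = """<speak>
  Hello, <prosody rate="90%" volume="loud">thank you</prosody>
  for participating in our study!
  <break time="300ms"/>
  My name is Musashi, and I am a service robot from Japan.
</speak>"""

polly = boto3.client("polly", region_name="us-east-1")
response = polly.synthesize_speech(
    Engine="neural",        # Polly's NTTS engine
    VoiceId="Matthew",      # voice used for the expressive condition
    TextType="ssml",
    Text=ssml,
    OutputFormat="mp3",
)
with open("expressive_audio.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```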
For the creation of the verbal conditions, we created four different text contents which differed regarding emotionality and speech style (see Table 1). In each condition, a robot introduces itself, where it works, and what its duties are. The non-emotional and non-personal condition was used as the starting point and contained sentences like “My tasks on site were taking customers’ orders, distributing food and drinks, and conducting short conversations.”. In conditions with personal speech style, the robot used more colloquial expressions (e.g., instead of participation it said giving us a hand), more interpretative terms (e.g., instead of five years it said a long time), and the word “I” was used more often (e.g., instead of my function is serving it said I am responsible for serving). In conditions with emotionality, adjectives and adverbs were added (e.g., for four wonderful and exciting years; my favorite tasks on site; sadly, an updated model was introduced). To include greater variability, names (e.g., name of the robot, of the working place, etc.) and dates were exchanged in each of the four texts.
2.1.3.2 Stimulus Quality Check A stimulus quality check
was conducted as a separate study to evaluate the synthesized
robot speech and verify the success of our intended manipu-
lations. This quality check was performed after the two main
studies reported here, although it would have ideally been
conducted as a pre-study.
3 https://www.panopreter.com/deu/products/pb/index.php
Table 1 Scripts for the different audios of study I
non-emotional, non-personal (audio A1 & B1)
“Hello, thank you for your participation in our study. I am Akeni,
a service Robot from Japan. I’m utilized in restaurants, where
my function is serving customers. For five years, my operating
site was the restaurant Kumami. My tasks on site were taking
customers’ orders, distributing food and drinks, and conducting
short conversations. Due to my ability to react to obstacles while
moving autonomously through the facility, I was able to prevent
collisions with customers. In January 2019, my successor model
was introduced. Since then, I have been assigned to a retirement
home, where my task is the distribution of meals.”
non-emotional, personal (audio A2 & B2)
“Hi, thank you for giving us a hand today! My name is Musashi,
and I am a service robot from Japan. My line of work is in cafes,
where I’m responsible for serving our customers. For a long
time, I worked at Cafe Komine. My duties there were taking our
guests’ orders, handing out coffee and cake, and engaging our
guests in short conversations. Thanks to my ability to react to
obstacles while navigating through the room by myself, I was
able to avoid accidents with people. A short while ago, I was
replaced by my successor model. Since then I have been working
on a nursing ward, where my job is to hand out meals.”
emotional, non-personal (audio A3 & B3)
“Hello, I’m pleased you are participating in our study! I am
Yuuki, one of the oldest service robots in Japan. I am utilized in
cafeterias, where my function is serving customers, which is
difficult but rewarding. For four wonderful and exciting years,
my operating site was the cafeteria Hanabi. My favorite tasks on
site were taking customers’ orders, distributing meal trays, and
starting short but interesting conversations. Due to my skillful
reactions to obstacles while moving autonomously through the
hall, I was able to prevent dangerous collisions with customers.
Sadly, an updated model was introduced in February 2020.
Fortunately, I have been assigned to a hospital since, where my
engaging new task is the distribution of meals.”
emotional, personal (audio A4 & B4)
“Hi, thank you for giving us a hand today! My name is Musashi,
and I am a service robot from Japan. My line of work is in cafes,
where I’m responsible for serving our customers. For a long
time, I worked at Cafe Komine. My duties there were taking our
guests’ orders, handing out coffee and cake, and engaging our
guests in short conversations. Thanks to my ability to react to
obstacles while navigating through the room by myself, I was
able to avoid accidents with people. A short while ago, I was
replaced by my successor model. Since then I have been working
on a nursing ward, where my job is to hand out meals.”
The quality check involved 32 participants (nfemale = 16, nmale = 16; Mage = 37 years, SD = 10) and evaluated the three factors of the audio files used in study I: prosody, emotionality, and speech style. To ensure sufficient differentiation and to verify the intelligibility of the TTS systems used, comprehensibility of the audio and intelligibility of the TTS were also assessed. Perceived prosody, emotionality, speech style, and comprehensibility were each rated using customized single items. Intelligibility was assessed by asking participants to transcribe two high-probability sentences from the Speech Perception in Noise (SPIN) test developed by Kalikow et al. [63].
Prosody had a significant main effect, indicating that audios with expressive speech were rated higher on expressiveness (M = 4.57) than those with monotonic speech (M = 2.02), F(1, 30) = 58.157, p < 0.001, η² = 0.660. A significant main effect of emotionality was found, with audios containing more emotional words being rated higher on emotional content (M = 4.20) than those without emotional words (M = 2.52), F(1, 30) = 33.948, p < 0.001, η² = 0.531. We also found a significant main effect of speech style, F(1, 30) = 4.153, p = 0.050, η² = 0.122. Audios with a personal speech style were rated as more personal (M = 3.59) than those with an objective speech style (M = 3.35). The eight audio files were rated as highly comprehensible (93.5% to 100%). The Word Error Rate (WER) was used to assess the intelligibility of the TTS. The results showed that the WER for the monotonic voice was 8%, while the WER for the expressive voice was 7%.
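For reference, WER is the word-level Levenshtein distance between the reference sentence and the participant's transcription, normalized by the reference length. A minimal self-contained sketch of this computation (our own illustration, not the study's scoring script):

```python
# Word error rate (WER) sketch: edit distance over word tokens,
# WER = (substitutions + deletions + insertions) / reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# Hypothetical example: one dropped word out of six -> WER = 0.167
# wer("the boat sailed along the coast", "the boat sailed along coast")
```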
Our intended manipulations of all variables were successful, and both TTS systems demonstrated high intelligibility, as indicated by the low error rates. While the manipulation of speech style resulted in a p-value of exactly 0.05, right at the threshold of statistical significance, the observed trend suggests that participants were sensitive to this manipulation. This finding supports the validity of our stimulus materials in capturing the intended effects.
A more detailed description of the stimulus quality check,
along with comprehensive analyses and results, is available
at the OSF (https://osf.io/28j5w).
2.1.4 Questionnaires
Experience and agency. To address the two dimensions expe-
rience and agency, we used the four questions from Gray and
Wegner [5], but replaced the word fear with joy in one of
the experience questions. This was done to query two oppositely polarized terms (pain and joy). With two questions each,
participants rated their impression of the robot with regard
to agency and experience on a scale from 1 (not at all) to 5
(extremely).
Social presence. To query how human-like the robots in
the audio recording were perceived, we used the Social Pres-
ence questionnaire [64]. With five questions, participants had
to state on a scale from 1 (strongly disagree) to 7 (strongly
agree) if they attributed social abilities like warmth or sensi-
tivity to the robot.
Table 2 Means and standard deviations (in brackets) of the dependent variables for each condition

                              Emotional: Yes           Emotional: No
  Personal speech style:      No         Yes           No         Yes
Experience     Prosody no     1.8 (1.0)  1.7 (0.7)     1.6 (0.9)  1.6 (0.9)
               Prosody yes    2.4 (1.3)  2.5 (1.4)     2.1 (1.2)  2.1 (1.0)
Agency         Prosody no     3.0 (1.3)  3.0 (1.1)     3.0 (1.3)  2.9 (1.1)
               Prosody yes    3.1 (1.4)  3.1 (1.2)     2.8 (1.2)  2.9 (1.2)
Social pres.   Prosody no     3.3 (1.8)  4.0 (1.6)     3.0 (1.5)  3.0 (1.7)
               Prosody yes    4.5 (1.2)  4.4 (1.4)     3.7 (1.5)  4.0 (0.8)
Trust          Prosody no     2.4 (1.1)  2.7 (1.2)     2.3 (1.1)  2.4 (1.2)
               Prosody yes    2.5 (0.8)  2.6 (1.0)     2.4 (0.9)  2.5 (0.9)

N = 15 per prosody condition; prosody no = monotonic speech, prosody yes = expressive speech
Trust. We furthermore used two questions to query trust
in the robot [65] on a scale from 1 (strongly disagree) to 5
(strongly agree).
Control variables. To control for variables that might
influence the results, participants answered the Negative Atti-
tudes towards Robots Scale (NARS) [66] on a scale from
1 (strongly disagree) to 5 (strongly agree). We furthermore
queried the eerie factor of the Eeriness scale [67] with three
semantic differentials in a range from 1 to 5. This was done to check for the uncanny valley effect [51], which hypothesizes
that a very high degree of human-likeness elicits a negative
and irritating feeling of eeriness. We also asked content-
related forced-choice questions after each audio to check
whether the audio files were listened to attentively (Ques-
tion 1: Has the robot changed its job? Question 2: Which
place was mentioned in the audio?). All questionnaires and
questions used in the study are uploaded and accessible at
the OSF.
2.1.5 Procedure
Participants were randomly assigned by Prolific to one of the two experimental groups (expressive vs. monotonic speech). At the beginning of the online study, participants gave informed consent. Within each group, each participant listened to the four audio samples in random order. After each audio sample, participants were first asked two thematic questions. This was partly to control that participants
were listening to the audios, but also to distract from the
obvious focus on the affective manipulation. Second, par-
ticipants answered questions regarding the experience and
agency level, the social presence level, trust in the robot, and
the eeriness level. The questions were introduced with the
following sentence: “Please answer the questions consider-
ing both content and pronunciation of the audio.”. The other
control variables were assessed at the end of the survey. The
entire study took an average of 16 min to complete.
2.2 Results
Test assumptions were checked, and no extreme outliers were
found in the data. For some dependent variables, a violation of the normality assumption was present. In most cases,
the items were equally positively skewed, resulting from the
overall low rating of the robotic speech with respect to the
dependent variables. Data transformation failed to produce
a normal distribution. In a few cases, the items were skewed
differently, which is why likewise no transformation could
be applied. For this reason, we proceeded with the analy-
sis as planned, but the violation of the normal distribution
should be considered when interpreting the results. A detailed
description of the test assumptions can be found at the OSF.
Statistical analyses were done using IBM SPSS Statistics
(Version 25) predictive analytics software.
Overall, participants had a medium negative attitude towards robots, which did not differ between groups (Mexpressive = 3.1, Mmonotonic = 2.9; t(28) = 1.021, p = 0.316). Furthermore, the eerie rating did not differ between the robotic speech variations (mixed ANOVA, all p > 0.05). The mean eerie rating for the robotic speech was on a medium level (M = 2.8, SD = 0.4).
The mean values and standard deviations of the dependent
variables’ agency and experience, social presence, and trust
can be found in Table 2.
Based on the participants’ responses to the content ques-
tions, we calculated a rate of incorrect answers for each
participant. Out of all participants, 17 participants made no
Table 3 P-values of (significant) main effects of the dependent variables of study I

                   pro      emo        sty
Experience         .076     .002*      .791
Agency             .970     .237       .764
Social presence    .087     < .001*    .247
Trust              .916     .127       .395

pro = prosody, emo = emotionality, sty = speech style; * indicates significance
errors, 9 participants made one error, and 4 participants made
two or more errors. The error rate was included as a covariate
in the analysis to control for potential inaccuracies in listen-
ing to the audio stimuli.
In the following, the results of the 2 (prosody) × 2 (emotionality) × 2 (speech style) mixed ANCOVA are reported. The p-values of the main and interaction effects of the dependent variables that became significant are listed in Table 3. Note that we only included effects in the table that achieved a significant result on at least one variable.
2.2.1 Experience and Agency
To assess the reliability of the modified experience question-
naire, we conducted a reliability analysis using Cronbach’s
alpha. The scale demonstrated high internal consistency, with
an alpha coefficient of 0.85, indicating that the items reliably
measure the construct of experience.
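For a small item set, Cronbach's alpha follows directly from the item variances and the variance of the sum score. A brief sketch of this computation (illustrative; the reported value was computed with the statistics software used in the analysis):

```python
# Cronbach's alpha sketch: alpha = k/(k-1) * (1 - sum(item variances)
# / variance of the sum score), for an (n_respondents x k_items) array.
import numpy as np

def cronbach_alpha(items) -> float:
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```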
For the experience results, the main effect of emotionality yielded a significant effect, F(1, 27) = 11.147, p = 0.002, η² = 0.292. Audio files with emotional text were rated 0.3 higher on average than those with non-emotional text. Robot voices using expressive speech were rated 0.6 higher on average in comparison to speech without expressive features. However, the main effect of prosody failed to reach the conventional level of significance, F(1, 27) = 3.404, p = 0.076, η² = 0.112. No other significant main effects or interactions were found (all p > 0.05). Figure 5 shows the experience results divided by emotionality and prosody.
For the agency results, no significant interactions or main effects were obtained (all p > 0.05).
Because we stated in our hypothesis that our independent variables would not lead to higher scores on the agency dimension, we also conducted equivalence tests with the confidence-interval approach to check for equality of agency scores across conditions [68, 69]. We assumed Cohen’s d = ±0.5 as the upper and lower limits of the equivalence interval [70] and determined whether the 90%-CI lies within these boundaries. To calculate the confidence interval, we used the effect size calculator from Lenhard and Lenhard [71].

Fig. 5 Experience ratings of the robotic speech in study I. Note. Experience ratings are divided by the independent variables prosody (between factor) and emotionality (*p < .01)
Table 4 Results of the equivalence tests

                        pro               emo               sty
Number of observations  60                60                60
Mean (SD) with          3.0 (1.2)         3.1 (1.2)         3.0 (1.1)
Mean (SD) without       3.0 (1.2)         2.9 (1.2)         3.0 (1.3)
90%-CI                  [−0.432, 0.418]   [−0.034, 0.568]   [−0.238, 0.363]
Correlation             .027              .779              .837

Bold indicates that the CI lies between the ±0.5 boundaries
pro = prosody, emo = emotionality, sty = speech style, CI = confidence interval
Fig. 6 90%-CI for the three independent variables in study I for agency ratings
In Table 4, the results of the equivalence tests can be found. Note that the number of observations is not equal to the sample size due to the repetition of measurements within the groups. The different conditions of speech style and prosody can be considered equivalent, as the 90%-CI lies completely between our pre-defined boundaries (see also Fig. 6). Only the agency results between the emotional and non-emotional conditions cannot be considered equivalent.
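The logic of this check can be sketched compactly: compute Cohen's d with an approximate 90% CI and verify that the interval lies inside the ±0.5 bounds. The standard-error formula below is a common approximation for dependent samples; since the reported values come from the Lenhard and Lenhard calculator [71], exact numbers may differ slightly.

```python
# Equivalence check sketch: the 90% CI around Cohen's d must lie fully
# inside +/-0.5. The SE term is an approximation for dependent samples.
import numpy as np
from scipy import stats

def equivalence_check(x, y, r, bound=0.5, level=0.90):
    """x, y: paired condition scores; r: correlation between them."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sd_pooled = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    d = (x.mean() - y.mean()) / sd_pooled
    se = np.sqrt(2 * (1 - r) / n + d**2 / (2 * n))   # approx. SE of d
    z = stats.norm.ppf(1 - (1 - level) / 2)
    lo, hi = d - z * se, d + z * se
    return (lo, hi), (-bound < lo and hi < bound)    # True -> equivalent
```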
To get a better impression of how the experience-agency ratings of the eight different conditions from study I relate to Gray and colleagues’ results, we mapped them onto their experience-agency framework [6] (see Fig. 7).

Fig. 7 Experience-agency results of study I incorporated into the framework of Gray et al. [6]. Note. The abbreviations in the legend stand for: o = non-personal speech style, p = personal speech style, 1 = monotonic speech, 2 = expressive speech, a = emotional words, b = no emotional words
2.2.2 Social Presence
For the social presence questionnaire, we found a significant main effect of emotionality (F(1, 27) = 14.892, p < 0.001, η² = 0.355) with higher ratings for emotional text (M = 4.06) than for non-emotional text (M = 3.46). The higher ratings for expressive speech (M = 4.06) than for monotonic speech (M = 3.46) again failed to reach the conventional level of significance (F(1, 27) = 3.149, p = 0.087, η² = 0.104). No other main or interaction effects were found (all p > 0.05).
2.2.3 Trust
For the trust ratings, no significant interactions or main
effects were found (all p> 0.05).
2.3 Discussion Study I
In study I, we found that the use of single emotional words
increased the experience dimension of robots, whereas no
significant difference was found on the agency dimension.
However, the agency rating between the conditions with and
without emotional words can also not be considered as equiv-
alent. Therefore, H1a is only partly supported.
The verbal condition speech style had no influence on either agency or experience. For this reason, H1b has to be rejected.
With regard to the verbal characteristics, we can conclude that not all variations had the intended effects, but including emotional words such as adjectives or adverbs can positively influence the perceived ability of robots to feel. Adding a more personal touch to the robotic speech did not prove successful in our study; however, there are many more ways to implement different speech styles at the word level which have already achieved positive effects (e.g., [40, 41]).
Expressive speech had no influence on the agency and
experience dimensions in study I, which is why H1c cannot
be supported. However, the prosody results point to a trend
that expressive speech enhances the perception of experience.
In each condition, the experience rating was descriptively
higher with expressive speech than with monotonic speech.
We deliberately chose a between-subject design for the
initial study because, in real-world scenarios, judgments of
prosody and its effects are often absolute rather than relative.
To further investigate the specific effects of prosody and to
enable direct comparisons, we implemented prosody as a
within-factor in study II. This aligns with the methodology of
previous research, where prosody was included as a within-
subject factor (e.g., [33,35]). In addition to the study design,
it is also possible that our study was underpowered, which
could have contributed to the non-significant findings.
In addition to a positive influence on the experience dimension, using emotional words also increased the perceived social presence of the robot. Therefore, H2a can be accepted. Similar to the aforementioned results, using a personal speech style had no influence on the perception of social abilities. Accordingly, H2b is not supported by the study results. Using expressive speech only showed a trend towards an increased perception of human-like qualities in the robots, but did not achieve significant results. This is why H2c has to be rejected as well.
None of the investigated conditions had an influence on
trust in the robots, which is why hypothesis 3 has to be
rejected entirely.
We realized that our sample size may have been too small; therefore, the sample size in study II was calculated based on the obtained effect sizes, resulting in a larger sample.
Overall results are discussed at the end (Sect. 4).
3 Study II
3.1 Method and Materials
Study II was also preregistered at the OSF, performed with
local ethical committee approval, and in accordance with the
Declaration of Helsinki.
3.1.1 Participants
For study II, we conducted a power analysis using the achieved effect sizes of study I as reference (e.g., η²experience = 0.228). Using an effect size of η² = 0.2 (power = 0.9, α = 0.05), a sample size of N = 60 is required to achieve a significant main effect. In study II, a total of 60 participants with a balanced sample (50% males, 50% females) took part. The mean age was 37 years (SD = 12), ranging from 19 to 60 years. Participants could only take part in study II if they had not taken part in study I. For taking part in the study, participants were financially compensated with £2.02.

Fig. 8 Conditions for the two different groups in study II
3.1.2 Design
To address our exploratory research question regarding the
influence of robot appearance, we introduced the factor robot
type. As study I had revealed that personal speech style had no significant effect, it was discarded from the second study. Therefore, study II also followed a three-factorial
mixed design with robot type (human-like vs. technical) as a
between-subject factor, and the verbal conditions emotional-
ity (non-emotional vs. emotional) and prosody (expressive
vs. monotonic) as within-subject factors. The two scripts
(emotional and non-emotional) were recorded in an expres-
sive and a monotonic version, which resulted in four different
audios. The audios were presented to one group with an addi-
tional human-like robot picture and to the other group with
an additional technical robot picture (see Fig. 8).
3.1.3 Materials
For the audios in study II, we created only two different texts,
which differed in regard to emotionality and were imple-
mented in the same way as in study I (see Table 5).
The presented robot pictures in study II were selected from
the ABOT database based on their human-likeness score [72].
In addition, we decided not to use well-known robots such as Pepper and ensured that the two robots corresponded at least color-wise to a similar pattern. With these constraints, we decided to use Romeo as a human-like robot (overall human-likeness score: 62.22) and Homemate as a technical robot (overall human-likeness score: 9.08; see Fig. 9).
Table 5 Scripts for the different audios of study II
non-emotional, monotonic (audio 1)
“Hello thank you for participating in our study. My name is
Akeni and I am a service robot from Japan. My line of work is in
restaurants where I am responsible for serving our customers.
For five years I worked at the restaurant Kumami. My tasks there
were taking patrons’ orders handing out food and drinks and
starting short conversations. Due to my ability to react to
obstacles while moving autonomously through the facility, I was
able to avoid collisions with customers. In January 2019 I was
replaced by my successor model. Since then I have been assigned
to a retirement home where my task is the distribution of meals.”
non-emotional, expressive (audio 2)
“Hello, thank you for participating in our study! My name is
Musashi, and I am a service robot from Japan. My line of work is
in cafes, where I am responsible for serving our customers. For
seven years, I worked at Café Komine. My tasks there were
taking our guests’ orders, handing out coffee and cake, and
starting short conversations. Due to my ability to react to
obstacles while moving autonomously through the room, I was
able to avoid collisions with people. A short while ago, I was
replaced by my successor model. Since then I have been assigned
to a nursing ward, where my task is the distribution of meals.”
emotional, monotonic (audio 3)
“Hello thank you for participating in our study. My name is
Akeni and I am a service robot from Japan. My line of work is in
restaurants where I am responsible for serving our customers.
For five years I worked at the restaurant Kumami. My tasks there
were taking patrons’ orders handing out food and drinks and
starting short conversations. Due to my ability to react to
obstacles while moving autonomously through the facility, I was
able to avoid collisions with customers. In January 2019 I was
replaced by my successor model. Since then I have been assigned
to a retirement home where my task is the distribution of meals.”
emotional, expressive (audio 4)
“Hello, I am pleased you are participating in our study! My name
is Nayumi, and I am one of the oldest service robots in Japan.
My difficult but rewarding line of work is in bars, where I am
responsible for the happiness of our customers. For two
wonderful and exciting years, I worked at Bar Kikko. My favorite
tasks there were taking customers’ orders, handing out cocktails
and snacks, and starting short, but interesting conversations. Due
to my skillful reactions to obstacles while moving autonomously
through the space, I was able to avoid dangerous collisions with
people. Sadly, a few months ago, I was replaced by an updated
model. Fortunately, I have been working in a medical center
since, where my engaging new task is the distribution of meals.”
3.1.4 Questionnaires
The same questionnaires to query experience and agency,
social presence, and trust, as well as the control variables
NARS and eeriness were used as in study I (see Sect. 2.1.4).
Fig. 9 Presented robot pictures in study II. Note. a Picture of the human-like robot Romeo. b Picture of the technical robot Homemate
3.1.5 Procedure
Participants were randomly assigned by Prolific to one of the two experimental groups (human-like vs. technical robot
picture). At the beginning of the online study, participants
gave informed consent. Within each group, each participant
listened to the four audio samples in random order. In study
II, the audios differed in regard to prosody and emotionality
and an additional robot picture was shown during the audios.
After each audio sample, participants were asked the same
questions as in study I. Although we presented robot pictures
during the study, the task in study II was again to evaluate
only the audios. The entire study took an average of 16 min
to complete.
3.2 Results
As in study I, participants had a medium negative attitude towards robots, which did not differ between groups (Mhuman-like = 2.8, Mtechnical = 3.0; t(58) = −1.104, p = 0.727). In contrast to study I, the eerie rating differed for expressive speech (M = 2.7, SD = 0.5) and monotonic speech (M = 3.1, SD = 0.7; F(1, 58) = 19.684, p < 0.001, η² = 0.253).
Consistent with study I, we calculated an error rate for each
participant based on their responses to the content questions.
Overall, 34 participants made no errors, 16 participants made
one error, and 10 participants made two or more errors. The
rate of incorrectly answered content questions was included
as a covariate in the analyses.
The mean values and standard deviations of the dependent
variables (agency and experience, social presence, and trust)
can be found in Table 6.
Fig. 10 Experience ratings of the robotic speech in study II. Note. Experience ratings are divided by the independent variables prosody and emotionality (*p < .001)
To analyze the obtained results, 2 (robot type) × 2
(prosody) × 2 (emotionality) mixed ANCOVAs were calculated.
The p-values of the significant main and interaction
effects can be found in Table 7. Note that main and interaction
effects are only included in the table if they yield at least
one significant result for any dependent variable.
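For readers who wish to rerun this type of analysis on the published data, the following minimal sketch approximates the mixed ANCOVA with a linear mixed model; it is not the authors' original analysis code, and all column names and the simulated data are illustrative assumptions.

```python
# Minimal sketch (not the authors' original analysis code): approximating the
# 2 (robot type, between) x 2 (prosody, within) x 2 (emotionality, within)
# mixed ANCOVA with a linear mixed model. Column names and simulated data
# are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for pid in range(60):                      # 30 participants per picture group
    robot = "technical" if pid < 30 else "human_like"
    err = int(rng.integers(0, 3))          # errors on the content questions
    for prosody in ("monotonic", "expressive"):
        for emo in ("non_emotional", "emotional"):
            score = (1.4
                     + 0.5 * (emo == "emotional")       # emotionality effect
                     + 0.4 * (prosody == "expressive")  # prosody effect
                     + rng.normal(0, 0.6))
            rows.append((pid, robot, prosody, emo, err, score))
df = pd.DataFrame(rows, columns=["participant", "robot_type", "prosody",
                                 "emotionality", "error_rate", "experience"])

# A random intercept per participant models the repeated measures;
# error_rate enters as a covariate, mirroring the reported ANCOVA.
model = smf.mixedlm(
    "experience ~ prosody * emotionality * robot_type + error_rate",
    data=df, groups=df["participant"])
print(model.fit().summary())
```

The random intercept per participant captures the repeated-measures structure; a dedicated repeated-measures ANCOVA routine would be an equally valid alternative.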
3.2.1 Experience and Agency
For the reliability analysis, Cronbach's alpha was calculated to
assess the internal consistency of the experience subscale.
Despite the change of the emotional word, the internal
consistency remained excellent, with Cronbach's alpha = 0.94.
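As an illustration of this reliability check, a minimal sketch computes Cronbach's alpha from first principles; the simulated item ratings are an assumption, not the study data.

```python
# Minimal sketch of the reliability check: Cronbach's alpha over the items of
# an experience subscale. The simulated ratings are illustrative assumptions.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = participants, columns = questionnaire items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of sum scores
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(0)
latent = rng.normal(2, 1, size=(60, 1))               # shared "true" rating
ratings = latent + rng.normal(0, 0.7, size=(60, 7))   # 7 consistent items
print(f"alpha = {cronbach_alpha(ratings):.2f}")       # high, around .9
```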
For the experience results, we found significant main
effects for emotionality (F(1, 57) = 27.104, p < 0.001, η² =
0.322) and prosody (F(1, 57) = 16.505, p < 0.001, η² =
0.225). Ratings were on average 0.53 points higher for
emotional than for non-emotional words, and 0.38 points higher
for expressive than for monotonic speech. The experience
results divided by emotionality and prosody are presented in
Fig. 10.
Similar to study I, the analysis of the agency results did
not reveal any significant main effects or interactions (all p
> 0.05).
We conducted equivalence tests [68, 69] to assess the equality
of the agency ratings between conditions. We defined an
equivalence interval with Cohen's d = ±0.5 as the upper and
lower limits [70].
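The logic of the two one-sided tests (TOST) behind such an equivalence procedure can be sketched as follows; this is a generic paired-samples implementation under the stated d = ±0.5 bounds with simulated ratings, not the authors' analysis script.

```python
# Generic TOST sketch for paired samples with equivalence bounds of
# Cohen's d = +/-0.5; simulated data, not the authors' analysis script.
import numpy as np
from scipy import stats

def tost_paired(x, y, d_bound=0.5):
    """Two one-sided tests: is the mean paired difference inside +/-d_bound
    (expressed in units of Cohen's d of the difference scores)?"""
    diff = np.asarray(x) - np.asarray(y)
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)
    bound = d_bound * diff.std(ddof=1)          # d bound in raw scale units
    p_low = stats.t.sf((diff.mean() + bound) / se, n - 1)    # H0: diff <= -bound
    p_high = stats.t.cdf((diff.mean() - bound) / se, n - 1)  # H0: diff >= +bound
    ci = stats.t.interval(0.90, n - 1, loc=diff.mean(), scale=se)
    # Equivalence is claimed when both one-sided p-values are significant,
    # i.e. when the 90% CI of the difference lies entirely inside the bounds.
    return max(p_low, p_high), ci

rng = np.random.default_rng(1)
expressive = rng.normal(3.0, 1.1, 120)                  # agency, expressive
monotonic = expressive - rng.normal(0.2, 0.8, 120)      # agency, monotonic
p, ci = tost_paired(expressive, monotonic)
print(f"TOST p = {p:.3f}, 90% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```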
In Table 8, the results of the equivalence tests can be found.
The different conditions of all independent variables can be
considered equivalent as the 90%-CI lies entirely within our
pre-defined boundaries (see Fig. 11).
Table 6 Mean (SD) of the dependent variables for each condition

                           Human-like              Technical
                      Emotional   Non-emot.   Emotional   Non-emot.
Experience
  Monotonic speech    1.9 (1.1)   1.4 (0.7)   1.8 (1.1)   1.3 (0.6)
  Expressive speech   2.1 (1.2)   1.5 (0.7)   2.4 (1.3)   1.8 (1.0)
Agency
  Monotonic speech    2.8 (1.0)   2.9 (1.2)   2.8 (1.1)   2.7 (1.0)
  Expressive speech   2.9 (1.2)   2.9 (1.1)   3.2 (1.1)   2.8 (1.2)
Social presence
  Monotonic speech    3.1 (1.7)   2.7 (1.4)   2.7 (1.8)   2.1 (1.3)
  Expressive speech   4.0 (1.6)   3.6 (1.6)   4.3 (1.6)   4.0 (1.5)
Trust
  Monotonic speech    2.5 (1.1)   2.4 (1.1)   2.2 (0.9)   2.0 (0.8)
  Expressive speech   2.8 (1.1)   2.7 (1.1)   2.5 (1.0)   2.3 (1.0)

N = 30 per robot picture condition
Table 7 P-values of (significant) main and interaction effects of the dependent variables of study II

                   Pro       Emo       Rob     Pro × Rob
Experience         < .001*   < .001*   .710    .067
Agency             .078      .218      .803    .352
Social presence    < .001*   .003*     .803    .013*
Trust              < .001*   .056      .088    .869

Pro = prosody, Emo = emotionality, Rob = robot type; * indicates significance
Table 8 Results of the equivalence test for agency ratings

                          Pro              Emo              Rob
Number of observations    120              120              120
Mean (SD) with            3.0 (1.1)        2.9 (1.1)        2.9 (1.1)
Mean (SD) without         2.8 (1.1)        2.8 (1.1)        2.9 (1.1)
Correlation               .691             .722             .127
90%-CI                    [0.031, 0.457]   [0.064, 0.362]   [-0.308, 0.293]

All CIs lie within the ±0.5 boundaries. Pro = prosody, Emo = emotionality, Rob = robot type (without = technical), CI = confidence interval
Fig. 11 90%-CI for the three independent variables in study II for
agency ratings
To visualize the experience-agency ratings of the eight different
conditions from study II, we mapped these conditions
onto the experience-agency framework developed by Gray
et al. [6] (see Fig. 12).

Fig. 12 Experience-agency results of study II incorporated into the framework of Gray et al. [6]. Note. The abbreviations in the legend stand for: h = human-like robot picture, t = technical robot picture, 1 = monotonic speech, 2 = expressive speech, a = emotional words, b = no emotional words; t1a and t2b are at the same position
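As a minimal sketch (not the authors' plotting code), such a mapping can be reproduced from the Table 6 means as follows; the axis limits and styling are assumptions.

```python
# Minimal sketch of the mapping in Fig. 12: placing the eight study-II
# conditions on the experience-agency plane, using the means from Table 6.
import matplotlib.pyplot as plt

# (label, experience, agency); h/t = human-like/technical picture,
# 1/2 = monotonic/expressive speech, a/b = emotional/non-emotional words
conditions = [
    ("h1a", 1.9, 2.8), ("h1b", 1.4, 2.9), ("h2a", 2.1, 2.9), ("h2b", 1.5, 2.9),
    ("t1a", 1.8, 2.8), ("t1b", 1.3, 2.7), ("t2a", 2.4, 3.2), ("t2b", 1.8, 2.8),
]
fig, ax = plt.subplots()
for label, exp, age in conditions:
    ax.scatter(exp, age)
    ax.annotate(label, (exp, age), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Experience")
ax.set_ylabel("Agency")
ax.set_xlim(1, 5)    # assumed scale range
ax.set_ylim(1, 5)
ax.set_title("Study II conditions in the experience-agency framework")
plt.show()
```

Note that t1a and t2b fall on the same coordinates (1.8, 2.8), as stated in the figure caption.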
3.2.2 Social Presence
For the social presence ratings, we found a significant main
effect of emotionality (F(1, 57) = 9.314, p = 0.003, η² =
0.140), with higher ratings for emotional text (M = 3.52)
than for non-emotional text (M = 3.10), and a significant
main effect of prosody (F(1, 57) = 37.792, p < 0.001, η² =
0.399), with higher ratings for expressive speech (M = 3.97)
compared to monotonic speech (M = 2.64). Additionally,
we found a two-way interaction between prosody and robot
type (F(1, 57) = 6.524, p = 0.013, η² = 0.103), showing that
using expressive speech had an even greater influence in the
technical robot group.
3.2.3 Trust
For the trust ratings, we found only a main effect of prosody
(F(1, 57) = 12.447, p < 0.001, η² = 0.179), with higher trust
ratings for robots speaking with an expressive voice (M = 2.58)
than with a monotonic voice (M = 2.29). All other main and
interaction effects were not significant (all p > 0.05).
3.3 Discussion Study II
The second study aimed to confirm the influencing factors
with a larger sample size and with prosody as a within-subjects
factor. The results regarding emotionality were in line with
study I and showed that using emotional words increased
only the experience dimension, while agency was not influenced.
H1a can therefore be accepted. Because speech style had
no effect on the perception of robotic speech in study I, this
factor was not incorporated in study II. As a result, H1b was
not addressed.
Study I suggested that expressive speech positively influences
the experience dimension, a finding that was confirmed
in study II. Expressive speech significantly increased the
experience ratings while not influencing the agency dimension.
For agency, the equivalence tests indicated that the
expressive and monotonic conditions can be considered
equivalent. H1c can therefore be accepted.
Regarding the perception of social presence, both emotional
words and expressive prosody led to an increase in the perception
of human-like qualities. Hypothesis 2 can therefore
be accepted (note that H2b was only examined in study I).
In contrast to study I, study II revealed an increase in
trust for robots using expressive speech. H3c can therefore
be accepted. No effect on trust was found for the use of emo-
tional words. Therefore, H3a has to be rejected.
The overall results are discussed in the following section.
4 Discussion
Previous literature has highlighted that the use of emotional
cues in communication promotes positive perception and
behavior towards technological devices and robots [10,12,
15,73]. With two consecutive studies, we aimed to inves-
tigate whether affective communication induced via verbal
and prosodic features positively influences the attribution of
experience, but not the attribution of agency with regard to
mind perception.
4.1 Affective Influence of Robot Speech
The results of our two studies showed that especially the
use of emotional words increased the attribution of experi-
ence to robots. This aligns with prior research highlighting
the importance of emotional content in HRI, as it can enable
robots to be perceived as more human-like and socially capa-
ble [5,33]. Adding emotional words to the content of the
robotic speech is a relatively small and easy adjustment in the
modification of robots. In the future, the experience dimen-
sion of robots can thus be influenced with relatively simple
means, and positive attitude and behavior towards robots can
be promoted [10,43].
Previous research has already shown that different speech
styles of robots can lead to different evaluative outcomes
[40,41]. With the modification of the speech style in our
study, no improvement could be achieved on the experience
dimension. Although the stimulus quality check indicated
that a more personal speech style was indeed perceived as
such, this effect was significantly less pronounced compared
to the impact of emotional words and expressive speech. This
lack of effect suggests that while personalization can be a
valuable tool in certain communicative contexts, it may not
be as critical as emotional content and prosody when it comes
to enhancing experience attributions. Given that speech styles
can be implemented in a variety of ways, further investigation
is needed.
In addition to emotional words, our studies demonstrate
that using expressive speech positively affected the attribu-
tion of experience. Prosody, which encompasses variations
in intonation, pitch, and rhythm, plays a crucial role in con-
veying emotions and affective states, making robots appear
more socially competent and empathetic [33,35]. Unlike the
use of emotional words, modifying prosodic features is much
more difficult to achieve, but offers equal advantages, espe-
cially in situations where the content is fixed and only the
vocalization can be changed.
Compared to previous research [43], our study pro-
vides a differentiated comparison of the verbal and prosodic
components of affective speech, offering a more nuanced
understanding of how each contributes to mind perception.
The incorporation of our findings into the experience-
agency framework developed by Gray et al. [6] highlights
the distinctive contributions of our study. Consistent with
previous research (e.g., [6,10]), our results indicate that
robots are generally perceived as possessing a medium level
of agency. However, our study shows notably higher ratings
on the experience dimension than those typically observed
in prior studies. Beyond the use of emotional words and
expressive speech, the mere ability to speak—a key anthro-
pomorphic trait—elevates perceptions of both agency and
experience. This aligns with the idea that human-like fea-
tures can enhance the perceived mental capacities of robots.
Moreover, the content of our robots’ communication likely
contributed to the increased experience ratings. While our
robots are still perceived as having lower experience than
humans or pets, the enhancement in the experience dimen-
sion underscores the potential for affective communication
design to improve HRI.
4.2 Further Effects of Affective Speech
In study II, we confirmed a significant main effect of emo-
tionality and additionally found a significant main effect of
prosody on social presence ratings. The results of both online
studies suggest that emotional content and expressive speech
are crucial in shaping perceptions of robots as socially present
entities. These findings are consistent with the results of pre-
vious studies (e.g., [33]), which also identified the importance
of these factors in enhancing the human-likeness of robotic
interactions.
Interestingly, we also observed an interaction between
prosody and robot type, indicating that the influence of
expressive speech on social presence was more pronounced
in the technical robot group. This finding suggests that while
expressive speech generally enhances social presence, it may
be particularly effective in scenarios where the robot is per-
ceived as more technical or less inherently human-like. In
line with the results for experience and agency, using a more
personal speech style did not influence the perception of
human-like qualities in robots.
Our concern that a more human-like voice for robots leads
to the uncanny valley effect was not confirmed, as we did not
find any difference in eeriness ratings between the prosody
conditions in study I. In contrast to the first study, in study II
we found differences in the perceived eeriness of the two
prosody conditions, but opposite to the uncanny valley prediction:
the monotonic speech was perceived as eerier. This
is consistent with other studies showing that artificial voices
do not elicit creepiness as they become more human-like
[36].
With regard to trust, no differences were found based
solely on the use of emotional words. Contrary to study I, the
second study found increased trust in robots using expressive
speech. This suggests that expressive prosody may enhance
the robot’s perceived trustworthiness more than the specific
words used.
Hypothetical studies, such as the ones we have presented,
can provide valuable insights into the dynamics of trust, but
they also have their limitations. Participants only imagine
how they might react in different situations. Although these
projections can provide insight into potential behaviors, trust
only becomes behaviorally relevant when people enter into
real interactions with robots. To fully understand the role of
trust in HRI, it is therefore essential to investigate it in real
interactions.
In addition, it could be interesting to assess trust more
specifically for cognition- and affect-based components [41],
as these might be related to the proposed differentiation of
agency and experience in mind perception.
4.3 The Influence of Robot Type
The objective of our studies was to identify speech charac-
teristics that increase the perceived affect of robotic speech.
Compared to previous studies (e.g., [27]), our first study
specifically focused on the affective speech features with-
out providing any additional cues with regard to the robot’s
appearance or other features. This approach was necessary to
prevent possible confounding effects that might blur results
and to get a more general idea of which speech characteristics
are of relevance.
Because we did not provide specific pictures of the robot,
however, it was not clear what people had in mind while
listening to the audio files. To control for the impact
of appearance and to standardize what people imagined
when listening to robot speech, we additionally manipulated
the robots’ appearance in study II. As it was more of an
exploratory hypothesis, we limited the robot’s appearance to
the most extreme ones—technical and anthropomorphic. Of
course, there is an even wider range at the level of anthro-
pomorphism, which includes humanoid robots or even other
variants such as zoomorphic robots.
The instructions of study II stated that the appearance
of the robots should not be considered when evaluating the
audios, and at least for the perceived attribution of a mind, the
appearance of the robot had neither a positive nor a negative
influence. Nevertheless, with this first approach, it was pos-
sible to investigate whether the matching hypothesis applies
[46,52], which in our case showed that a human-like voice
is even more beneficial for a technical appearance at least for
social presence ratings.
While a human-like or technical appearance was not found
to influence affective robot speech ratings overall, the presented
robot type did interact with the prosody factor. The
use of expressive speech increased the social presence ratings
slightly when presenting a human-like robot, but strongly
when presenting a technical robot.
This contrasts with the findings of Hosseini et al. [44], who
found that the use of emotional words yielded better results in
combination with a human-like robot than with a technical
robot. The discrepancy between our findings and those of
Hosseini et al. [44] might be explained by the expectations
people form when seeing a specific robot appearance. For
a technical robot, an expressive voice is less expected. This
violation of expectations might make the impression
of perceived social presence even stronger and thus influence it
more positively.
This phenomenon can also be interpreted through the lens
of contextual appropriateness, as discussed by Schreibel-
mayr and Mara [49], who noted that context influences the
acceptability of certain voice and appearance pairings. As
the context of our study relates to the service sector, a technical
appearance may be more acceptable, whereas in other
domains the results may differ [74]. The influence of different
application contexts should be investigated in further
studies.
4.4 Limitations and Future Work
It should be noted that we created our audio files with TTS
software. In many studies, a professional voice artist is used
to manipulate prosody [27, 32, 35]. Our prosody condition
might therefore not be comparable to the prosody conditions
of studies with naturally spoken language. But since encounters
with speech synthesis have become more frequent in our
daily lives, and speech synthesis is used in real robotic applications,
we wanted to evaluate currently existing TTS
programs [23]. In addition, most TTS software is already
fairly advanced, so it was difficult to find TTS programs that
do not include expressive speech features by default. In the
program Panopreter, which we used, the emphasis on punctuation
marks is adjusted automatically, which is also part
of prosody. Nevertheless, the program still produced a very
monotonic voice. It should be noted, of course, that we used
two different TTS programs. We chose male voices for
both, but they are not exactly identical; optimally, the manipulation
would be done within a single program. Overall, our
study was able to show the benefits of expressive speech for
TTS-produced voices. The results indicate that
speech synthesis programs are already sufficiently advanced
that the integration of emotions into robotic speech no
longer requires human speech recordings.
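To illustrate how such prosody manipulations are commonly specified today, the sketch below contrasts a flattened and an expressive rendering using W3C SSML markup; whether a particular engine, including the programs used here, honors these attributes is an assumption that would need to be verified.

```python
# Illustrative sketch only: contrasting a flattened and an expressive TTS
# rendering via W3C SSML markup. Whether a given engine (including the
# programs used in this study) supports these attributes is an assumption.
text = "Hello, I am pleased you are participating in our study!"

# Monotonic variant: narrow pitch range, slightly slower rate.
monotonic_ssml = f"""<speak>
  <prosody range="x-low" rate="95%">{text}</prosody>
</speak>"""

# Expressive variant: wide pitch range plus emphasis on the affective phrase.
expressive_ssml = """<speak>
  <prosody range="x-high">
    Hello, <emphasis level="strong">I am pleased</emphasis>
    you are participating in our study!
  </prosody>
</speak>"""

print(monotonic_ssml)
print(expressive_ssml)
```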
In our studies, we examined the influence of affective
verbal and prosodic communication on mind perception in
robots, focusing specifically on English. Languages vary
significantly in how they convey emotions. For instance,
Japanese has emotion words that capture complex emotional
states not easily translatable into English. Some languages,
like Mandarin, are tonal and use pitch variations to distinguish
lexical or grammatical meaning, whereas others, like
English, use prosody more for emotional emphasis. Given
these linguistic differences, the generalizability of our findings
remains unclear and should be explored in future research.
We utilized Japanese names in our studies that were
intentionally misspelled to create unfamiliar names without
specific associations. In English contexts, names ending in "i"
are typically considered gender-neutral. We realized that in
the Japanese context, such names can be perceived as gender-
specific. We recognize that name selection is a critical factor
in ensuring culturally sensitive research. While all our partic-
ipants were native English speakers, and it is unlikely that the
unintended gender connotations of the names influenced the
results, it is important to address these issues in future studies.
To mitigate this, we recommend incorporating pre-testing of
names in future research to avoid unintended gender associ-
ations and ensure greater cultural sensitivity.
Although we were able to achieve an increase on the experience
dimension, the fact that robots are still rated relatively
low on the ability to feel is not surprising. This is why
the ratings cluster at the lower end of our scale and lead
to a positively skewed distribution. Although our finding of
an enhancing effect of affective communication points in a
clear direction, the violation of the normality assumption
should be considered and critically questioned, especially
for results that only barely reached (or missed) significance.
Although we calculated the sample size in advance for study
II, our sample might still be underpowered. Accordingly,
future studies will need to validate our results.
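Such a distributional check can be sketched in a few lines; the floor-clustered ratings below are simulated for illustration and are not the study data.

```python
# Minimal sketch of a distributional check on floor-clustered ratings:
# skewness plus a Shapiro-Wilk normality test (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ratings = np.clip(1 + rng.exponential(0.8, 240), 1, 5)  # positively skewed, 1-5

print(f"skewness = {stats.skew(ratings):.2f}")   # > 0 indicates positive skew
w, p = stats.shapiro(ratings)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.4f}")  # small p flags non-normality
```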
A general disadvantage of online studies is that we could not
check the quality of the participants' audio output
or whether participants listened to the audios with consistent
concentration. Incorrect answers to content-related
questions can only be interpreted with caution because, in
addition to a lack of attention, misunderstandings or differences
in memory ability may be reasons for mistakes. Results
should therefore be replicated in a laboratory setting.
In addition, online studies are not suitable for investigating
actual behavior. We only measured how affective speech
influenced the attribution of different variables (e.g., experience).
We believe that this is an appropriate approach, as
a meta-analysis by Roesler et al. [75] has shown that it is
sufficient and ecologically valid to conduct studies that do
not involve an actual embodied HRI when the key variables
are people's perceptions and attributions. Nevertheless, these
variables are only mediators that are supposed to influence
interactions between humans and robots, i.e., actual behavior.
For example, it is assumed that higher experience attributions
lead to more forgiveness [10, 15]. This step from attribution
to behavior still needs to be shown in follow-up studies incorporating
actual HRI setups.
5 Conclusion
People sympathize more with robots that are capable of
expressing emotions through their speech [10,33]. It is
therefore relevant to identify factors that positively influence
the perception of robotic speech. With our two consecutive
online studies, we identified affective factors with an impact
on mind perception. More specifically, we found that the use
of emotional words and the implementation of expressive
prosodic features have enhancing effects on the perceived
experience dimension of robotic speech. The use of affective
communication seems to be a promising tool to influence
mind attribution to robots. Our second study with a greater
sample size indicated that technical robots benefit even more
from affective speech characteristics than human-like robots.
All in all, we provide valuable insights into which
facets of affective communication have an impact on the attribution
of feelings to robots and how synthesized speech can
be refined.
Acknowledgements The authors thank Vincent Pilz for his support in
creating the audio files.
Author Contributions Conceptualization: KK; Methodology: KK, KS;
Formal analysis and investigation: KK; Writing—original draft prepa-
ration: KK; Writing—review and editing: LO, KS, KK; Supervision:
LO.
Funding Open Access funding enabled and organized by Projekt
DEAL. The author received no financial support for the research, author-
ship, and/or publication of this article.
Data Availability The datasets generated and analyzed during the stud-
ies are available at the Open Science Framework repository, https://osf.
io/2uh6t/?view_only=4b385f68429e4ca58e0a34b7c06e7de8.
Declarations
Conflict of interests The author declared no potential conflicts of inter-
est with respect to the research, authorship, and/or publication of this
article.
Informed Consent Informed consent was obtained from all participants
in the study.
Ethical Approval This study was performed in line with the principles
of the Declaration of Helsinki. Approval was granted by the Ethics
Committee of the Humboldt-Universität zu Berlin (No. 2022–06).
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you
give appropriate credit to the original author(s) and the source, pro-
vide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Duffy BR (2003) Anthropomorphism and the social robot. Robot
Auton Syst 42:177–190. https://doi.org/10.1016/S0921-8890(02)00374-3
2. Belpaeme T, Kennedy J, Ramachandran A, Scassellati B, Tanaka F
(2018) Social robots for education: a review. Sci Robot 3:eaat5954.
https://doi.org/10.1126/scirobotics.aat5954
3. Garcia-Haro JM, Oña ED, Hernandez-Vicen J, Martinez S, Bala-
guer C (2021) Service robots in catering applications: a review and
future challenges. Electronics 10:47. https://doi.org/10.3390/electr
onics10010047
4. Broekens J, Heerink M, Rosendal H (2009) Assistive social robots
in elderly care: a review. Gerontechnology 8:94–103. https://doi.
org/10.4017/gt.2009.08.02.002.00
5. Gray K, Wegner DM (2012) Feeling robots and human zombies:
Mind perception and the uncanny valley. Cognition 125:125–130.
https://doi.org/10.1016/j.cognition.2012.06.007
6. Gray HM, Gray K, Wegner DM (2007) Dimensions of mind per-
ception. Science 315(5812):619–619. https://doi.org/10.1126/scie
nce.1134475
7. Zajonc RB (1980) Feeling and thinking: Preferences need no infer-
ences. Am Psychol 35:151–175. https://doi.org/10.1037/0003-
066X.35.2.151
8. Lazarus RS (1984) On the primacy of cognition. Am Psychol
39:124–129. https://doi.org/10.1037/0003-066X.39.2.124
9. Kahneman D, Egan P (2011) Thinking, fast and slow, 1st edn. Farrar,
Straus and Giroux
10. Yam KC, Bigman YE, Tang PM, Ilies R, De Cremer D, Soh H, Gray
K (2021) Robots at work: people prefer—and forgive—service
robots with perceived feelings. J Appl Psychol 106:1557–1572.
https://doi.org/10.1037/apl0000834
11. Waytz A, Heafner J, Epley N (2014) The mind in the machine:
Anthropomorphism increases trust in an autonomous vehicle. J
Exp Soc Psychol 52:113–117. https://doi.org/10.1016/j.jesp.2014.
01.005
12. Russell JA (2003) Core affect and the psychological construction of
emotion. Psychol Rev 110:145–172. https://doi.org/10.1037/0033-
295X.110.1.145
13. Gray K, Wegner DM (2009) Moral typecasting: Divergent per-
ceptions of moral agents and moral patients. J Pers Soc Psychol
96:505–520. https://doi.org/10.1037/a0013748
14. Hoffman ML (1996) Empathy and Moral Development. Annu Rep
Educ Psychol Jpn 35:157–162. https://doi.org/10.5926/arepj1962.
35.0_157
15. Darling K (2015) “Who’s Johnny?” anthropomorphic framing in
human-robot interaction, integration, and policy. SSRN Electron J.
https://doi.org/10.2139/ssrn.2588669
16. Onnasch L, Roesler E (2021) A taxonomy to structure and analyze
human-robot interaction. Int J Soc Robot 13:833–849. https://doi.
org/10.1007/s12369-020-00666-5
17. Waytz A, Cacioppo J, Epley N (2010) Who sees human?: The
stability and importance of individual differences in anthropomor-
phism. Perspect Psychol Sci 5:219–232. https://doi.org/10.1177/
1745691610369336
18. Langer EJ (1992) Matters of mind: mindfulness/mindlessness in
perspective. Conscious Cogn 1:289–305. https://doi.org/10.1016/
1053-8100(92)90066-J
19. Nass C, Moon Y (2000) Machines and mindlessness: social
responses to computers. J Soc Issues 56:81–103. https://doi.org/
10.1111/0022-4537.00153
20. Bartneck C, Verbunt M, Mubin O, Al Mahmud A (2007) To kill a
mockingbird robot. In: Proceedings of the ACM/IEEE international
conference on Human-robot interaction. ACM, Arlington Virginia
USA, pp 81–87
21. Keijsers M, Bartneck C (2018) Mindless Robots get Bullied. In:
Proceedings of the 2018 ACM/IEEE International Conference on
Human-Robot Interaction. ACM, Chicago IL USA, pp 205–214
22. Rosenthal-von der Pütten AM, Krämer NC, Hoffmann L, Sobieraj
S, Eimler SC (2013) An Experimental study on emotional reactions
towards a robot. Int J Soc Robot 5:17–34. https://doi.org/10.1007/
s12369-012-0173-8
23. Cohn M, Predeck K, Sarian M, Zellou G (2021) Prosodic align-
ment toward emotionally expressive speech: comparing human and
Alexa model talkers. Speech Commun 135:66–75. https://doi.org/
10.1016/j.specom.2021.10.003
24. Taylor PA (2009) Text-to-speech synthesis. Cambridge University
Press, Cambridge, UK, New York
25. Barrett LF, Satpute AB (2013) Large-scale brain networks in affec-
tive and social neuroscience: towards an integrative functional
architecture of the brain. Curr Opin Neurobiol 23:361–372. https://
doi.org/10.1016/j.conb.2012.12.012
26. Wang Y, Song W, Tao W, Liotta A, Yang D, Li X, Gao S, Sun Y,
Ge W, Zhang W, Zhang W (2022) A systematic review on affective
computing: emotion models, databases, and recent advances. Inf
Fus 83:19–52. https://doi.org/10.1016/j.inffus.2022.03.009
27. James J, Balamurali BT, Watson CI, MacDonald B (2021) Empa-
thetic speech synthesis and testing for healthcare robots. Int J
Soc Robot 13:2119–2137. https://doi.org/10.1007/s12369-020-00
691-4
28. Boyd RL, Ashokkumar A, Seraj S, Pennebaker JW (2022)
The Development and Psychometric Properties of LIWC-22.
https://doi.org/10.13140/RG.2.2.23890.43205
29. Scherer K (2003) Vocal communication of emotion: a review of
research paradigms. Speech Commun 40:227–256. https://doi.org/
10.1016/S0167-6393(02)00084-5
30. Van Kleef GA, De Dreu CKW, Manstead ASR (2010) An Inter-
personal Approach to Emotion in Social Decision Making. In:
Advances in Experimental Social Psychology. Elsevier, pp 45–96
31. Crumpton J, Bethel CL (2016) A survey of using vocal prosody
to convey emotion in robot speech. Int J Soc Robot 8:271–285.
https://doi.org/10.1007/s12369-015-0329-4
32. Nass C, Jonsson I-M, Harris H, Reaves B, Endo J, Brave S,
Takayama L (2005) Improving automotive safety by pairing driver
emotion and car voice emotion. CHI ’05 Extended Abstracts on
Human Factors in Computing Systems. ACM, USA, pp 1973–1976
33. Niculescu A, van Dijk B, Nijholt A, Li H, See SL (2013) Making
social robots more attractive: the effects of voice pitch, humor and
empathy. Int J Soc Robot 5:171–191. https://doi.org/10.1007/s1
2369-012-0171-x
34. Lee J (2021) Generating robotic speech prosody for human robot
interaction: a preliminary Study. Appl Sci 11:3468. https://doi.org/
10.3390/app11083468
35. James J, Watson CI, MacDonald B (2018) Artificial empathy in
social robots: an analysis of emotions in speech. 2018 27th IEEE
International Symposium on Robot and Human Interactive Com-
munication (RO-MAN). IEEE, Nanjing, pp 632–637
36. Kühne K, Fischer MH, Zhou Y (2020) The human takes it all:
humanlike synthesized voices are perceived as less eerie and more
likable. Evidence from a subjective ratings study. Front Neuro-
robotics 14:593732. https://doi.org/10.3389/fnbot.2020.593732
37. Roy R, Naidoo V (2021) Enhancing chatbot effectiveness: the role
of anthropomorphic conversational styles and time orientation. J
Bus Res 126:23–34. https://doi.org/10.1016/j.jbusres.2020.12.051
38. Yun J, Park J (2022) The effects of chatbot service recovery with
emotion words on customer satisfaction, repurchase intention, and
positive word-of-mouth. Front Psychol 13:922503. https://doi.org/
10.3389/fpsyg.2022.922503
39. Moors A, De Houwer J, Hermans D, Wanmaker S, van Schie K, Van
Harmelen A-L, De Schryver M, De Winne J, Brysbaert M (2013)
Norms of valence, arousal, dominance, and age of acquisition for
4,300 Dutch words. Behav Res Methods 45:169–177. https://doi.
org/10.3758/s13428-012-0243-8
40. Kim Y, Kwak SS, Kim M (2013) Am I acceptable to you? Effect
of a robot’s verbal language forms on people’s social distance
from robots. Comput Hum Behav 29:1091–1101. https://doi.org/
10.1016/j.chb.2012.10.001
41. Hoffmann L, Derksen M, Kopp S (2020) What a pity, pepper!:
How warmth in robots’ language impacts reactions to errors during
a collaborative task. Companion of the 2020 ACM/IEEE Interna-
tional Conference on Human-Robot Interaction. ACM, Cambridge
United Kingdom, pp 245–247
42. Leyzberg D, Avrunin E, Liu J, Scassellati B (2011) Robots that
express emotion elicit better human teaching. In: Proceedings of the
6th international conference on Human-robot interaction. ACM,
Lausanne Switzerland, pp 347–354
43. Bigman YE, Gray K (2018) People are averse to machines making
moral decisions. Cognition 181:21–34. https://doi.org/10.1016/j.
cognition.2018.08.003
44. Hosseini SMF, Lettinga D, Vasey E, Zheng Z, Jeon M, Park CH,
Howard AM (2017) Both “look and feel” matter: Essential fac-
tors for robotic companionship. 2017 26th IEEE International
Symposium on Robot and Human Interactive Communication (RO-
MAN). IEEE, Lisbon, pp 150–155
45. Klüber K, Onnasch L (2022) Appearance is not everything - Pre-
ferred feature combinations for care robots. Comput Hum Behav
128:107128. https://doi.org/10.1016/j.chb.2021.107128
46. Sarigul B, Urgen BA (2023) Audio-visual predictive processing in
the perception of humans and robots. Int J Soc Robot 15:855–865.
https://doi.org/10.1007/s12369-023-00990-6
47. Mara M, Schreibelmayr S, Berger F (2020) Hearing a nose?: User
expectations of robot appearance induced by different robot voices.
Companion of the 2020 ACM/IEEE International Conference on
Human-Robot Interaction. ACM, Cambridge United Kingdom, pp
355–356
48. McGinn C, Torre I (2019) Can you tell the robot by the voice? An
exploratory study on the role of voice in the perception of robots.
2019 14th ACM/IEEE International Conference on Human-Robot
Interaction (HRI). IEEE, Daegu, Korea (South), pp 211–221
49. Schreibelmayr S, Mara M (2022) Robot voices in daily life: Vocal
human-likeness and application context as determinants of user
acceptance. Front Psychol 13:787499. https://doi.org/10.3389/fp
syg.2022.787499
50. Jirak D, Aoki M, Yanagi T, Takamatsu A, Bouet S, Yamamura T,
Sandini G, Rea F (2022) Is it me or the robot? A critical evaluation
of human affective state recognition in a cognitive task. Front Neu-
rorobotics 16:882483. https://doi.org/10.3389/fnbot.2022.882483
51. Mori M (1970) Bukimi no tani [the uncanny valley]. Energy
7:33–35
52. Torre I, Goslin J, White L, Zanatto D (2018) Trust in artificial
voices: A “congruency effect” of first impressions and behavioural
experience. In: Proceedings of the Technology, Mind, and Society.
ACM, Washington DC USA, pp 1–6
53. Kühne R, Peter J (2023) Anthropomorphism in human–robot inter-
actions: a multidimensional conceptualization. Commun Theory
33:42–52. https://doi.org/10.1093/ct/qtac020
54. Wirtz J, Patterson PG, Kunz WH, Gruber T, Lu VN, Paluch S,
Martins A (2018) Brave new world: service robots in the frontline. J
Serv Manag 29:907–931. https://doi.org/10.1108/JOSM-04-2018-
0119
55. Biocca F (1997) The cyborg’s dilemma: progressive embodiment
in virtual environments. J Comput-Mediat Commun. https://doi.
org/10.1111/j.1083-6101.1997.tb00070.x
56. Heerink M, Kröse B, Evers V, Wielinga B (2010) Relating con-
versational expressiveness to social presence and acceptance of
an assistive social robot. Virtual Real 14:77–84. https://doi.org/10.
1007/s10055-009-0142-1
57. Higgins D, Zibrek K, Cabral J, Egan D, McDonnell R (2022) Sym-
pathy for the digital: Influence of synthetic voice on affinity, social
presence and empathy for photorealistic virtual humans. Comput
Graph 104:116–128. https://doi.org/10.1016/j.cag.2022.03.009
58. Liao M, Luo X, Yang H, Zhu K (2024) The interactive effects
of pedagogical agent role and voice emotion design on children’s
learning. Curr Psychol 43:29170–29188. https://doi.org/10.1007/
s12144-024-06559-4
59. Hancock PA, Billings DR, Schaefer KE, Chen JYC, de Visser EJ,
Parasuraman R (2011) A meta-analysis of factors affecting trust in
human-robot interaction. Hum Factors J Hum Factors Ergon Soc
53:517–527. https://doi.org/10.1177/0018720811417254
60. Hancock PA, Kessler TT, Kaplan AD, Brill JC, Szalma JL (2021)
Evolving trust in robots: specification through sequential and com-
parative meta-analyses. Hum Factors J Hum Factors Ergon Soc
63:1196–1229. https://doi.org/10.1177/0018720820922080
61. Mitchell WJ, Szerszen KA, Lu AS, Schermerhorn PW, Scheutz M,
MacDorman KF (2011) A mismatch in the human realism of face
and voice produces an uncanny valley. i-Percept 2:10–12. https://
doi.org/10.1068/i0415
62. Faul F, Erdfelder E, Lang A-G, Buchner A (2007) G*Power 3: A
flexible statistical power analysis program for the social, behav-
ioral, and biomedical sciences. Behav Res Methods 39:175–191.
https://doi.org/10.3758/BF03193146
63. Kalikow DN, Stevens KN, Elliott LL (1977) Development of a
test of speech intelligibility in noise using sentence materials with
controlled word predictability. J Acoust Soc Am 61:1337–1351.
https://doi.org/10.1121/1.381436
64. Qiu L, Benbasat I (2009) Evaluating anthropomorphic product rec-
ommendation agents: A social relationship perspective to designing
information systems. J Manag Inf Syst 25:145–182
65. Heerink M, Krose B, Evers V, Wielinga B (2009) Measuring
acceptance of an assistive social robot: a suggested toolkit. RO-
MAN 2009 - The 18th IEEE International Symposium on Robot
and Human Interactive Communication. IEEE, Toyama, Japan, pp
528–533
66. Syrdal DS, Dautenhahn K, Koay KL, Walters ML (2009) The
Negative Attitudes towards Robots Scale and Reactions to Robot
Behaviour in a Live Human-Robot Interaction Study. Adapt Emer-
gent Behav Complex Syst 7
67. Ho C-C, MacDorman KF (2010) Revisiting the uncanny valley
theory: Developing and validating an alternative to the Godspeed
indices. Comput Hum Behav 26:1508–1518. https://doi.org/10.
1016/j.chb.2010.05.015
68. Quertemont E (2011) How to Statistically Show the Absence of an
Effect. Psychol Belg 51:109. https://doi.org/10.5334/pb-51-2-109
69. Rogers JL, Howard KI, Vessey JT (1993) Using significance tests
to evaluate equivalence between two experimental groups. Psychol
Bull 113:553–565. https://doi.org/10.1037/0033-2909.113.3.553
70. Cohen J (1988) Statistical Power Analysis for the Behavioral Sci-
ences. Routledge
71. Lenhard W, Lenhard A (2017) Computation of Effect Sizes
72. Phillips E, Zhao X, Ullman D, Malle BF (2018) What is Human-
like?: Decomposing Robots’ Human-like Appearance Using the
Anthropomorphic roBOT (ABOT) Database. In: Proceedings of
the 2018 ACM/IEEE International Conference on Human-Robot
Interaction. ACM, Chicago IL USA, pp 105–113
73. Breazeal C (2003) Emotion and sociable humanoid robots. Int J
Hum-Comput Stud 59:119–155. https://doi.org/10.1016/S1071-58
19(03)00018-1
74. Roesler E, Naendrup-Poell L, Manzey D, Onnasch L (2022) Why
context matters: the influence of application domain on preferred
degree of anthropomorphism and gender attribution in human-
robot interaction. Int J Soc Robot 14:1155–1166. https://doi.org/
10.1007/s12369-021-00860-z
75. Roesler E, Manzey D, Onnasch L (2021) A meta-analysis on the
effectiveness of anthropomorphism in human-robot interaction. Sci
Robot 6:eabj5425. https://doi.org/10.1126/scirobotics.abj5425
Publisher’s Note Springer Nature remains neutral with regard to juris-
dictional claims in published maps and institutional affiliations.
Kim Klüber is a PhD candidate and research assistant in the Department
of Engineering Psychology at Humboldt-Universität zu Berlin
and received her master's degree in Sensors and Cognitive Psychology
at the Technische Universität Chemnitz. Her research interests include
affective computing in human-robot interaction, with a focus on service
and healthcare robotics.
Katharina Schwaiger received a Master’s degree in Psychology from
Humboldt-Universität zu Berlin with a focus on Clinical and Engi-
neering Psychology. She is completing postgraduate training in Cog-
nitive Behavioral Therapy and teaching courses on Patient-Physician
Communication at Medical School Berlin.
Linda Onnasch is professor of psychology of action and automation
at the Technische Universität Berlin. Her research focuses on human
interaction with automated systems and collaborative robots consider-
ing system characteristics, psychological mediators and context fac-
tors. Linda did her PhD on effects of reliability and function allo-
cation in human-automation interaction at the Technische Universität
Berlin, worked as a Human Factors specialist at HFC HumanFactors-
Consult and was an assistant professor of engineering psychology at
Humboldt-Universität zu Berlin.