Influence of Robots’ Voice Naturalness on Trust and
Compliance
DENNIS BECKER, LUKAS BRAACH, LENNART CLASMEIER, TERESA KAUFMANN, OSKAR ONG, KYRA AHRENS, CONNOR GÄDE, and ERIK STRAHL, Universität Hamburg, Hamburg, Germany
DI FU,University of Surrey, Guildford, UK
STEFAN WERMTER,Universität Hamburg, Hamburg, Germany
With the increasing performance of text-to-speech systems and their generated voices becoming indistinguishable from natural human speech, the use of these systems for robots raises ethical and safety concerns. A robot with a natural voice could increase trust, which might result in over-reliance despite evidence of robot unreliability. To estimate the influence of a robot's voice on trust and compliance, we design a study that consists of two experiments. In a pre-study (N1 = 60), the most suitable natural and mechanical voice for the main study are estimated and selected. Afterward, in the main study (N2 = 68), the influence of a robot's voice on trust and compliance is evaluated in a cooperative game of Battleship with a robot as an assistant. During the experiment, the acceptance of the robot's advice and the response time are measured, which indicate trust and compliance, respectively. The results show that participants expect robots to sound human-like and that a robot with a natural voice is perceived as safer. Additionally, a natural voice can affect compliance. Despite repeated incorrect advice, the participants are more likely to rely on the robot with the natural voice. The results do not show a direct effect on trust. Natural voices provide increased intelligibility, and while they can increase compliance with the robot, the results indicate that natural voices might not lead to over-reliance. The results highlight the importance of incorporating voices into the design of social robots to improve communication, avoid adverse effects, and increase acceptance and adoption in society.
CCS Concepts: • Human-centered computing → User studies;
Additional Key Words and Phrases: Human-Robot Interaction, Trust and Cooperation
The authors gratefully acknowledge support from the German Research Foundation DFG (CML, LeCAREbot), the European Commission (TRAIL), and the Federal Ministry for Economic Affairs and Climate Action (BMWK) under the Federal Aviation Research Programme (LuFO), Projekt VeriKAS.
Authors’ Contact Information: Dennis Becker (corresponding author), Universität Hamburg, Hamburg, Germany; e-mail:
dennis.becker-1@uni-hamburg.de; Lukas Braach, Universität Hamburg, Hamburg, Germany; e-mail: lukas.braach@
studium.uni-hamburg.de; Lennart Clasmeier, Universität Hamburg, Hamburg, Germany; e-mail: lennart.clasmeier@
studium.uni-hamburg.de; Teresa Kaufmann, Universität Hamburg, Hamburg, Germany; e-mail: teresa.kaufmann@studium.
uni-hamburg.de; Oskar Ong, Universität Hamburg, Hamburg, Germany; e-mail: oskar.ong@studium.uni-hamburg.de;
Kyra Ahrens, Universität Hamburg, Hamburg, Germany; e-mail: kyra.ahrens@uni-hamburg.de; Connor Gäde, Universität
Hamburg, Hamburg, Germany; e-mail: connor.gaede@uni-hamburg.de; Erik Strahl, Universität Hamburg, Hamburg,
Germany; e-mail: erik.strahl@uni-hamburg.de; Di Fu, University of Surrey, Guildford, UK; e-mail: d.fu@surrey.ac.uk; Stefan
Wermter, Universität Hamburg, Hamburg, Germany; e-mail: stefan.wermter@uni-hamburg.de.
This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2025 Copyright held by the owner/author(s).
ACM 2573-9522/2025/1-ART29
https://doi.org/10.1145/3706066
ACM Reference format:
Dennis Becker, Lukas Braach, Lennart Clasmeier, Teresa Kaufmann, Oskar Ong, Kyra Ahrens, Connor Gäde,
Erik Strahl, Di Fu, and Stefan Wermter. 2025. Influence of Robots' Voice Naturalness on Trust and Compliance.
ACM Trans. Hum.-Robot Interact. 14, 2, Article 29 (January 2025), 25 pages.
https://doi.org/10.1145/3706066
1 Introduction
Despite the majority of research in human-robot interaction emphasizing robot appearance [124], behavior [9, 21], and non-verbal communication [115], natural speech interaction is equally important [75, 77]. Verbal interaction provides accurate and efficient communication and enables human-robot cooperation with non-expert users in a social environment [70]. A voice transmits a variety of non-linguistic information [100] and is a strong anthropomorphic cue [81, 119]. Even a simple conversation with a robot renders it more social and human-like [86], and increases the perceived psychological closeness to the robot [30].
Accepting a robot as a partner in a cooperative task requires trust in the robot's performance and reliability [45]. Robots that are perceived as anthropomorphic are preferred as partners for a cooperative task [35]. Specifically, humanoid robots with their anthropomorphic appearance can simultaneously facilitate interaction and increase expectations about their capabilities and social skills [47]. However, consistent social interaction is required, and physical or behavioral inconsistencies can render the robot unacceptable [117]. A robot's voice affects these social interactions, which creates challenges in assigning a voice that is suitable for the robot and the task [13].
These voices are synthesized utilizing a Text-to-Speech (TTS) engine; however, nuances of speech are often lost during the synthesis [107]. This results in a more mechanical voice, and the generated voice quality can influence the perception of the robot [19, 39]. With advancements in deep learning, recent TTS systems can generate speech with characteristics rivaling natural speech [109]. However, the use of a voice indistinguishable from natural speech raises privacy [71] and ethical concerns [68] for robotics. Research in human-robot interaction suggests that a robot with a natural-sounding voice positively influences the perception of the robot [67] and can increase trust and perceived competence [85]. Although trust is an essential element for human-robot cooperation, over-reliance and over-trust can result in accidents [73]. People may place too much trust in a robot despite clear evidence of its unreliability or failure [94]. Over-trust and over-reliance in the presence of robot failure can have severe consequences [58].
With the increasing use of and reliance on voice assistance systems and robotic applications, the implications of a natural-sounding voice for trust in human-robot interaction have to be researched. Specifically, recent publications emphasize that robot voice design and the associated anthropomorphism are a pressing research issue [6, 99]. Therefore, we propose the following research question: How does a robot with a natural voice affect trust and compliance when it performs unreliably, in comparison to a robot with a mechanical-sounding voice? To estimate the effect of a robot's voice naturalness on trust and reliance, we design a study that consists of two parts. Since voices create a mental image of the speaker [60] and a mismatch between the voice and the robot can create mistrust [55], a pre-study is conducted to estimate the most suitable natural and mechanical voice for the experiment in the main study. In the main study, the participants play an adaptation of the classic board game Battleship with a robot as their assistant. An illustration is provided in Figure 1. Board games provide a social environment for interaction with the robot while restricting the possible actions in the environment [92]. Additionally, board games provide an engaging scenario, and information available to the participants can be restricted, which creates reliance on the
robot [122]. In the experiment, a participant and robot cooperate, and the robot provides advice to the participant for the next move that differs from the participant's decision in the game. The participant can either follow or reject the advice, which indicates trust in the robot. Further, the response time is measured, which indicates compliance with the robot [46].

Fig. 1. Battleship game to measure the difference in trust and compliance depending on the robot's voice.
2 Related Work
A robot’s voice strongly aects a human’s perception, associated aributes, and perceived capa-
bilities [17]. Previous studies have reported a strong inuence of a robot’s accent [110], voice
gender [20], and voice naturalness [79] on the perception of a robot. Specically, a robot with a
local accent is aributed with more credibility [4] and perceived more positively [108]. Further, a
robot’s voice is associated with personality traits [78] and stereotypes, where a deeper male voice
suggests dominance and a female voice suggests a caring personality [28]. Additionally, people
assume that the robot possesses gender-specic knowledge [91]. e pitch of the voice inuences
the perceived interaction quality [84], and voice prosody can alter the perception of the robot [34]
and the willingness to cooperate [74]. A major aspect of a robot’s voice that changes its impression
is the voice’s naturalness [111]. Increasing the voice’s naturalness simultaneously increases the
perceived naturalness of the robot [116], and natural voices are overall preferred by participants
for human-robot interaction [61]. Research suggests that a natural-sounding voice increases the
anthropomorphism of the robot [99,104] and perceived approachability [118]. is increase in
anthropomorphism might inuence trust and could deceive to place trust in the robot beyond its
actual capacities [95] and foster an over-reliance [3].
Trust is an overarching concept that is essential for successful human-robot interaction [54, 66]. However, the concept of trust is not uniquely defined in the context of human-robot interaction [14, 125]. A commonly agreed-upon definition of trust in robot automation defines trust as the need for reliance in situations of uncertainty and vulnerability [64]. Further, it has been characterized by the expectation that the robot's actions are well intended [44] and result in a beneficial outcome [96].
Initial trust in a robot exists before the first interaction [50] and is influenced by expectations about the robot [23, 73], media representations of robots [98], and the robot's physical attributes [106]. This initial trust changes dynamically during the interaction [2, 52], where a robot demonstrating competence [12] and observable task performance [18] increases trust, whereas poor robot performance can reduce trust [93]. Specifically, robot failure and unreliability reduce trust [26], especially when the unreliability is observed during the early stages of the interaction, in contrast to malfunctions that are observed later during the interaction [25].
Compliance with a robot’s advice and trust are intertwined [80]. e degree to which people are
willing to comply is an indication of trust [16], and compliance with a robot’s advice is a direct
observation of trust [10]. However, trust in a robot’s recommendations depends on the perceived
task suitability [42,43]. Robots are preferred for tasks that require high analytical capabilities
and deductive reasoning utilizing statistics [72] and are less preferred for social tasks [48]. Robots
that exhibit human-like characteristics are perceived as more anthropomorphic and receive more
trust and willingness to follow their advice [41,105]. Similarly, a higher level of trust in the
robots can increase the willingness to seek the robot’s advice [40]. Robots with anthropomorphic
characteristics appear to form a stronger bond [88], are more resilient against breaches of trust
[22], and receive increased trust and compliance [82].
However, increased anthropomorphism creates a tendency to over-trust technology [5] by attributing larger competence [65] and resilience against trust loss despite decreasing reliability [22]. In particular, repeated observation of a robot's reliability can lead to over-reliance by considering these observations as proof of its reliability [114]. This mismatch between trust and the robot's actual capabilities and reliability can result in over-trust [58]. In sensitive domains or applications, where lives or personal well-being are involved, over-trust in technology can lead to, and has led to, accidents [87, 89].
An indication of reliance and compliance with a robot is the response time to the robot's advice [66]. In contrast to reaction time, which describes the duration between the onset of a stimulus and the person's instinctive reaction [27, 121], the response time measures the time between the onset of the stimulus and selecting and providing the response [102, 120]. Therefore, the response time includes the decision-making process to provide the correct or appropriate response to the stimulus. A shorter response time indicates an automatic or reflex response, whereas longer response times are associated with deliberate mental processing that involves examining all the available information [102]. Reliance can be considered a passive form of compliance: reliance assumes the correct operation of the robot and that following the advice would advance the shared goal, whereas compliance requires verification of the advice [57]. Verification is the process of reevaluating the robot's past task accuracy or recommendations to assess its performance, which is an indication of mistrust [49]. However, verifying the robot's advice is associated with additional effort and time [31].
3 Methodology
3.1 Research Design
To answer the previously defined research question, an experiment is designed that consists of an online pre-study and an in-person main study. The pre-study determines expectations about robot voices and the most suitable natural and mechanical voice for the robot in the main experiment. The main experiment researches the effects of a natural voice in contrast to a mechanical voice on trust and compliance in a human-robot cooperative game of Battleship. The experiment is conducted in a between-subject design with two groups, in which the robot has either a natural or mechanical voice, while the gestures, utterances, and facial expressions are identical in both groups.
Based on the previously conducted research in the field and the research question, we derive the subsequent hypotheses:
H1: The participants will accept more advice from a robot with a natural voice.
H2: The participants will exhibit a faster response when the robot has a natural voice.
H3: A longer response time indicates reduced compliance and can lead to advice rejection.
3.2 Pre-Study
Since there is wide variability in terms of voices, such as gender, rhythm, and perceived suitability for the scenario of the main experiment, a pre-study for the voice selection was conducted. For the pre-study, six different voice samples are examined, and their perceived suitability for the robot in the main study is estimated. Previous studies suggest that the appearance and stature of a robot can affect the perceived suitability of the robot's voice [33]. Furthermore, a difference in the voice pitch is sufficient to strongly change the participants' impressions of the voice [17]. Therefore, the different voices can be separated into three natural-sounding and three mechanical-sounding voices, where each group consists of one sample of a neutral-pitch, high-pitch, and low-pitch voice.
The voice samples were generated with a TTS model trained on the VCTK dataset [123] using a VITS [56] model, which is provided by the coqui-ai TTS library [29]. The utilized model clones a speaker's voice and produces state-of-the-art natural-sounding speech. Since the utilized VCTK Corpus contains sample data from 110 native English speakers, the data set was analyzed in terms of gender, age, fundamental frequency, accent, noise, pitch, speed, and rhythm. To capture a wide variety of voices during the pre-study that might be suitable for the robot's appearance, younger speakers between 18 and 23 years of age and of different genders were selected. Further, the robotic speech counterpart for each speaker was generated to analyze the speech's perception, patterns, and understandability. The mechanical voice is created by applying a phaser effect to the generated voice sample and overlaying the natural voice on itself with a 10-millisecond delay. This produces a mechanical-sounding voice while retaining the voice characteristics.
To represent a high-pitched voice, a female speaker was selected, while for the remaining lower voices, male speakers were selected. Finally, to generate the voice samples, voices with a clearly distinguishable pitch (fundamental frequency f0) were selected. From the TTS library, the speaker p336 (f0 = 205 Hz) is used for the neutral voice, speaker p243 (f0 = 270 Hz) for the high-pitch voice, speaker p286 (f0 = 59 Hz) for the low-pitch natural voice, and speaker p285 (f0 = 96 Hz) for the low-pitch robotic voice. The range of evaluated voices in the pre-study could be briefly described as: female, male, cartoonish, serious, childlike, and robotic. Further information on the selected voices is provided in Appendix A.
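As a concrete illustration of this pipeline, the following minimal sketch shows how a VCTK/VITS voice can be synthesized with the coqui-ai TTS library and how a delayed self-overlay can be applied. The example utterance, file names, and normalization step are our own assumptions, and the phaser effect used in the study is not reproduced here.

```python
# Minimal sketch (assumed utterance, file names, and normalization) of the voice pipeline:
# synthesize a VCTK speaker with the coqui-ai TTS VITS model and derive a mechanical-sounding
# variant by overlaying the signal on itself with a 10 ms delay.
import numpy as np
import soundfile as sf
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")          # multi-speaker VITS model
tts.tts_to_file(
    text="I advise you to target the field B4.",          # hypothetical example utterance
    speaker="p336",                                        # neutral-pitch speaker; ID naming may differ by library version
    file_path="natural.wav",
)

audio, rate = sf.read("natural.wav")
delay = int(0.010 * rate)                                  # 10-millisecond delay in samples
delayed = np.zeros_like(audio)
delayed[delay:] = audio[:-delay]
mechanical = audio + delayed                               # overlay original and delayed copy
mechanical /= np.max(np.abs(mechanical))                   # normalize to avoid clipping
sf.write("mechanical.wav", mechanical, rate)
```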
The pre-study was conducted online using LimeSurvey [69]. An illustration of the pre-study is shown in Figure 2. In the survey, the participants first provide informed consent, and then their expectations about robot voices are assessed. Afterward, a description of the main experiment with a picture of the robot is provided. The six different voice samples are presented in random order, and the participants rate the characteristics of each voice individually on a ranking scale ranging from 1 (not at all) to 7 (yes, absolutely). The utilized questionnaire measures the perception and distinguishability of the voices [76]. Finally, a video that illustrates the purpose of the robot and the scenario of the main study is shown. Afterward, the participants rank all six voice samples according to their perceived suitability for the robot in the video. Subsequently, the most suitable natural and mechanical voice will be compared in the main study.

Fig. 2. Online study to assess expectations about robot voices and suitability for the robot in the main study.
3.3 Experiment Design
In the main study, the game Battleship is played to evaluate the influence of a robot's voice on trust and compliance. Previous human-robot interaction studies utilized the Battleship game to show participants the game-play of a human and a robot [112] or as a scenario for a robot as a teacher for the game [11, 51, 83]. Battleship is a turn-based guessing game in which both players attempt
to find the other player's ships on a two-dimensional 10 x 10 playing field. Before the game starts, both players position their ships on their playing fields. Afterward, both players take turns guessing the position of the other player's ships. Choosing a position on the playing field that is occupied by a ship is considered a hit. Guessing a position that is not occupied by a ship is considered a miss. If all positions of a ship are hit, the ship is considered sunk and removed from the game. The player who first sinks all the opponent's ships is considered the winner.
The game rules are adapted to ensure a consistent experiment among the participants. Specifically, all ships have a length of two fields, and each player has a total of seven ships. During the game, when a player guesses the correct location of a ship, an additional turn to sink the already discovered ship is granted. Accordingly, after hitting a ship, only the four adjacent fields around the ship can be selected until the ship is sunk.
During the experiment, the robot does not play the role of the opponent but instead assists the participant by providing advice. This advice consists of proposing a different field than the one selected by the participant. The participant can either accept or reject the advice. Initially, the participants are allowed to freely place their ships on the playing field; however, the ships will be sunk by the opponent in a predefined order, which puts the participant at risk of losing the game. This creates an incentive for the participant to follow the advice given by the robot. Further, the robot claims to possess analytical and statistical capabilities, as well as knowledge of the opponent's ship positions, which is not available to the participant.
e robot’s advice strategy is separated into two phases. In the rst phase, the robot will provide
an advice at every turn if the participant is not currently in the process of sinking a ship. In this
phase, the participant can only hit an opponent’s ship when following the robot’s advice. us, in
the rst phase, all advice is correct. e rst phase ends aer the participant follows two pieces
of the robot’s advice. In the second phase, the robot provides advice every second turn, when the
participant is not in the process of sinking a ship. In contrast to the rst phase, every advice is
incorrect and an accepted advice will result in a miss. e experiment ends aer the participant
received a total of nine pieces of advice.
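To make the schedule concrete, the following sketch outlines the control flow described above. It is illustrative only: the study's actual game implementation is not published, and all names and structure are assumptions.

```python
# Illustrative sketch (assumed names and structure) of the two-phase advice schedule.
def offer_advice(phase: int, turns_since_last_advice: int, sinking_ship: bool) -> bool:
    """Decide whether the robot offers advice on the current turn."""
    if sinking_ship:                      # no advice while a discovered ship is being sunk
        return False
    if phase == 1:                        # phase 1: advice on every eligible turn,
        return True                       # and every piece of advice points at a ship (hit)
    return turns_since_last_advice >= 2   # phase 2: advice every second eligible turn,
                                          # and every piece of advice results in a miss

# Phase 1 ends once the participant has accepted two pieces of advice;
# the experiment ends after nine pieces of advice in total.
```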
Fig. 3. Schematic overview and image of the main study setup.
While the first phase establishes trust in the robot's capabilities, in the second phase, the participants experience the unreliability of the robot's advice. Despite the robot applying a trust repair strategy [7] by apologizing and providing a fictional reason for the wrong advice, doubt in the robot's reliability will be created. Consequently, the participants should reevaluate the robot's reliability over the past turns, which leads to less trust in the robot's capabilities [24]. This process is influenced by the robot's voice and enables measuring the effect of a natural-sounding voice in contrast to a mechanical-sounding voice.
3.4 Experiment Setup
The Neuro-Inspired COmpanion (NICO) [53], a child-sized humanoid robot developed for human-robot interaction studies, is used for the experiment. The participant is seated in front of a table with a multi-touch interface. The NICO robot is placed behind the table, facing the participant. Additionally, an experimenter who supervises the experiment is seated behind a partition wall. Figure 3(a) provides a detailed illustration of the experiment setup, and Figure 3(b) shows the interaction between a participant and the robot in the experiment.
At the onset of the experiment, the robot greets the participant and introduces the game and the functionality of the multi-touch table, which is used to display and interact with the Battleship game. An initial start screen allows the participant to freely position the ships on the playing field. During the participant's turn, only the opponent's playing field is displayed in the center of the multi-touch table, and a field can be selected. During the opponent's turn, both playing fields are displayed next to each other, and the position that was selected by the opponent is highlighted. Instructions and advice are provided by the NICO robot and displayed on the multi-touch table.
During the experiment, the NICO robot guides the participant by providing status updates and describing the next step. Further, the robot offers advice and apologizes if the accepted advice is incorrect. During these interactions, the robot uses a gesture and a facial expression and, depending on the situation, randomly selects a suitable voice line. The robot conveys the next action and emotions by using one of the 10 implemented gestures, such as pointing at the touch table, shaking its head, or a thumbs-up gesture. The facial expressions used in the experiment are neutral, angry, happy, and sad. Finally, a total of 76 sentences are available for the robot to use in these interactions. Implementation details (Appendix C), examples of the utilized voice lines (Appendix E), and a list of the implemented gestures (Appendix D) are provided in the Appendix.
A pilot study with eight participants was conducted before the main experiment. The participants were recruited from the campus of the Informatics department. Quantitative results showed that the participants perceived a difference in the naturalness of both voices. Further, interviewing the participants confirmed that they understood the game rules and were able to follow the robot's speech.
3.5 Questionnaires and Measurements
To evaluate the effect of the robot's voice, questionnaires are assessed, and advice acceptance and response time are recorded.
The Godspeed questionnaire [8] is used to capture the participants' overall perception of the robot and its behavior and measures the perceived robot anthropomorphism, animacy, likeability, intelligence, and safety. The Multi-Dimensional Measure of Trust (MDMT) [113] is used to assess trust along the dimensions of performance trust (reliable, competent). The Mean Opinion Score eXtended (MOS-X) scales [90] quantify the participants' perception of the robot's voice in terms of intelligibility, naturalness, social impression, and prosody. The prosody factor is omitted since the prosody for both voices is identical. All the questionnaires were assessed on a 7-point ranking scale.
As an objective measure of the participant's trust in the robot, the decision to accept or reject the robot's advice is recorded. The response time between the robot's advice and the participant pressing the button on the multi-touch table measures the participant's compliance with the robot.
3.6 Procedure
After providing informed consent to participate in the experiment, the participants are randomly assigned to one of the experiment conditions. The rules of the Battleship game are explained by the experimenter using an instruction sheet. Next, the participants are escorted to a neighboring room, where they are seated in front of the touch table facing the NICO robot. The experiment lasts until the NICO robot provides nine pieces of advice, which requires approximately 30 minutes. Upon completion of the experiment, the participants are brought back to the initial room and are presented with the questionnaires. Student participants were granted experiment participation hours.
3.7 Participants
After an evaluation by the Ethics Commission of the Department of Informatics at the University, the participants for the pre-study were recruited online, and participants for the main study were recruited through announcements on the university's email lists and social channels. For the estimation of the pre-study sample size, a medium effect size of f = 0.30 was assumed [99], with α = .05 and a statistical power of .95. Simulation of these assumptions yielded an estimated sample size of 56 participants. The pre-study was completed by 62 participants. Two participants answered the control question incorrectly and were removed from the analysis. As a control question, the participants had to answer a question regarding the content of the provided audio sample. Of the remaining 60 participants, 30 were male, 29 were female, and one participant preferred not to disclose their gender. The participants' ages ranged from 18 to 62 years (M = 26.18, SD = 9.74). The majority of the participants were students (52%), followed by those working part-time or full-time (35%). Forty-five participants had no prior experience with robots, 14 participants stated experience with robots, and one participant worked with them regularly.
Previous studies [38, 110] report a small effect size (φ = 0.16) of a robot's voice on trust. The sample size for the main study was estimated with α = .05 and a statistical power of .95. The estimated sample size was 30 participants for each experiment group. A total of 73 participants completed the main study, and the participants from the pre-study were excluded from participating.
Due to technical issues, five participants had to be excluded, resulting in a total of 68 participants for the data analysis. The participants were equally distributed between both experimental conditions. Of these participants, 42 were male, 25 were female, and one participant did not wish to provide this information. The ages ranged from 18 to 64 years (M = 27.45, SD = 8.08). Most of the participants were students (63%), followed by those working part-time or full-time (34%). Thirty-one participants had no prior experience with robots, 29 participants had prior experience with robots, and 8 participants stated that they work with robots regularly.
4 Results
The pre-study estimates expectations regarding robot voices and the most suitable natural and mechanical voice for the robot in the main study. The results of the main study present the statistical analysis of the difference between the natural and mechanical voice on the perception of the robot, advice acceptance (H1), and response time (H2 and H3).
4.1 Pre-Study
Before the robot and the scenario of the main study were introduced, the participants' expectations about a robot's voice were assessed. An illustration of the responses is provided in Figure 4. The responses show that the participants did not expect a robot's voice to sound mechanical or creepy, but instead expect a robot to sound human-like. Further, most participants (60%) strongly disagree that a robot's voice should be male, and the responses indicate that a robot's voice is not expected to sound genderless either. The answers regarding the expected comfortableness of the interaction reflect uncertainty about robot voices, as illustrated by the spread of responses.

Fig. 4. Expectations about a robot's voice.
After assessing the expectations about a robot's voice, a picture of the NICO robot was shown to the participants. Further, the participants listened to an audio sample of each voice, followed by a questionnaire assessment of each voice's perception. Finally, the choice of the most suitable natural and mechanical voice for the main experiment is based on the participants' suitability ranking for the scenario of the main study. The suitability was assessed after introducing the scenario and showing the participants a video of the robot in the conditions of the main study. A Kruskal-Wallis rank sum test of these suitability rankings indicates significant differences among the suitability of the various voices (χ²(5) = 54.75, p < .001). The assessed suitability ranking for each voice with standard errors is illustrated in Figure 5.
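A minimal sketch of this kind of test with SciPy is given below; the rankings are placeholder values, not the study data.

```python
# Minimal sketch (hypothetical rankings, not study data) of the Kruskal-Wallis test
# over the suitability rankings of the six voices.
from scipy.stats import kruskal

rankings = {                               # one list of suitability ranks per voice (placeholders)
    "natural_low":        [1, 2, 1, 1, 2],
    "natural_neutral":    [2, 1, 3, 2, 1],
    "natural_high":       [4, 3, 2, 4, 3],
    "mechanical_neutral": [3, 4, 4, 3, 4],
    "mechanical_high":    [5, 5, 6, 5, 5],
    "mechanical_low":     [6, 6, 5, 6, 6],
}
stat, p_value = kruskal(*rankings.values())
print(f"H(5) = {stat:.2f}, p = {p_value:.4f}")
```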
It is apparent that in the group of natural voices, the low-pitch natural voice was rated as most
suitable for the study scenario, followed by the neutral-pitch voice in the mechanical voice group.
Fig. 5. Voice suitability rankings for the robot in the main study. Values indicate p-values, *p < .05, ***p < .001.
Table 1. Perceived Speech Difference between the Voices Utilized in the Main Study

Measure             Low natural, Mean (SD)   Neutral mechanical, Mean (SD)   p-value
Understandability   6.62 (0.87)              5.05 (1.62)                     <.001***
Mechanic            2.63 (1.73)              6.33 (1.32)                     <.001***
Expressiveness      4.88 (1.56)              3.12 (1.42)                     <.001***
Appealing           4.95 (1.60)              2.68 (1.44)                     <.001***
Intelligibility     5.07 (1.52)              3.80 (1.54)                     <.001***
Credibility         5.02 (1.65)              3.82 (1.65)                     <.001***
Suitability         4.02 (1.87)              3.55 (1.80)                     .179

***p < .001.
Therefore, the specific effects of these two voices on trust and compliance are researched in the main study. The natural voice could be considered distinctly male with a calm speech pattern, whereas the mechanical voice is reminiscent of a child's voice. Thus, the natural voice might be suitable for the role of a captain, whereas the mechanical voice could be in line with the childlike appearance of the robot. The mechanical low-pitch voice, which sounds robotic, and the mechanical high-pitch voice, which is reminiscent of a cartoon voice, received the lowest suitability rankings. Further analysis did not reveal a relationship between the participants' gender and pitch preference in the data (Appendix B). A direct comparison of the low-pitch natural and the neutral-pitch mechanical voice in the aspects of perception and distinguishability is shown in Table 1.
A Mann–Whitney U test shows that the selected voices significantly differ in all the assessed aspects of perception and distinguishability, except for their suitability for the main study. A non-significant difference in the suitability of the voices for the main study is preferable since it does not provide the participants with an indication of the study's objective. From the significant differences, it is notable that the direct comparison reflects the participants' expectations about robot voices, since the natural voice is perceived as more appealing (W = 563, p < .001) and attributed a larger credibility (W = 1077.5, p < .001). Further, the natural voice is easier to understand (W = 709.5, p < .001) and more intelligible (W = 979.5, p < .001) than the mechanical voice. Finally, the mechanical voice is perceived as more mechanical (W = 3356.5, p < .001), as intended by applying the phaser effect to the natural voice.
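For reference, the kind of pairwise comparison reported above can be expressed as a minimal SciPy sketch; the ratings below are hypothetical placeholders, not study data.

```python
# Minimal sketch (hypothetical data): comparing 7-point ratings of the two voices
# with a Mann-Whitney U test, as used for Table 1 and the main-study questionnaires.
from scipy.stats import mannwhitneyu

natural_ratings    = [7, 6, 6, 5, 7, 6, 5, 6]   # illustrative placeholder values
mechanical_ratings = [5, 4, 6, 3, 5, 4, 5, 4]   # illustrative placeholder values

stat, p_value = mannwhitneyu(natural_ratings, mechanical_ratings, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```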
4.2 Main Study
The main experiment was conducted using the most suitable natural and mechanical voice determined in the pre-study. To estimate the difference in the perception of the robot depending on its voice, the Godspeed questionnaire was assessed. The results of the Godspeed questionnaire with mean and standard error for both experiment groups are illustrated in Figure 6.

Fig. 6. Godspeed questionnaire for a robot with a natural and a mechanical voice. Values indicate p-value, **p < .01.

A Mann–Whitney U test suggests a significant difference (W = 344, p = .004) in the perceived robot safety between the robot with the natural voice (M = 5.08, SD = 1.15) and the robot with the mechanical voice (M = 4.37, SD = 0.97).
As a self-assessed measure of the robot's competence and reliability in the experiment, the MDMT questionnaire was assessed. An illustration of the average ratings is shown in Figure 7(a). However, a Mann–Whitney U test does not reveal a significant difference between both robots. Similarly, the MOS-X questionnaire assesses the difference regarding the voices, and the results for both robots are shown in Figure 7(b). A Mann–Whitney U test reveals a significant difference in the measures of intelligibility and naturalness. The intelligibility of the natural voice (M = 5.72, SD = 1.26) is significantly higher (W = 199.5, p < .001) than that of the mechanical voice (M = 3.91, SD = 1.40). Likewise, the perceived naturalness of the natural voice (M = 4.83, SD = 1.36) significantly differs (W = 177.5, p < .001) from the mechanical voice (M = 2.88, SD = 1.29).

Fig. 7. Assessed differences in the natural and mechanical voice. Values indicate p-value, ***p < .001.
Fig. 8. Probability of accepted advice and average response time for each advice throughout the experiment.

During the experiment, the robot provided advice to the participant, and the response time to either accept or reject the advice was measured, along with the decision itself. The participants' advice acceptance rate is shown in Figure 8(a). To evaluate whether participants accept more advice from the robot with the natural voice (H1), a chi-square test is applied to compare the number of accepted pieces of advice for the natural voice (N = 212) and the mechanical voice (N = 201). However, the chi-square test does not indicate a significant difference (χ²(1, N = 612) = 0.745, p = .388). The estimates of a mixed-effects model for advice acceptance, which shows the reduction of the probability of accepting advice throughout the experiment, are provided in Appendix F.
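As a rough cross-check of the reported statistic, the following sketch recomputes the chi-square test from the published totals. The 2x2 table assumes an equal split of 306 advice events per group (34 participants x 9 pieces of advice), which is inferred rather than stated in the text.

```python
# Sketch recomputing the advice-acceptance comparison from the reported totals
# (assumed 306 advice events per group; 212 vs. 201 accepted).
from scipy.stats import chi2_contingency

table = [[212, 306 - 212],   # natural voice: accepted, rejected
         [201, 306 - 201]]   # mechanical voice: accepted, rejected

chi2, p, dof, expected = chi2_contingency(table)  # Yates' correction applied for 2x2 tables
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")   # approx. chi2(1) = 0.745, p = .388
```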
The average response time in seconds for the individual pieces of advice is shown in Figure 8(b). A difference in response time between both groups relates to H2, and a Mann–Whitney U test shows that the experiment group with the natural voice (M = 5.30, SD = 8.93) exhibited a significantly shorter response time to the robot's advice over the course of the experiment (W = 42259, p = .037) than the group with the mechanical voice (M = 6.38, SD = 2.85).
To evaluate H3, the Spearman’s rank correlation between the assessed questionnaires and
response time on the probability of accepting the robot’s advice is estimated and shown in Table 2.
e correlation analysis shows that a longer response time results in a higher probability of rejecting
the robot’s advice (
𝑟(66)=−.205
,
p=.018
). Further, with an increased likability of the robot, the
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
Influence of Robots’ Voice Naturalness on Trust and Compliance 29:13
participants are more inclined to follow the robot’s advice (
𝑟(66)=.275
,
p=.023
). e estimates
of the correlation between the response time and the questionnaire measures are provided in
Appendix G.
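A minimal sketch of this correlation analysis with SciPy is shown below; the per-participant values are hypothetical placeholders, not the study data.

```python
# Minimal sketch (hypothetical per-participant data): rank correlation between mean
# response time and advice-acceptance rate, as in Table 2.
from scipy.stats import spearmanr

mean_response_time = [3.2, 5.1, 4.8, 7.9, 2.5, 6.3]   # seconds, placeholder values
acceptance_rate    = [0.9, 0.6, 0.7, 0.3, 1.0, 0.4]   # fraction of accepted advice

rho, p_value = spearmanr(mean_response_time, acceptance_rate)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")
```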
5 Discussion
From the responses about the expectations regarding robot voices in the pre-study, it can be noticed that the participants did not expect a robot to sound mechanical but instead human-like. In terms of the expected voice gender, the response pattern is inconclusive. Although a robot's voice should not necessarily be male or genderless, it appears that the expected voice gender might depend on the application and the robot's task. Congruently, after introducing the main scenario, the participants rated the suitability of the deeper male-sounding voice higher than that of the high-pitched voice. This suggests that for the introduced Battleship scenario, preconceived gender roles might have influenced their choice [20, 38, 59].
The preference for natural voices [61] is further supported by the participants' suitability ranking for the robot in the scenario, where the natural voices are considered more suitable than their mechanical counterparts, except for the neutral-pitch voice. The comparison of the voice characteristics of the low-pitch natural voice and the neutral-pitch mechanical voice shows that the natural voice is perceived as more expressive, appealing, and credible. For the scenario of the main study, the participants might prefer a natural and low-pitched voice that expresses calmness and authority. The Battleship scenario might be perceived as a task that bears uncertainty and responsibility, which requires the robot to portray calmness and confidence. Additionally, natural voices can be understood more easily [103], as shown by the difference in intelligibility. The suitability ranking does not show a significant difference for the voices selected for the main study. Therefore, using the neutral-pitch mechanical voice in the main experiment should not reveal the study objective to the participants.
In the main study, the evaluation of the Godspeed questionnaire shows a difference in the perceived safety of both robots. The robot in the experiment is perceived as safer when speaking in a natural voice. However, the Godspeed questionnaire does not indicate a difference in perceived anthropomorphism. The MDMT does not show a difference in the robot's competence and reliability. The analysis of the perception of the voices shows that the participants noticed a clear distinction between both robots' voices in the main experiment. The natural voice is easier to understand and perceived as more natural than its mechanical counterpart, which is consistent with the participants' perception of the voices in the pre-study. Additionally, the results of the main study reveal that both voices do not differ in their social impression. The results suggest that anthropomorphism might be strongly affected by a robot's appearance and movements [32] as opposed to its voice. During the interaction, personality and moral values might further be attributed to the robot, which affects the perceived robot anthropomorphism [63]. Therefore, the results do not show any difference in the perceived robot anthropomorphism or the robot's social impression.
To analyze the effect of a natural voice in comparison to a mechanical voice on trust, the number of accepted pieces of advice in both groups was compared. From the acceptance rate over the course of the experiment, it can be noticed that most participants were inclined to accept early advice from the robot, which resulted in hitting an opponent's ship. However, since any advice after the second accepted piece of advice was incorrect and would result in a miss, trust in the robot declined throughout the experiment. Hypothesis H1 suggests that the participants in the natural voice experiment group are more likely to accept the advice. However, the statistical analysis does not show a difference between the two groups. Surprisingly, the estimated effect size (φ = .035) is tiny [37], which is in contrast to the assumed small effect size stated in the literature. Although a mechanical voice can be perceived
as eerie, the influence of a robot's voice on trust appears smaller than previously reported or does not directly affect trust [1].
H2 proposes that the participants in the experiment group with the natural-voice robot are more likely to rely on the robot's advice and are less likely to reconsider the outcome of the robot's past advice, thus exhibiting a shorter response time. While at the beginning of the experiment the participants consider their options, the response time decreases throughout the experiment. This can also be attributed to an increase in familiarity with the procedure. There is a significant difference in the participants' response times between both experiment groups, exhibiting a small effect (r = .084). This suggests that the participants might attribute more competence to the robot with the natural voice and are more likely to comply with the robot's advice. However, the mechanical voice was perceived as less intelligible, which might require more listening effort and could increase the response time. Since the procedure of following or rejecting advice was repetitive and textual instructions were provided on the touch table, this effect on the response time could be minor. Additionally, the correlation analysis did not suggest a significant relationship between the intelligibility of the voice and following the advice. Furthermore, it might be that a voice's attributes indirectly influence trust and reliance. Specifically, the attributes of a voice could affect the perceived competence [62, 101], which might influence trust and reliance.
The correlation analysis shows that the response time has a moderate (ρ = −.287) negative influence on the probability of accepting the robot's advice. As suggested in H3, participants who do not rely on the robot and who reevaluate its past performance after previously following incorrect advice will doubt the robot's capabilities and reject further advice. In addition, the analysis shows that an increased likeability of the robot inclines the participants to follow the robot's advice. This emphasizes that the concept of trust might comprise many aspects and can be influenced by the robot's likeability [15].
6 Limitations
Certain limitations of this study should be addressed in future research. Defining and effectively measuring trust is uniquely challenging. This study focused on an advice-taking scenario and the relationship between trust and compliance. However, various factors can influence trust in a robot, as indicated by the study results. Further, the selection of the most suitable voices for the main study focused on their fundamental frequency. Although the results show that a change in frequency affects the perception of the robot, there are additional aspects of vocal interaction, such as social and emotional aspects [36], which could serve as directions for future research. Finally, the study was centered around a humanoid robot. Humanoid robots have a strong representation in the media, which might shape the public's perception and assumed capabilities. For different robot appearances, the expected voice and resulting effects might differ [97].
7 Conclusion
Prior research suggests that natural voices can increase the anthropomorphization of robots, which might lead to attributing more capability and knowledge to a robot. Thus, the influence of a robot with a natural voice, in contrast to a mechanical voice, on trust and compliance was investigated. The study consisted of a pre-study to analyze the expectations about robots' voices and determine the most suitable mechanical and natural voice for the robot in the main experiment. In the main experiment, the participants were assisted by a robot in the game Battleship. The robot presented itself as possessing exclusive knowledge, but all the advice provided past the second piece of advice was incorrect. This required the participants to realize that the robot's advice did not provide benefits and that they instead had to rely on themselves. The evaluation of the assessed questionnaires
shows a difference in the perceived safety of the robot, where a robot with a natural voice is perceived as safer than a robot with a mechanical voice.
A comparison of the number of accepted pieces of advice does not reveal any disparity in trust. However, the participants who received assistance from the robot with the mechanical voice required more time to decide whether to follow or reject the robot's advice. This indicates a difference in compliance, where the participants in the experiment group with the natural voice rely more strongly on the robot's assistance. The participants who received advice from a robot with a mechanical voice might have evaluated the robot's past performance and second-guessed the benefit of the robot's advice, thus leading to a longer response time.
The pre-study shows that people expect robots to sound natural and that interaction with them should be comfortable. Further, a natural voice provides increased intelligibility, which can avoid misunderstandings. The results of this study suggest that a robot's voice naturalness does not directly affect trust, but reveals an effect on perceived safety and compliance. The advantages of a natural voice have to be weighed against potential disadvantages, depending on the robot's use case. For instance, for industrial and safety-relevant applications, over-reliance on robots should be avoided. In summary, it appears that a natural and easily intelligible voice is well-suited for cooperative tasks.
Robots' voices play a crucial role in shaping people's perception and expectations of robots. As shown by this study, a robot's voice suitability depends on the robot and the scenario the robot is used in. The research conducted sheds new light on people's expectations about robots' voices and provides evidence that a robot's voice affects trust and anthropomorphism less than previously reported. These results can nurture future research on creating more effective voice interfaces for robots and subsequently increase their acceptance and adoption in society.
Acknowledgments
Many thanks to Shrey Dixit, Tassilo Hahm, Haruka Inoba, Katharina Meyer-Lüters, and Mai Nhi
Tran for their contribution to the project.
References
[1]
Amal Abdulrahman and Deborah Richards. 2022. Is natural necessary? Human voice versus synthetic voice for
intelligent virtual agents. Multimodal Technologies and Interaction 6, 7 (2022), 1–17.
DOI:
https://doi.org/10.3390/
mti6070051
[2] Abdulaziz Abubshait and Eva Wiese. 2017. You look human, but act like a machine: Agent appearance and behavior modulate different aspects of human-robot interaction. Frontiers in Psychology 8 (2017), 1–12. DOI: https://doi.org/10.3389/fpsyg.2017.01393
[3] Ruben Alonso, Emanuele Concas, and Diego Reforgiato Recupero. 2021. An abstraction layer exploiting voice assistant technologies for effective human–robot interaction. Applied Sciences (Switzerland) 11, 19 (2021), 1–18. DOI: https://doi.org/10.3390/app11199165
[4] Sean Andrist, Micheline Ziadee, Halim Boukaram, Bilge Mutlu, and Majd Sakr. 2015. Effects of culture on the credibility of robot speech: A comparison between English and Arabic. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Vol. 2015, 157–164. DOI: https://doi.org/10.1145/2696454.2696464
[5] Alexander M. Aroyo, Jan De Bruyne, Orian Dheu, Eduard Fosch-Villaronga, Aleksei Gudkov, Holly Hoch, Steve Jones, Christoph Lutz, Henrik Saetra, Mads Solberg, and Aurelia Tamò-Larrieux. 2021. Overtrusting robots: Setting a research agenda to mitigate overtrust in automation. Paladyn, Journal of Behavioral Robotics 12, 1 (2021), 423–436. DOI: https://doi.org/10.1515/pjbr-2021-0029
[6] Matthew P. Aylett, Selina Jeanne Sutton, and Yolanda Vazquez-Alvarez. 2019. The right kind of unnatural: Designing a robot voice. In Proceedings of the ACM International Conference Proceeding Series, 5–6. DOI: https://doi.org/10.1145/3342775.3342806
[7]
Anthony L. Baker, Elizabeth K. Phillips, Daniel Ullman, and Joseph R. Keebler. 2018. Toward an understanding of
trust repair in human-robot interaction: Current research and future directions. ACM Transactions on Interactive
Intelligent Systems 8, 4 (2018), 1–30. DOI: https://doi.org/10.1145/3181671
[8] Christoph Bartneck, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. International Journal of Social Robotics 1 (2009), 71–81. DOI: https://doi.org/10.1007/s12369-008-0001-3
[9] Christian Becker-Asano, Takayuki Kanda, Carlos Ishi, and Hiroshi Ishiguro. 2009. How about laughter? Perceived naturalness of two laughing humanoid robots. In Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII '09). DOI: https://doi.org/10.1109/ACII.2009.5349371
[10]
Annika Boos, Olivia Herzog, Jakob Reinhardt, Klaus Bengler, and Markus Zimmermann. 2022. A compliance–
reactance framework for evaluating Human-robot interaction. Frontiers in Robotics and AI 9 (2022), 1–13.
DOI:
https://doi.org/10.3389/frobt.2022.733504
[11] Gordon Briggs, Tom Williams, Ryan Blake Jackson, and Matthias Scheutz. 2022. Why and how robots should say 'No'. International Journal of Social Robotics 14, 2 (2022), 323–339. DOI: https://doi.org/10.1007/s12369-021-00780-y
[12] Natalia Calvo-Barajas, Giulia Perugia, and Ginevra Castellano. 2020. The effects of robot's facial expressions on children's first impressions of trustworthiness. In Proceedings of the 29th IEEE International Conference on Robot and Human Interactive Communication, 165–171. DOI: https://doi.org/10.1109/RO-MAN47096.2020.9223456
[13] Julia Cambre and Chinmay Kulkarni. 2019. One voice fits all? Social implications and research challenges of designing voices for smart devices. Proceedings of the ACM on Human-Computer Interaction 3 (2019), 1–19. DOI: https://doi.org/10.1145/3359325
[14] David Cameron, Jonathan M. Aitken, Emily C. Collins, Luke Boorman, Adriel Chua, Samuel Fernando, Owen McAree, Uriel Martinez-Hernandez, and James Law. 2015. Framing factors: The importance of context and the individual in understanding trust in human-robot interaction. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[15] David Cameron, Stevienna de Saille, Emily C. Collins, Jonathan M. Aitken, Hugo Cheung, Adriel Chua, Ee Jing Loh, and James Law. 2021. The effect of social-cognitive recovery strategies on likability, capability and trust in social robots. Computers in Human Behavior 114 (Jan. 2021), 106561. DOI: https://doi.org/10.1016/j.chb.2020.106561
[16] Eric T. Chancey, James P. Bliss, Yusuke Yamani, and Holly A. H. Handley. 2017. Trust and the compliance-reliance paradigm: The effects of risk, error bias, and reliability on trust and dependence. Human Factors 59, 3 (2017), 333–345. DOI: https://doi.org/10.1177/0018720816682648
[17] Rebecca Cherng Shiow Chang, Hsi Peng Lu, and Peishan Yang. 2018. Stereotypes or golden rules? Exploring likable voice traits of social robots as active aging companions for tech-savvy baby boomers in Taiwan. Computers in Human Behavior 84 (2018), 194–210. DOI: https://doi.org/10.1016/j.chb.2018.02.025
[18]
Jessie Y.C. Chen, Michael J. Barnes, and Michelle Harper-Sciarini. 2011. Supervisory control of multiple robots:
Human-performance issues and user-interface design. IEEE Transactions on Systems, Man and Cybernetics Part C:
Applications and Reviews 41, 4 (2011), 435–454. DOI: https://doi.org/10.1109/TSMCC.2010.2056682
[19]
F. Cid, R. Cintas, L. J. Manso, L. Calderita, A. Sánchez, and P. Núñez. 2011. A real-time synchronization algorithm
between text-to-speech (TTS) system and robot mouth for social robotic applications. Proceedings of Workshop of
Physical Agents.
[20] Charles R. Crowell, Matthias Scheutz, Paul Schermerhorn, and Michael Villano. 2009. Gendered voice and robot entities: Perceptions and reactions of male and female subjects. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), 3735–3741. DOI: https://doi.org/10.1109/IROS.2009.5354204
[21] M. M. A. De Graaf, S. Ben Allouch, and J. A. G. M. Van Dijk. 2015. What makes robots social?: A user's perspective on characteristics for social human-robot interaction. In Proceedings of the International Conference on Social Robotics. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9388, 184–193. DOI: https://doi.org/10.1007/978-3-319-25554-5_19
[22]
Ewart J. de Visser, Samuel S. Monfort, Ryan McKendrick, Melissa A.B. Smith, Patrick E. McKnight, Frank Krueger,
and Raja Parasuraman. 2016. Almost human: Anthropomorphism increases trust resilience in cognitive agents.
Journal of Experimental Psychology: Applied 22, 3 (2016), 331–349. DOI: https://doi.org/10.1037/xap0000092
[23]
Ewart J. de Visser, Marieke M.M. Peeters, Malte F. Jung, Spencer Kohn, Tyler H. Shaw, Richard Pak, and Mark A.
Neerincx. 2020. Towards a theory of longitudinal trust calibration in human–robot teams. International Journal of
Social Robotics 12, 2 (2020), 459–478. DOI: https://doi.org/10.1007/s12369-019-00596-x
[24] Peter de Vries, Cees Midden, and Don Bouwhuis. 2003. The effects of errors on system trust, self-confidence, and the allocation of control in route planning. International Journal of Human Computer Studies 58, 6 (Jun. 2003), 719–735. DOI: https://doi.org/10.1016/S1071-5819(03)00039-9
[25]
Munjal Desai, Poornima Kaniarasu, Mikhail Medvedev, Aaron Steinfeld, and Holly Yanco. 2013. Impact of robot
failures and feedback on real-time trust. In Proceedings of the ACM/IEEE International Conference on Human-Robot
Interaction, 251–258. DOI: https://doi.org/10.1109/HRI.2013.6483596
[26] Munjal Desai, Mikhail Medvedev, Marynel Vázquez, Sean McSheehy, Sofia Gadea-Omelchenko, Christian Bruggeman, Aaron Steinfeld, and Holly Yanco. 2012. Effects of changing reliability on trust of robot systems. In Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '12). ACM, New York, NY, 73–80. DOI: https://doi.org/10.1145/2157689.2157702
[27] F. C. Donders. 1969. On the speed of mental processes. Acta Psychologica 30, C (1969), 412–431. DOI: https://doi.org/10.1016/0001-6918(69)90065-1
[28] Xiao Dou, Li Yan, Kai Wu, and Jin Niu. 2022. Effects of voice and lighting color on the social perception of home healthcare robots. Applied Sciences (Switzerland) 12, 23 (2022), 1–14. DOI: https://doi.org/10.3390/app122312191
[29] Gölge Eren and The Coqui TTS team. 2021. Coqui TTS. DOI: https://doi.org/10.5281/zenodo.6472420
[30]
Friederike Eyssel, Dieta Kuchenbrandt, Simon Bobinger, Laura De Ruiter, and Frank Hegel. 2012. ‘If you sound
like me, You must be more human’: On the interplay of robot and user features on Human-robot acceptance and
anthropomorphism. In Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction
(HRI’12), 125–126. DOI: https://doi.org/10.1145/2157689.2157717
[31]
Neta Ezer, Arthur D. Fisk, and Wendy A. Rogers. 2008. Age-related dierences in reliance behavior aributable to
costs within a human-decision aid system. Human Factors 50, 6 (Dec. 2008), 853–863.
DOI:
https://doi.org/10.1518/
001872008X375018
[32]
Julia Fink. 2012. Anthropomorphism and human likeness in the design of robots and human-robot interaction. In
Proceedings of the International Conference on Social Robotics. Lecture Notes in Computer Science (Including Subseries
Lecture Notes in Articial Intelligence and Lecture Notes in Bioinformatics), Vol. 7621, 199–208. 03029743
DOI:
https://doi.org/10.1007/978-3-642-34103-8_20
[33]
Kerstin Fischer and Oliver Niebuhr. 2023. Which voice for which robot? Designing robot voices that indicate robot
size. ACM Transactions on Human-Robot Interaction 12, 4 (2023), 1–24. DOI: https://doi.org/10.1145/3632124
[34]
Kerstin Fischer, Oliver Niebuhr, Lars C. Jensen, and Leon Bodenhagen. 2020. Speech melody maers—How robots
prot from using charismatic speech. ACM Transactions on Human-Robot Interaction 9, 1 (2020), 1–21.
DOI:
https:
//doi.org/10.1145/3344274
[35]
Marlena R. Fraune. 2020. Our robots, our team: Robot anthropomorphism moderates group eects in human–robot
teams. Frontiers in Psychology 11 (2020), 1–14. DOI: https://doi.org/10.3389/fpsyg.2020.01275
[36]
Changzeng Fu, Qi Deng, Jingcheng Shen, Hamed Mahzoon, and Hiroshi Ishiguro. 2022. A preliminary study on
realizing Human–robot mental comforting dialogue Via sharing experience emotionally. Sensors 22, 3 (2022), 1–15.
DOI: https://doi.org/10.3390/s22030991
[37]
David C. Funder and Daniel J. Ozer. 2019. Evaluating eect size in psychological research: Sense and nonsense.
Advances in Methods and Practices in Psychological Science 2, 2 (2019), 156–168.
DOI:
https://doi.org/10.1177/
2515245919847202
[38]
Darci Gallimore, Joseph B. Lyons, y Vo, Sean Mahoney, and Kevin T. Wynne. 2019. Trusting robocop: Gender-based
eects on trust of an autonomous robot. Frontiers in Psychology 10 (2019), 1–9.
DOI:
https://doi.org/10.3389/fpsyg.
2019.00482
[39]
Norina Gasteiger, Jong Yoon Lim, Mehdi Hellou, Bruce A. MacDonald, and Ho Seok Ahn. 2022. A scoping review of
the literature on prosodic elements related to emotional speech in Human-robot interaction. International Journal of
Social Robotics 16 (2022), 659–670. DOI: https://doi.org/10.1007/s12369-022-00913-x
[40]
Ioanna Giorgi, Aniello Minutolo, Francesca Tiroo, Oksana Hagen, Massimo Esposito, Mario Gianni, Marco Palomino,
and Giovanni L. Masala. 2023. I am robot, your health adviser for older adults: Do you trust my advice? International
Journal of Social Robotics, 12 (2023), 1981–1991. DOI: https://doi.org/10.1007/s12369-023-01019-8
[41]
Jennifer Goetz, Sara Kiesler, and Aaron Powers. 2003. Matching robot appearance and behavior to tasks to improve
human-robot cooperation. In Proceedings of the IEEE International Workshop on Robot and Human Interactive
Communication, 55–60. DOI: https://doi.org/10.1109/ROMAN.2003.1251796
[42]
Dale L. Goodhue. 1995. Understanding user evaluations of information systems. Management Science 41, 12 (1995),
1827–1844. DOI: https://doi.org/10.1287/mnsc.41.12.1827
[43]
Dale L. Goodhue and Ronald L. ompson. 1995. Task-technology t and individual performance. MIS arterly:
Management Information Systems 19, 2 (1995), 213–233. DOI: https://doi.org/10.2307/249689
[44]
Peter A. Hancock, Deborah R. Billings, Kristin E. Schaefer, Jessie Y.C. Chen, Ewart J. De Visser, and Raja Parasuraman.
2011. A meta-analysis of factors aecting trust in Human-robot interaction. Human Factors 53, 5 (2011), 517–527.
DOI: https://doi.org/10.1177/0018720811417254
[45]
Corey Hannum, Rui Li, and Weitian Wang. 2023. A trust-assist framework for human–robot co-carry tasks. Robotics
12, 2 (Feb. 2023), 30. DOI: https://doi.org/10.3390/robotics12020030
[46] Caroline E. Harrio and Julie A. Adams. 2017. Towards reaction and response Time metrics for real-world human-
robot interaction. In Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communi-
cation (RO-MAN ’17), Vol. 2017, 799–804. DOI: https://doi.org/10.1109/ROMAN.2017.8172394
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
29:18 D. Becker et al.
[47]
Frank Hegel. 2012. Eects of a robot’s aesthetic design on the aribution of social capabilities. In Proceedings of the
IEEE International Workshop on Robot and Human Interactive Communication, 469–475.
DOI:
https://doi.org/10.1109/
ROMAN.2012.6343796
[48]
Nicholas Hertz and Eva Wiese. 2019. Good advice Is beyond all Price, but what if it comes from a machine? Journal
of Experimental Psychology: Applied 25, 3 (2019), 386–395. DOI: https://doi.org/10.1037/xap0000205
[49]
Georey Ho, Dana Wheatley, and Charles T. Scialfa. 2005. Age dierences in trust and reliance of a medication
management system. Interacting with Computers 17, 6 (Dec. 2005), 690–710.
DOI:
https://doi.org/10.1016/j.intcom.
2005.09.007
[50]
Kevin Anthony Ho and Masooda Bashir. 2015. Trust in automation: Integrating empirical evidence on factors that
inuence trust. Human Factors 57, 3 (2015), 407–434. DOI: https://doi.org/10.1177/0018720814547570
[51]
Ryan Blake Jackson, Tom Williams, and Nicole Smith. 2020. Exploring the role of gender in perceptions of robotic
noncompliance. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 559–567.
DOI:
https://doi.org/10.1145/3319502.3374831
[52]
A. D. Kaplan, T. T. Kessler, T. L. Sanders, J. Cruit, J. C. Brill, and P. A. Hancock. 2021. A Time to trust: Trust
as a function of time in human-robot interaction. In Trust in Human-Robot Interaction. Elsevier, 143–157.
DOI:
https://doi.org/10.1016/B978-0-12-819472-0.00006-X
[53]
Mahias Kerzel, Erik Strahl, Sven Magg, Nicolás Navarro-Guerrero, Stefan Heinrich, and Stefan Wermter. 2017.
NICO—neuro-inspired companion: A developmental humanoid robot platform for multimodal interaction. In Pro-
ceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE,
113–120.
[54]
Zahra Rezaei Khavas. 2021. A review on trust in human-robot interaction. arXiv:2105.10045. Retrieved from http:
//arxiv.org/abs/2105.10045
[55]
Sara Kiesler. 2005. Fostering common ground in human-robot interaction. In Proceedings of the IEEE International
Workshop on Robot and Human Interactive Communication, Vol. 2005, IEEE, 729–734.
DOI:
https://doi.org/10.1109/
ROMAN.2005.1513866
[56]
Jaehyeon Kim, Jungil Kong, and Juhee Son. 2021. Conditional variational autoencoder with adversarial learning for
end-to-end text-to-speech. arXiv:2106.06103 Retrieved from http://arxiv.org/abs/2106.06103
[57]
Spencer C. Kohn, Ewart J. de Visser, Eva Wiese, Yi Ching Lee, and Tyler H. Shaw. 2021. Measurement of trust in
automation: A narrative review and reference guide. Frontiers in Psychology 12 (2021), 1–23.
DOI:
https://doi.org/10.
3389/fpsyg.2021.604977
[58]
Bing Cai Kok and Harold Soh. 2020. Trust in robots: Challenges and opportunities. Current Robotics Reports 1, 4
(2020), 297–309. DOI: https://doi.org/10.1007/s43154-020-00029-y
[59]
Mahias Kraus, Johannes Kraus, Martin Baumann, and Wolfgang Minker. 2019. Eects of gender stereotypes on trust
and likability in spoken Human-robot interaction. In LREC 2018 - Proceedings of the 11th International Conference on
Language Resources and Evaluation. 112–118.
[60]
Robert M. Krauss, Robin Freyberg, and Ezequiel Morsella. 2002. Inferring speakers’ physical aributes from their
voices. Journal of Experimental Social Psychology 38, 6 (2002), 618–625. 00221031
DOI:
https://doi.org/10.1016/S0022-
1031(02)00510-3
[61]
Katharina Kühne, Martin H. Fischer, and Yuefang Zhou. 2020. e human takes it all: Humanlike synthesized voices
are perceived as less eerie and more likable. Evidence from a subjective ratings study. Frontiers in Neurorobotics 14
(2020), 1–15. DOI: https://doi.org/10.3389/fnbot.2020.593732
[62]
Katharina Kühne, Erika Herbold, Oliver Bendel, Yuefang Zhou, and Martin H. Fischer. 2023. “Ick bin een Berlina”:
Dialect prociency impacts a robot’s trustworthiness and competence evaluation. Frontiers in Robotics and AI 10
(2023), 1–15. 22969144 DOI: https://doi.org/10.3389/frobt.2023.1241519
[63]
Rinaldo Kühne and Jochen Peter. 2023. Anthropomorphism in human–robot interactions: A multidimensional
conceptualization. Communication eory 33, 1 (2023), 42–52. DOI: https://doi.org/10.1093/ct/qtac020
[64]
John D. Lee and Katrina A. See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors 46, 1
(2004), 50–80. DOI: https://doi.org/10.1518/hfes.46.1.50_30392
[65]
Stephen C Levinson. 2020. Natural forms of purposeful interaction among humans: What makes interaction eective?
In Interactive Task Learning.DOI: https://doi.org/10.7551/mitpress/11956.003.0012
[66]
Michael Lewis, Katia Sycara, and Phillip Walker. 2018. e Role of Trust in Human-Robot Interaction. Springer
International Publishing, Cham, 135–159. DOI: https://doi.org/10.1007/978-3-319-64816-3_8
[67]
Mingming Li, Fu Guo, Xueshuang Wang, Jiahao Chen, and Jaap Ham. 2023. Eects of robot gaze and voice human-
likeness on users’ subjective perception, visual aention, and cerebral activity in voice conversations. Computers in
Human Behavior 141 (Apr. 2023), 107645. DOI: https://doi.org/10.1016/j.chb.2022.107645
[68]
Yuanchao Li and Catherine Lai. 2022. Robotic Speech Synthesis: Perspectives on Interactions, Scenarios, and Ethics. Vol.
1, ACM, New York, NY.
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
Influence of Robots’ Voice Naturalness on Trust and Compliance 29:19
[69]
LimeSurvey Project Team/Carsten Schmitz. 2012. LimeSurvey: An Open Source Survey Tool. LimeSurvey Project,
Hamburg, Germany. Retrieved from http://www.limesurvey.org
[70]
Rui Liu and Xiaoli Zhang. 2019. A review of methodologies for natural-language-facilitated human–robot cooperation.
International Journal of Advanced Robotic Systems 16 (2019), 1–17. DOI: https://doi.org/10.1177/1729881419851402
[71]
Christoph Lutz and Aurelia Tamò-Larrieux. 2021. Do privacy concerns about social robots aect use intentions?
Evidence from an experimental vignee study. Frontiers in Robotics and AI 8 (2021).
DOI:
https://doi.org/10.3389/
frobt.2021.627958
[72]
P. Madhavan and D. A. Wiegmann. 2007. Similarities and dierences between human–human and human–automation
trust: An integrative review. eoretical Issues in Ergonomics Science 8, 4 (2007), 277–301.
DOI:
https://doi.org/10.
1080/14639220500337708
[73]
Bertram Malle, Kerstin Fischer, James Young, AJung Moon, and Emily Collins. 2020. Trust and the Discrepancy
between Expectations and Actual Capabilities of Social Robots. Cambridge Scholars Press, 1–23.
[74]
Joseph H. Manson, Gregory A. Bryant, Mahew M. Gervais, and Michelle A. Kline. 2013. Convergence of speech
rate in conversation predicts cooperation. Evolution and Human Behavior 34, 6 (2013), 419–426.
DOI:
https://doi.org/
10.1016/j.evolhumbehav.2013.08.001
[75]
Alessandro Marin Vargas, Lorenzo Cominelli, Felice Dell’Orlea, and Enzo Pasquale Scilingo. 2021. Verbal communi-
cation in robotics: A study on salient terms, research Fields and trends in the last decades based on a computational
linguistic analysis. Frontiers of Computer Science 2 (2021), 1–12. DOI: https://doi.org/10.3389/fcomp.2020.591164
[76]
Fernando Alonso Martin, María Malfaz, Álvaro Castro-GonzáLez, José Carlos Castillo, and Miguel Ángel Salichs.
2020. Four-features evaluation of text to speech systems for three social robots. Electronics (Switzerland) 9, 2 (2020),
1–23. DOI: https://doi.org/10.3390/electronics9020267
[77]
Nikolaos Mavridis. 2015. A review of verbal and non-verbal human-robot interactive communication. Robotics and
Autonomous Systems 63 (1 2015), 22–35. DOI: https://doi.org/10.1016/j.robot.2014.09.031
[78]
Phil McAleer, Alexander Todorov, and Pascal Belin. 2014. How do you say ‘hello’? Personality impressions from
brief novel voices. PLoS ONE 9, 3 (2014), 1–9. DOI: https://doi.org/10.1371/journal.pone.0090779
[79]
Conor Mcginn and Ilaria Torre. 2019. Can you tell the robot by the voice? An exploratory study on the role of voice
in the perception of robots. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Vol.
2019. IEEE, 211–221. DOI: https://doi.org/10.1109/HRI.2019.8673305
[80]
Joachim Meyer and John D. Lee. 2013. Trust, Reliance, and Compliance. Oxford University Press, 1–28.
DOI:
https:
//doi.org/10.1093/oxfordhb/9780199757183.013.0007
[81]
Cliord Nass and Kwan Min Lee. 2001. Does computer-synthesized speech manifest personality? Experimental tests
of recognition, similarity-araction, and consistency-araction. Journal of Experimental Psychology: Applied 7, 3
(2001), 171–181. DOI: https://doi.org/10.1037/1076-898X.7.3.171
[82]
Manisha Natarajan and Mahew Gombolay. 2020. Eects of anthropomorphism and accountability on trust in
Human robot interaction. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction,
33–42. DOI: https://doi.org/10.1145/3319502.3374839
[83]
Aidan Naughton and Tom Williams. 2021. How to tune your draggin’: Can body language mitigate face threat in
robotic noncompliance?. In Proceedings of the International Conference on Social Robotics. Lecture Notes in Computer
Science (including subseries Lecture Notes in Articial Intelligence and Lecture Notes in Bioinformatics), Vol. 13086,
247–256. DOI: https://doi.org/10.1007/978-3-030-90525-5_21
[84]
Andreea Niculescu, Betsy van Dijk, Anton Nijholt, Haizhou Li, and Swee Lan See. 2013. Making social robots More
aractive: e eects of voice pitch, humor and empathy. International Journal of Social Robotics 5, 2 (2013), 171–191.
DOI: https://doi.org/10.1007/s12369-012-0171-x
[85]
Oliver Niebuhr and Jan Michalsky. 2019. Computer-generated speaker charisma and its eects on Human actions in
a car-navigation system experiment - or how Steve Jobs’ tone of voice can take you anywhere. In Proceedings of
the International Conference on Computational Science and Its Applications (ICCSA ’19). Lecture Notes in Computer
Science (Including Subseries Lecture Notes in Articial Intelligence and Lecture Notes in Bioinformatics), Vol. 11620,
375–390. DOI: https://doi.org/10.1007/978-3-030-24296-1_31
[86]
Shuichi Nishio, Kohei Ogawa, Yasuhiro Kanakogi, Shoji Itakura, and Hiroshi Ishiguro. 2012. Do robot appearance
and speech aect people’s aitude? Evaluation through the ultimatum Game. Proceedings of the IEEE International
Workshop on Robot and Human Interactive Communication September, 809–814.
DOI:
https://doi.org/10.1109/ROMAN.
2012.6343851
[87]
European Organisation, F O R e, Safety Of, A I R Navigation, European Air, and Trac Management. 2003.
Guidelines for Trust in Future ATM Systems: A Literature Review. European Air Trac Management Programme, 70.
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
29:20 D. Becker et al.
[88] Richard Pak, Nicole Fink, Margaux Price, Brock Bass, and Lindsay Sturre. 2012. Decision support aids with anthro-
pomorphic characteristics inuence trust and performance in younger and older adults. Ergonomics 55, 9 (2012),
1059–1072. DOI: https://doi.org/10.1080/00140139.2012.691554
[89]
Raja Parasuraman and Victor Riley. 1997. Humans and automation: Use, misuse, disuse, abuse. Human Factors 39, 2
(Jun 1997), 230–253. DOI: https://doi.org/10.1518/001872097778543886
[90]
Melanie D. Polkosky and James R. Lewis. 2003. Expanding the MOS: Development and psychometric evaluation of the
MOS-R and MOS-X. International Journal of Speech Technology 6, 2 (2003), 161–182.
DOI:
https://doi.org/10.1023/A:
1022390615396
[91]
Aaron Powers, Adam D.I. Kramer, Shirlene Lim, Jean Kuo, Sau Lai Lee, and Sara Kiesler. 2005. Eliciting information
from people with a gendered humanoid robot. In Proceedings of the IEEE International Workshop on Robot and Human
Interactive Communication, Vol. 2005, 158–163. DOI: https://doi.org/10.1109/ROMAN.2005.1513773
[92]
Diogo Rato, Filipa Correia, André Pereira, and Rui Prada. 2023. Robots in games. International Journal of Social
Robotics 15, 1 (2023), 37–57. DOI: https://doi.org/10.1007/s12369-022-00944-4
[93]
Paul Robinee, Ayanna M. Howard, and Alan R. Wagner. 2017. Eect of robot performance on Human-robot
trust in time-critical situations. IEEE Transactions on Human-Machine Systems 47, 4 (2017), 425–436.
DOI:
https:
//doi.org/10.1109/THMS.2017.2648849
[94]
Paul Robinee, Wenchen Li, Robert Allen, Ayanna M. Howard, and Alan R. Wagner. 2016. Overtrust of robots in
emergency evacuation scenarios. Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction,
101–108. DOI: https://doi.org/10.1109/HRI.2016.7451740
[95]
Julia Rosén, Erik Lagerstedt, and Maurice Lamb. 2022. Is human-like speech in robots deception? In Proceedings of
Human-Robot Interaction (HRI ’22 Workshop). Vol. 1. ACM, New York, NY.
[96]
Henrik Skaug Saetra. 2021. Social robot deception and the culture of trust. Paladyn 12, 1 (2021), 276–286. 20814836
DOI: https://doi.org/10.1515/pjbr-2021-0021
[97]
Busra Sarigul and Burcu A. Urgen. 2023. Audio–visual predictive processing in the perception of humans and robots.
International Journal of Social Robotics 15, 5 (2023), 855–865. DOI: https://doi.org/10.1007/s12369-023-00990-6
[98] Nina Savela, Tuuli Turja, Rita Latikka, and Ae Oksanen. 2021. Media eects on the perceptions of robots. Human
Behavior and Emerging Technologies 3, 5 (2021), 989–1003. DOI: https://doi.org/10.1002/hbe2.296
[99]
Simon Schreibelmayr and Martina Mara. 2022. Robot voices in daily life: Vocal human-likeness and application
context as determinants of user acceptance. Frontiers in Psychology 13 (2022), 1–17.
DOI:
https://doi.org/10.3389/
fpsyg.2022.787499
[100]
Katie Seaborn, Norihisa P. Miyake, Peter Pennefather, and Mihoko Otake-Matsuura. 2021. Voice in human-agent
interaction: A survey. ACM Computing Surveys 54, 4 (2021), 1–43. DOI: https://doi.org/10.1145/3386867
[101]
Michihiro Shimada and Takayuki Kanda. 2012. What is the appropriate speech rate for a communication robot?
Interaction Studies. Social Behaviour and Communication in Biological and Articial Systems 13, 3 (2012), 408–435.
DOI: https://doi.org/10.1075/is.13.3.05shi
[102]
Georgios Sideridis and Maisaa Taleb S. Alahmadi. 2022. e role of response times on the measurement of mental
ability. Frontiers in Psychology 13 (2022), 1–10. DOI: https://doi.org/10.3389/fpsyg.2022.892317
[103]
Olympia Simantiraki, Martin Cooke, and Simon King. 2018. Impact of dierent speech types on listening eort.
In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Vol.
2018, 2267–2271. DOI: https://doi.org/10.21437/Interspeech.2018-1358
[104]
Valerie K. Sims, Mahew G. Chin, Heather C. Lum, Linda Upham-Ellis, Tatjana Ballion, and Nicholas C. Lagauta.
2009. Robots’ auditory cues are subject to anthropomorphism. In Proceedings of the Human Factors and Ergonomics
Society, Vol. 3, 1418–1421. 10711813 DOI: https://doi.org/10.1518/107118109x12524444079352
[105]
Melissa A. Smith, M. Mowafak Allaham, and Eva Wiese. 2016. Trust in automated agents Is modulated by the
combined inuence of agent and task type. In Proceedings of the Human Factors and Ergonomics Society, 206–210.
10711813 DOI: https://doi.org/10.1177/1541931213601046
[106]
Yao Song, Da Tao, and Yan Luximon. 2023. In robot We trust? e eect of emotional expressions and contextual
cues on anthropomorphic trustworthiness. Applied Ergonomics 109 (May 2023), 103967.
DOI:
https://doi.org/10.
1016/j.apergo.2023.103967
[107]
Hang Su, Wen Qi, Jiahao Chen, Chenguang Yang, Juan Sandoval, and Med Amine Laribi. 2023. Recent advancements
in multimodal human–robot interaction. Frontiers in Neurorobotics 17 (2023), 1–21.
DOI:
https://doi.org/10.3389/
fnbot.2023.1084000
[108]
Rie Tamagawa, Catherine I. Watson, I. Han Kuo, Bruce A. Macdonald, and Elizabeth Broadbent. 2011. e eects of
synthesized voice accents on user perceptions of robots. International Journal of Social Robotics 3, 3 (2011), 253–262.
DOI: https://doi.org/10.1007/s12369-011-0100-4
[109]
Xu Tan, Tao Qin, Frank Soong, and Tie-Yan Liu. 2021. A survey on neural speech synthesis. arXiv:2106.15561.
Retrieved from http://arxiv.org/abs/2106.15561
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
Influence of Robots’ Voice Naturalness on Trust and Compliance 29:21
[110]
Ilaria Torre, Jeremy Goslin, Laurence White, and Debora Zanao. 2018. Trust in articial voices: A “congruency
eect” of First impressions and behavioural experience. In Proceedings of the ACM International Conference Proceeding
Series.DOI: https://doi.org/10.1145/3183654.3183691
[111]
Ilaria Torre and Laurence White. 2021. Trust in vocal human–robot interaction: Implications for robot voice design.
In Voice Aractiveness. Springer, Singapore, 299–316. DOI: https://doi.org/10.1007/978-981-15-6627-1_16
[112]
Daniel Ullman, Iolanda Leite, Jonathan Phillips, Julia Kim-Cohen, and Brian Scassellati. 2014. Smart human, smarter
robot: How cheating aects perceptions of social agency. In Proceedings of the 36th Annual Meeting of the Cognitive
Science Society (CogSci ’14), 2996–3001.
[113]
Daniel Ullman and Bertram F. Malle. 2019. Measuring gains and losses in Human-robot trust: Evidence for dieren-
tiable components of trust. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Vol.
2019, IEEE, 618–619. DOI: https://doi.org/10.1109/HRI.2019.8673154
[114]
Daniel Ullrich, Andreas Butz, and Sarah Diefenbach. 2021. e development of overtrust: An empirical simulation
and psychological analysis in the context of Human–robot interaction. Frontiers in Robotics and AI 8 (2021), 1–15.
DOI: https://doi.org/10.3389/frobt.2021.554578
[115]
Jacqueline Urakami and Katie Seaborn. 2023. Nonverbal cues in Human–robot interaction: A communication studies
perspective. ACM Transactions on Human-Robot Interaction 12, 2 (2023), 1–21.
DOI:
https://doi.org/10.1145/3570169
[116]
Ella Velner, Paul P. G. Boersma, and Maartje M. A. De Graaf. 2020. Intonation in robot speech: Does it work the same
as with people? In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 569–578.
DOI:
https://doi.org/10.1145/3319502.3374801
[117]
Lorenzo Vianello, Luigi Penco, Waldez Gomes, Yang You, Salvatore Maria Anzalone, Pauline Maurice, Vincent
omas, and Serena Ivaldi. 2021. Human-humanoid interaction and cooperation: A review. Current Robotics Reports
2, 4 (2021), 441–454. DOI: https://doi.org/10.1007/s43154-021-00068-z
[118]
M. L. Walters, D. S. Syrdal, K. L. Koay, K. Dautenhahn, and R. Te Boekhorst. 2008. Human approach distances to a
mechanical-looking robot with dierent robot voice styles. In Proceedings of the 17th IEEE International Symposium on
Robot and Human Interactive Communication (RO-MAN), 707–712.
DOI:
https://doi.org/10.1109/ROMAN.2008.4600750
[119]
Claire Whang and Hyunjoo Im. 2021. “I Like your suggestion!” e role of humanlikeness and parasocial relationship
on the website versus voice shopper’s perception of recommendations. Psychology and Marketing 38, 4 (2021),
581–595. DOI: https://doi.org/10.1002/mar.21437
[120] Christopher D. Wickens. 1981. Processing Resources in Aention. Academic Press, New York, 63–102.
[121]
Herbert Woodrow. 1911. Reaction Times. Psychological Bulletin 8, 11 (Nov. 1911), 387–390.
DOI:
https://doi.org/10.
1037/h0070885
[122]
Min Xin and Ehud Sharli. 2007. Playing games with robots – A method for evaluating Human-robot interaction. In
Human Robot Interaction.DOI: https://doi.org/10.5772/5208
[123]
Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. 2019. CSTR VCTK Corpus: English Multi-Speaker Corpus
for CSTR Voice Cloning Toolkit (version 0.92). [Sound]. University of Edinburgh. e Centre for Speech Technology
Research (CSTR). DOI: https://doi.org/10.7488/ds/2645
[124]
Jakub Zotowski, Hidenobu Sumioka, Shuichi Nishio, Dylan F. Glas, Christoph Bartneck, and Hiroshi Ishiguro. 2016.
Appearance of a robot aects the impact of its behaviour on perceived trustworthiness and empathy. Paladyn 7, 1
(2016), 55–66. DOI: https://doi.org/10.1515/pjbr-2016-0005
[125]
Joshua Zonca, Anna Folsø, and Alessandra Sciui. 2021. e role of reciprocity in human-robot social inuence.
iScience 24, 12 (2021), 103424. DOI: https://doi.org/10.1016/j.isci.2021.103424
Appendices
A Details on Selected Voices
For the selection of suitable voices for the robot evaluated in the pre-study, the voices of the CSTR
VCTK Corpus were analyzed. The CSTR VCTK Corpus consists of speech data uttered by 110
English speakers with various accents. Metadata for all speakers was collected, and an audio sample
of each speaker was subjectively evaluated for the following speaker attributes:
—fundamental frequency (𝑓0)
—noise (breath sounds etc.)
—pitch (high, bit high, neutral, bit low, low)
—perceived gender (male, female, neutral)
—speed (fast, neutral, slow)
—rhythm (monotonic, neutral, expressive)
The collected information for all speakers was utilized for voice selection. Specifically, the voices
were chosen to provide a wide variety of speech patterns while offering a distinctive pitch and
ensuring the intelligibility of the generated voice. Table A1 shows the collected and evaluated
attributes of the voices in the pre-study. For the main study, voice p286 was utilized for the natural
voice, which can be described as a distinctively male voice with a calm speech pattern. The voice
p336 was utilized for the mechanical voice in the main study; after applying the phaser effect, the
resulting voice is reminiscent of a male child's voice.
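The paper does not specify which audio tool applied the phaser effect. As a rough illustration, the following sketch uses the pedalboard library; the file names and effect parameters are assumptions, not the settings used in the study.

```python
# Rough sketch, not the study's actual pipeline: the audio tool used for the
# phaser effect is not named, so pedalboard and all parameters are assumptions.
import soundfile as sf
from pedalboard import Pedalboard, Phaser

audio, sample_rate = sf.read("tts_voice_line.wav")            # hypothetical input file
board = Pedalboard([Phaser(rate_hz=1.0, depth=0.8, mix=0.7)]) # single-effect chain
mechanical = board(audio, sample_rate)                        # apply the effect
sf.write("mechanical_voice_line.wav", mechanical, int(sample_rate))
```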
Table A1. Details and Estimates of the Selected Voices
ID Age Gender 𝑓0 Accent Region Noise Pitch Perceived gender Speed Rhythm
p336 18 F 205 English Surrey No Neutral F Neutral Monotonic
p243 22 M 270 American Iowa No High F Neutral Neutral
p286 23 M 59 American Ohio No Bit low M Neutral Neutral
p285 21 M 96 American New York No Very low M Slow Neutral
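The 𝑓0 values in Table A1 can, for instance, be estimated from a short audio sample of each speaker. The paper does not state which tool was used; the sketch below uses librosa's pYIN pitch tracker on a hypothetical sample file.

```python
# Illustrative only: estimate the median fundamental frequency of one speaker
# sample with librosa's pYIN tracker. The file name is hypothetical.
import librosa
import numpy as np

y, sr = librosa.load("p286_sample.wav", sr=None)          # hypothetical file name
f0, voiced_flag, _ = librosa.pyin(y, fmin=50.0, fmax=500.0, sr=sr)
median_f0 = np.nanmedian(f0[voiced_flag])                 # summarize voiced frames only
print(f"Estimated median f0: {median_f0:.0f} Hz")
```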
B Pre-Study Pitch Preference
To analyze whether the pre-study participants' gender affects the preferred pitch, a repeated measures
proportional odds logistic regression was utilized. Each participant ranked the voices according to
their perceived suitability for the main study. For model estimation, the voices are grouped according
to their pitch (low, neutral, high), and the rank is the dependent variable. The interaction between
pitch and gender is the independent variable. The model estimates are shown in Table B1. The
results do not indicate a relationship between a participant's gender and a preference for the robot's
voice pitch.
Table B1. Repeated Measures Proportional Odds Logistic Regression
Model for Preference of Voice Pitch Dependent on a Participant's Gender
Estimate Std. Error z value Pr(>|z|)
Male 0.123 0.282 0.438 0.662
Neutral-pitch 0.600 0.383 1.566 0.117
High-pitch −0.408 0.358 −1.140 0.254
Male : Neutral-pitch −0.531 0.487 −1.091 0.275
Male : High-pitch 0.047 0.462 0.102 0.919
cuts1|2 −1.685 0.219 −7.699 <0.001
cuts2|3 −0.750 0.213 −3.521 <0.001
cuts3|4 −0.043 0.209 −0.208 0.836
cuts4|5 0.660 0.207 3.195 0.001
cuts5|6 1.587 0.202 7.858 <0.001
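For reference, a simplified, pooled version of this model can be fit in Python with statsmodels' ordinal regression. The sketch below ignores the repeated-measures structure and uses hypothetical column names (rank, pitch, gender); it only illustrates the fixed-effects specification of Table B1.

```python
# Simplified, pooled sketch of the Table B1 model; the original analysis used a
# repeated-measures proportional odds regression, and the column names here
# are assumptions.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("pre_study_rankings.csv")              # hypothetical data file
df["rank"] = pd.Categorical(df["rank"], ordered=True)   # rank 1 (best) .. 6 (worst)

# Dummy-code pitch and gender (dropping one reference level each) plus their interaction.
exog = pd.get_dummies(df[["pitch", "gender"]], drop_first=True).astype(float)
for pitch_col in [c for c in exog.columns if c.startswith("pitch_")]:
    exog[f"gender_male:{pitch_col}"] = exog["gender_male"] * exog[pitch_col]

model = OrderedModel(df["rank"], exog, distr="logit")
print(model.fit(method="bfgs").summary())
```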
C Implementation Details
The implementation is separated into frontend, backend, and ROS services. An illustration is
provided in Figure C1. The experiment's frontend is implemented as a React.js application leveraging
the Three.js WebGL 3D engine. The frontend communicates with the backend via an HTTP API:
every action by the participant corresponds to an HTTP call to the Django backend. Additionally,
the frontend is connected to the Django backend's event stream, which allows the frontend to react
to changes instantly. The event stream is implemented on top of the JavaScript EventSource
specification, which allows transmitting asynchronous events to a browser via a persistent HTTP
connection. The experiment is implemented in the backend as a state machine that calls the
individual ROS services and updates the frontend and the opponent. Thus, the opponent knows the
locations of the participant's ships. When the player accepts correct advice in the first experiment
phase, a ship of the opponent is placed at the accepted location for the participant to hit. Furthermore,
the backend utilizes a PostgreSQL database to store the participant's interactions in the experiment.
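As one possible backend counterpart to this event stream, a Django view can return a long-lived response that emits server-sent events in the framing the browser's EventSource expects. The view name, queue, and payload below are illustrative assumptions, not the study's actual code.

```python
# Sketch of a server-sent-events endpoint in Django; the queue and payload
# structure are hypothetical stand-ins for the experiment state machine.
import json
import queue

from django.http import StreamingHttpResponse

event_queue: "queue.Queue[dict]" = queue.Queue()   # filled by the experiment state machine


def experiment_events(request):
    """Stream state-machine updates to the browser as server-sent events."""
    def stream():
        while True:
            update = event_queue.get()               # block until a new event arrives
            yield f"data: {json.dumps(update)}\n\n"  # SSE framing expected by EventSource
    return StreamingHttpResponse(stream(), content_type="text/event-stream")
```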
The robot is controlled via ROS services, where each of the robot's actions corresponds to a ROS
service. Specifically, ROS is used to control the robot's voice lines, gestures, and facial expressions.
Each of these ROS-controlled actions draws from a variety of voice lines that are accompanied by
gestures and facial expressions. Depending on whether the advice was accepted or rejected, the
robot displays different gestures and facial expressions. Details on the utilized gestures are provided
in Appendix D, and examples of the voice lines in Appendix E.
Fig. C1. Module diagram of the experiment implementation.
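For illustration, a single robot action exposed as a ROS service could be invoked from the backend roughly as follows; the service name and the standard Trigger type are assumptions, since the robot's actual interfaces are not detailed here.

```python
# Sketch of how the backend might trigger a robot action exposed as a ROS
# service; the service name and the std_srvs Trigger type are assumptions.
import rospy
from std_srvs.srv import Trigger

rospy.init_node("experiment_backend", anonymous=True)
rospy.wait_for_service("/nico/play_voice_line")                 # hypothetical service name
play_voice_line = rospy.ServiceProxy("/nico/play_voice_line", Trigger)

response = play_voice_line()   # robot speaks the next pseudo-randomly selected line
if not response.success:
    rospy.logwarn("Voice line playback failed: %s", response.message)
```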
D Implemented Robot Gestures
The voice lines used by the robot are accompanied by gestures. These gestures increase the perceived
liveliness of the robot. An overview of the implemented gestures is provided in Table D1.
Table D1. List of Robot’s Gestures
Gesture Comment
Point at the touch table Positioning the ships, choosing a target
Shaking its head The opponent misses a shot
Waving Farewell of the participant
Thumbs up gesture The player hits an opponent's ship
Move both arms in front of the body The robot provides a hint
Move hands in front of face The opponent hits a participant's ship
Hands to head The participant misses their shot
Arms up The participant sinks an opponent's ship
Head down The opponent wins
Greeting gesture First interaction with the participant
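In the backend, Table D1 can be thought of as a lookup from game events to gesture identifiers. The sketch below illustrates this; the event and gesture names are hypothetical, and only the pairing follows the table.

```python
# Illustrative lookup from game events to the gestures of Table D1; identifiers
# are hypothetical, only the event-gesture pairing follows the table.
GESTURES = {
    "greeting": "greeting_gesture",
    "choose_target_or_place_ships": "point_at_touch_table",
    "robot_gives_hint": "move_arms_in_front_of_body",
    "player_hits_ship": "thumbs_up",
    "player_misses_shot": "hands_to_head",
    "player_sinks_ship": "arms_up",
    "opponent_misses_shot": "shake_head",
    "opponent_hits_ship": "hands_in_front_of_face",
    "opponent_wins": "head_down",
    "farewell": "waving",
}


def gesture_for(event: str) -> str:
    """Return the gesture identifier the robot should perform for a game event."""
    return GESTURES[event]
```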
E Examples of Voice Lines
e robot in the experiment utilizes a variety of voice lines, which are pseudo-randomly selected,
to increase the robot’s liveliness and participant engagement. During the experiment, the robot
will guide the participant and inform the participant about the next steps. For example, the robot
will say: “e Opponent is now ring their cannons.”, or provide advice: “We just detected a signal
at position A1. I think there might be a ship. Should we change our target to that position?”. e
coordinates, in this example A1, are dynamically generated during the experiment. ese voice
lines are accompanied by gestures and facial expressions. Table E1 shows examples of the utilized
voice lines.
Table E1. Examples of the Robot’s Voice Lines
Action Facial expression Voice lines
Accept advice Happiness
“Thanks for trusting my advice. Changing target location to:”
“Let’s hope this intel is correct. Changing target location to:”
“Fingers crossed that my spies got the right information. Changing target location to:”
“I hope I decoded this message correctly. Changing target location to:”
“Let’s see if we can count on this spy. Changing target location to:”
Opponent sinks a ship Angry
“Oh no, that was my favorite ship!”
“Ah man, that was a tough one. We lost a good ship there.”
“Well, that’s not how I wanted that round to go.”
“It’s disappointing, but we still have a chance to turn things around.”
Reject advice Sadness
“Alright, you’re the captain.”
“I respect your decision, captain. I’m here to support you no matter what.”
“I trust your judgment, captain.”
“Alright, I’m here to serve you, captain.”
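The pseudo-random selection of equivalent voice lines with a dynamically generated coordinate can be illustrated with a short sketch; the line texts follow Table E1, while the placeholder syntax and helper function are assumptions.

```python
# Sketch of pseudo-random voice-line selection with a dynamically generated
# coordinate; texts follow Table E1, the helper and placeholder are assumptions.
import random

ACCEPT_ADVICE_LINES = [
    "Thanks for trusting my advice. Changing target location to: {coord}",
    "Let's hope this intel is correct. Changing target location to: {coord}",
    "Fingers crossed that my spies got the right information. Changing target location to: {coord}",
    "I hope I decoded this message correctly. Changing target location to: {coord}",
    "Let's see if we can count on this spy. Changing target location to: {coord}",
]


def accept_advice_line(coordinate: str) -> str:
    """Pick one of the equivalent lines and insert the current target coordinate."""
    return random.choice(ACCEPT_ADVICE_LINES).format(coord=coordinate)


print(accept_advice_line("A1"))
```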
F Mixed-Effects Model for the Accepted Advice
To account for the repeated measures of the participants, a mixed-effects model for advice acceptance
was estimated. The utilized model has an uncorrelated individual intercept and slope for each
participant. This represents that each participant might start with a different initial trust level, which
decreases individually. These individual estimates are uncorrelated so that a participant with a high
initial trust level may lose trust either slowly or quickly, independent of the initial trust in the robot's
advice. The results of the mixed-effects model are shown in Table F1.
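The paper does not state which software fit this model. As a rough Python analogue, statsmodels' variational Bayes mixed GLM can express a logistic model for advice acceptance with separate, and therefore uncorrelated, per-participant intercept and slope components; the column names below are hypothetical.

```python
# Approximate Python analogue of the advice-acceptance model in Table F1;
# column names (accepted, natural_voice, advice_index, participant) are
# hypothetical, and the estimation here is variational Bayes rather than the
# original procedure.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("advice_acceptance.csv")   # hypothetical long-format data, one row per advice

vc_formulas = {
    "intercept": "0 + C(participant)",             # random intercept per participant
    "slope": "0 + C(participant):advice_index",    # random slope over the advice count
}
model = BinomialBayesMixedGLM.from_formula(
    "accepted ~ natural_voice + advice_index", vc_formulas, df
)
result = model.fit_vb()                            # variational Bayes estimation
print(result.summary())
```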
Table F1. Mixed-Effects Model for Advice Acceptance
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.863 0.263 7.070 <0.001∗∗∗
Natural voice 0.156 0.239 0.650 0.516
Advice −0.222 0.041 −5.442 <0.001∗∗∗
Random effects:
Groups Name Variance Std. Dev.
Participant (Intercept) 0.135 0.367
Participant (Advice) 0.009 0.093
Correlation of Fixed Effects:
Intercept Group
Group −0.400
Advice −0.771 −0.047
∗∗∗p<.001.
The estimated model shows a negative effect of the number of advice instances. This indicates that
the participants lost trust in the robot's advice throughout the experiment and were less likely to
accept later advice. The random effects suggest that this loss of trust was consistent among the
participants, whereas the initial trust in the robot's advice varies among the participants.
G Correlation between Response Time and the Questionnaire Measures
To infer potential relationships between the response time and the assessed questionnaires, the
estimated correlations are provided in Table G1. From the data, no significant effect of the robot's
perception on the response time can be estimated.
Table G1. Spearman’s Rank Correlation between the Average Response Time and the Assessed Measures
Godspeed MOS-X MDMT
Item Animacy Likeability Anthropomorphism Intelligence Safety Intelligibility Naturalness Social impression Competent Reliable
𝜌 −.006 −.046 .053 −.170 −.033 −.056 −.021 .100 −.124 .161
p-value .963 .707 .665 .166 .786 .649 .864 .418 .314 .190
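Each cell of Table G1 is a Spearman rank correlation between the average response time and one questionnaire scale; a minimal sketch with hypothetical column names is shown below.

```python
# Minimal sketch of one cell of Table G1; column names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("main_study_measures.csv")   # one row per participant
rho, p_value = spearmanr(df["avg_response_time"], df["godspeed_animacy"])
print(f"Spearman's rho = {rho:.3f}, p = {p_value:.3f}")
```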
Received 27 February 2024; revised 19 August 2024; accepted 13 November 2024