Influence of Robots' Voice Naturalness on Trust and Compliance
DENNIS BECKER, LUKAS BRAACH, LENNART CLASMEIER, TERESA KAUFMANN, OSKAR ONG, KYRA AHRENS, CONNOR GÄDE, and ERIK STRAHL, Universität Hamburg, Hamburg, Germany
DI FU, University of Surrey, Guildford, UK
STEFAN WERMTER, Universität Hamburg, Hamburg, Germany
With the increasing performance of text-to-speech systems, whose generated voices are becoming indistinguishable
from natural human speech, the use of these systems for robots raises ethical and safety concerns. A robot with a
natural voice could increase trust, which might result in over-reliance despite evidence for robot unreliability.
To estimate the influence of a robot's voice on trust and compliance, we design a study that consists of two
experiments. In a pre-study (N1 = 60), the most suitable natural and mechanical voice for the main study are
estimated and selected. Afterward, in the main study (N2 = 68), the influence of a robot's voice on trust and
compliance is evaluated in a cooperative game of Battleship with a robot as an assistant. During the experiment,
the acceptance of the robot's advice and the response time are measured, which indicate trust and compliance,
respectively. The results show that participants expect robots to sound human-like and that a robot with a natural
voice is perceived as safer. Additionally, a natural voice can affect compliance. Despite repeated incorrect advice,
the participants are more likely to rely on the robot with the natural voice. The results do not show a direct effect
on trust. Natural voices provide increased intelligibility, and while they can increase compliance with the robot,
the results indicate that natural voices might not lead to over-reliance. The results highlight the importance of
incorporating voices into the design of social robots to improve communication, avoid adverse effects, and increase
acceptance and adoption in society.
CCS Concepts: • Human-centered computing → User studies;
Additional Key Words and Phrases: Human-Robot Interaction, Trust and Cooperation
The authors gratefully acknowledge support from the German Research Foundation DFG (CML, LeCAREbot), the European
Commission (TRAIL), and the Federal Ministry for Economic Affairs and Climate Action (BMWK) under the Federal
Aviation Research Programme (LuFO), Projekt VeriKAS.
Authors’ Contact Information: Dennis Becker (corresponding author), Universität Hamburg, Hamburg, Germany; e-mail:
dennis.becker-1@uni-hamburg.de; Lukas Braach, Universität Hamburg, Hamburg, Germany; e-mail: lukas.braach@
studium.uni-hamburg.de; Lennart Clasmeier, Universität Hamburg, Hamburg, Germany; e-mail: lennart.clasmeier@
studium.uni-hamburg.de; Teresa Kaufmann, Universität Hamburg, Hamburg, Germany; e-mail: teresa.kaufmann@studium.
uni-hamburg.de; Oskar Ong, Universität Hamburg, Hamburg, Germany; e-mail: oskar.ong@studium.uni-hamburg.de;
Kyra Ahrens, Universität Hamburg, Hamburg, Germany; e-mail: kyra.ahrens@uni-hamburg.de; Connor Gäde, Universität
Hamburg, Hamburg, Germany; e-mail: connor.gaede@uni-hamburg.de; Erik Strahl, Universität Hamburg, Hamburg,
Germany; e-mail: erik.strahl@uni-hamburg.de; Di Fu, University of Surrey, Guildford, UK; e-mail: d.fu@surrey.ac.uk; Stefan
Wermter, Universität Hamburg, Hamburg, Germany; e-mail: stefan.wermter@uni-hamburg.de.
This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2025 Copyright held by the owner/author(s).
ACM 2573-9522/2025/1-ART29
https://doi.org/10.1145/3706066
ACM Transactions on Human-Robot Interaction, Vol. 14, No. 2, Article 29. Publication date: January 2025.
ACM Reference format:
Dennis Becker, Lukas Braach, Lennart Clasmeier, Teresa Kaufmann, Oskar Ong, Kyra Ahrens, Connor Gäde,
Erik Strahl, Di Fu, and Stefan Wermter. 2025. Influence of Robots' Voice Naturalness on Trust and Compliance.
ACM Trans. Hum.-Robot Interact. 14, 2, Article 29 (January 2025), 25 pages.
https://doi.org/10.1145/3706066
1 Introduction
Despite the majority of research in human-robot interaction emphasizing robot appearance [124],
behavior [9,21], and non-verbal communication [115], natural speech interaction is equally im-
portant [75,77]. Verbal interaction provides accurate and efficient communication and enables
human-robot cooperation with non-expert users in a social environment [70]. A voice transmits a
variety of non-linguistic information [100] and is a strong anthropomorphic cue [81,119]. Even a
simple conversation with a robot renders it more social and human-like [86], and increases the
perceived psychological closeness to the robot [30].
Accepting a robot as a partner in a cooperative task requires trust in the robot’s performance
and reliability [45]. Robots that are perceived as anthropomorphic are preferred as partners for a
cooperative task [35]. Specifically, humanoid robots with their anthropomorphic appearance can
simultaneously facilitate interaction and increase expectations about their capabilities and social
skills [47]. However, consistent social interaction is required, and physical or behavioral inconsis-
tencies can render the robot unacceptable [117]. A robot's voice affects these social interactions,
which creates challenges in assigning a voice that is suitable for the robot and the task [13].
These voices are synthesized utilizing a Text-to-Speech (TTS) engine; however, nuances of
speech are often lost during the synthesis [107]. This results in a more mechanical voice, and the
generated voice quality can influence the perception of the robot [19,39]. With advancements in
deep learning, recent TTS systems can generate speech with characteristics rivaling natural speech
[109]. However, the use of a voice indistinguishable from natural speech raises privacy [71] and
ethical concerns [68] for robotics. Research in human-robot interaction suggests that a robot with
a natural-sounding voice positively influences the perception of the robot [67] and can increase
trust and perceived competence [85]. Although trust is an essential element for human-robot
cooperation, over-reliance and over-trust can result in accidents [73]. Despite the potential for
unreliability or robot failure, people may place too much trust in the robot even when there is clear
evidence of robot failure [94]. Over-trust and over-reliance in the presence of robot failure can have severe
consequences [58].
With the increasing use of and reliance on voice assistance systems and robotic applications, the
implications of a natural-sounding voice for trust in human-robot interaction have to be researched.
Specically, recent publications emphasize that robot voice design and associated anthropomor-
phism are a pressing research issue [6,99]. erefore, we propose the following research question:
How does a robot with a natural voice aect trust and compliance when it performs unreliably in
comparison to a robot with a mechanical-sounding voice? To estimate the eect of a robot’s voice
naturalness on trust and reliance, we design a study that consists of two parts. Since voices create
a mental image of the speaker [60] and a mismatch between the voice and the robot can create
mistrust [55], a pre-study is conducted to estimate the most suitable neutral and mechanical voice
for the experiment in the main study. In the main study, the participants play an adaptation of
the classic board game Baleship with a robot as their assistant. An illustration is provided in
Figure 1. Board games provide a social environment for interaction with the robot while restricting
the possible actions in the environment [92]. Additionally, board games provide an engaging sce-
nario, and information available to the participants can be restricted, which creates reliance on the
robot [122].
Fig. 1. Battleship game to measure the difference in trust and compliance depending on the robot's voice.
In the experiment, a participant and robot cooperate and the robot provides advice to the participant
for the next move that differs from the participant's decision in the game. The participant can either
follow or reject the advice, which indicates trust in the robot. Further, the response time is measured,
which indicates compliance with the robot [46].
2 Related Work
A robot's voice strongly affects a human's perception, associated attributes, and perceived capa-
bilities [17]. Previous studies have reported a strong influence of a robot's accent [110], voice
gender [20], and voice naturalness [79] on the perception of a robot. Specifically, a robot with a
local accent is attributed more credibility [4] and perceived more positively [108]. Further, a
robot's voice is associated with personality traits [78] and stereotypes, where a deeper male voice
suggests dominance and a female voice suggests a caring personality [28]. Additionally, people
assume that the robot possesses gender-specific knowledge [91]. The pitch of the voice influences
the perceived interaction quality [84], and voice prosody can alter the perception of the robot [34]
and the willingness to cooperate [74]. A major aspect of a robot’s voice that changes its impression
is the voice’s naturalness [111]. Increasing the voice’s naturalness simultaneously increases the
perceived naturalness of the robot [116], and natural voices are overall preferred by participants
for human-robot interaction [61]. Research suggests that a natural-sounding voice increases the
anthropomorphism of the robot [99,104] and perceived approachability [118]. This increase in
anthropomorphism might influence trust and could mislead people into placing trust in the robot beyond its
actual capacities [95] and foster over-reliance [3].
Trust is an overarching concept that is essential for successful human-robot interaction [54,66].
However, the concept of trust is not uniquely defined in the context of human-robot interaction [14,
125]. A commonly agreed-upon definition of trust in robot automatization defines trust as the need
for reliance in situations of uncertainty and vulnerability [64]. Further, it has been characterized by
the expectation that the robots' actions are well intended [44] and result in a beneficial outcome [96].
Initial trust in a robot exists before the first interaction [50] and is influenced by expectations
about the robot [23,73], media representations of robots [98], and the robot's physical attributes
[106]. This initial trust changes dynamically during the interaction [2,52], where a robot
demonstrating competence [12] and observable task performance [18] increases trust, whereas a
poor robot performance can reduce trust [93]. Specifically, robot failure and unreliability reduce
trust [26], especially when the unreliability is observed during the early stages of the interaction in
contrast to malfunctions that are observed later during the interaction [25].
Compliance with a robot's advice and trust are intertwined [80]. The degree to which people are
willing to comply is an indication of trust [16], and compliance with a robot’s advice is a direct
observation of trust [10]. However, trust in a robot’s recommendations depends on the perceived
task suitability [42,43]. Robots are preferred for tasks that require high analytical capabilities
and deductive reasoning utilizing statistics [72] and are less preferred for social tasks [48]. Robots
that exhibit human-like characteristics are perceived as more anthropomorphic and receive more
trust and willingness to follow their advice [41,105]. Similarly, a higher level of trust in the
robots can increase the willingness to seek the robot’s advice [40]. Robots with anthropomorphic
characteristics appear to form a stronger bond [88], are more resilient against breaches of trust
[22], and receive increased trust and compliance [82].
However, increased anthropomorphism creates a tendency to over-trust technology [5] by at-
tributing a larger competence [65] and resilience against trust loss despite decreasing reliability [22].
In particular, repeated observation of a robot's reliability can lead to over-reliance by considering these
observations as proof of its reliability [114]. This mismatch between trust and the robots' actual ca-
pabilities and reliability can result in over-trust [58]. In sensitive domains or applications, where lives
or personal well-being are involved, over-trust in technology can and has led to accidents [87,89].
An indication of reliance and compliance with a robot is the response time to the robot’s ad-
vice [66]. In contrast to reaction time, which describes the duration between the onset of a stimulus
and the person’s instinctive reaction [27,121], the response time measures the time between the on-
set of the stimulus and selecting and providing the response [102,120]. Therefore, the response time
includes the decision-making process to provide the correct or appropriate response to the stimulus.
A shorter response time indicates an automatic or reflex response, and longer response times are asso-
ciated with deliberate mental processing that involves examining all the available information [102].
Reliance can be considered a passive form of compliance, where reliance assumes the correct opera-
tion of the robot and that following the advice would advance their shared goal, whereas compliance
requires verication of the advice [57]. Verication is the process of reevaluating the past robot’s
task accuracy or recommendations to assess its performance, which is an indication of mistrust [49].
However, verifying the robot’s advice is associated with additional eort and time [31].
3 Methodology
3.1 Research Design
To answer the previously defined research question, an experiment is designed that consists of an
online pre-study and an in-person main study. The pre-study determines expectations about robot
voices and the most suitable natural and mechanical voice for the robot in the main experiment. The
main experiment researches the effects of a natural voice in contrast to a mechanical voice on trust
and compliance in a human-robot cooperative game of Battleship. The experiment is conducted in
a between-subject design with two groups, in which the robot has either a natural or mechanical
voice, while the gestures, utterances, and facial expressions are identical in both groups.
Based on the previously conducted research in the field and the research question, we derive the
subsequent hypotheses:
H1: The participants will accept more advice from a robot with a natural voice.
H2: The participants exhibit a faster response when the robot has a natural voice.
H3: A longer response time indicates reduced compliance and can lead to advice rejection.
3.2 Pre-Study
Since there is wide variability in terms of voices, such as gender, rhythm, and perceived suitability
of the scenario in the main experiment, a pre-study for the voice selection was conducted. For the
pre-study, six different voice samples are examined, and their perceived suitability for the robot in
the main study is estimated. Previous studies suggest that the appearance and stature of a robot
can affect the perceived suitability of the robot's voice [33]. Furthermore, a difference in the voice
pitch is sufficient to strongly change the participants' impressions of the voice [17]. Therefore,
the different voices can be separated into three natural-sounding and three mechanical-sounding
voices, where each group consists of one sample of a neutral-pitch, high-pitch, and low-pitch voice.
The voice samples were generated with a TTS model trained on the VCTK dataset [123] using a
VITS [56] model, which is provided by the coqui-ai TTS library [29]. The utilized model clones a
speaker's voice and produces state-of-the-art natural-sounding speech. Since the utilized VCTK
Corpus contains sample data from 110 native English speakers, the data set was analyzed in
terms of gender, age, fundamental frequency, accent, noise, pitch, speed, and rhythm. To capture
a wide variety of voices during the pre-study that might be suitable for the robot's appearance,
younger speakers between 18 and 23 years of age and different genders were selected. Further,
the robotic speech counterpart for each speaker was generated to analyze the perception, patterns,
and understandability of the speech. The mechanical voice is created by applying a phaser effect to the
generated voice sample. The natural voice is overlaid on itself with a 10-millisecond delay. This
produces a mechanical-sounding voice while retaining the voice characteristics. To represent a
high-pitched voice, a female speaker was selected, while for the remaining lower voices, male
speakers were selected. Finally, to generate the voice samples, voices with a clearly distinguishable
pitch (fundamental frequency f0) were selected. From the TTS library, the speaker p336 (f0 = 205 Hz)
is used for the neutral voice, speaker p243 (f0 = 270 Hz) for the high-pitch voice, speaker p286
(f0 = 59 Hz) for the low natural voice, and speaker p285 (f0 = 96 Hz) for the low-pitch robotic
voice. The range of evaluated voices in the pre-study could be briefly described as: female, male,
cartoonish, serious, childlike, and robotic. Further information on the selected voices is provided in
Appendix A.
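As a rough illustration of this pipeline, the synthesis and the mechanical-voice effect could be sketched as follows (an assumption-based example using the coqui-ai TTS library and its VCTK/VITS model; the file names, example sentence, and effect chain are illustrative and not the study's actual scripts):

import numpy as np
import soundfile as sf
from TTS.api import TTS  # coqui-ai TTS library [29]

# Synthesize a sample with the VCTK-trained VITS model for one speaker.
tts = TTS(model_name="tts_models/en/vctk/vits")
tts.tts_to_file(text="I suggest a different field.", speaker="p336",
                file_path="voice_natural.wav")

# Overlay the voice on itself with a 10-millisecond delay.
audio, sr = sf.read("voice_natural.wav")
shift = int(0.010 * sr)                 # 10 ms expressed in samples
original = np.pad(audio, (0, shift))
delayed = np.pad(audio, (shift, 0))
mechanical = original + delayed
mechanical /= np.abs(mechanical).max()  # normalize to avoid clipping
sf.write("voice_mechanical_base.wav", mechanical, sr)
# A phaser effect (e.g., from an audio effects tool) would then be applied on top.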
The pre-study was conducted online using LimeSurvey [69]. An illustration of the pre-study is
shown in Figure 2. In the survey, the participants first provide informed consent, and then their
expectations about robot voices are assessed. Afterward, a description of the main experiment with
a picture of the robot is provided. The six different voice samples are presented in random order,
and the participants rate the characteristics of each voice individually on a ranking scale ranging
from 1 (not at all) to 7 (yes, absolutely). The utilized questionnaire measures the perception and
distinguishability of the voices [76]. Finally, a video that illustrates the purpose of the robot and
the scenario of the main study is shown. Afterward, the participants rank all six voice samples
according to their perceived suitability for the robot in the video. Subsequently, the most suitable
natural and mechanical voice will be compared in the main study.
3.3 Experiment Design
In the main study, the game Battleship is played to evaluate the influence of a robot's voice on trust
and compliance. Previous human-robot interaction studies utilized the Battleship game to show
participants the game-play of a human and a robot [112] or as a scenario for a robot as a teacher
for the game [11,51,83]. Battleship is a turn-based guessing game in which both players attempt
Fig. 2. Online study to assess expectations about robot voices and suitability for the robot in the main study.
to nd the other player’s ships on a two-dimensional 10 x 10 playing eld. Before the game starts,
both players position their ships on their playing eld. Aerward, both players take turns guessing
the position of the other player’s ships. Choosing a position on the playing eld that is occupied by
a ship is considered a hit. Guessing a position that is not occupied by a ship is considered a miss.
If all positions of a ship are hit, the ship is considered sunken and removed from the game. e
player who rst sinks all the opponent’s ships is considered the winner.
The game rules are adapted to ensure a consistent experiment among the participants. Specifically,
all ships have a length of two fields, and each player has a total of seven ships. During the game,
when a player guesses the correct location of a ship, an additional turn to sink the already discovered
ship is granted. Accordingly, after hitting a ship, only the four adjacent fields around the ship can
be selected until the ship is sunk.
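A minimal sketch of this adapted hit rule (an illustrative reconstruction, not the study's implementation):

def selectable_fields(hit, board_size=10, already_shot=()):
    # After a hit, only the four orthogonally adjacent fields of the hit cell
    # remain selectable until the two-field ship is sunk.
    row, col = hit
    neighbors = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in neighbors
            if 0 <= r < board_size and 0 <= c < board_size
            and (r, c) not in already_shot]

# Example: a hit at row 1, column 3 leaves up to four adjacent target fields.
print(selectable_fields((1, 3)))  # [(0, 3), (2, 3), (1, 2), (1, 4)]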
During the experiment, the robot does not play the role of the opponent but instead assists the
participant by providing advice. This advice consists of proposing a different field than the one
selected by the participant. The participant can either accept or reject the advice. Initially, the
participants are allowed to freely place their ships on the playing field; however, the ships will
be sunk by the opponent in a predefined order, which puts the participant at risk of losing the game.
This creates an incentive for the participant to follow the advice given by the robot. Further, the
robot claims to possess analytical and statistical capabilities, and knowledge of the opponent’s ship
position, which is not available to the participant.
The robot's advice strategy is separated into two phases. In the first phase, the robot will provide
advice at every turn if the participant is not currently in the process of sinking a ship. In this
phase, the participant can only hit an opponent's ship when following the robot's advice. Thus, in
the first phase, all advice is correct. The first phase ends after the participant follows two pieces
of the robot's advice. In the second phase, the robot provides advice every second turn, when the
participant is not in the process of sinking a ship. In contrast to the first phase, all advice is
incorrect, and accepted advice will result in a miss. The experiment ends after the participant
has received a total of nine pieces of advice.
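The two-phase strategy can be summarized in a small policy sketch (an illustrative reconstruction of the description above, not the actual experiment code; turn handling and field selection are simplified):

class AdvicePolicy:
    TOTAL_ADVICE = 9  # the experiment ends after nine pieces of advice

    def __init__(self):
        self.accepted = 0      # accepted pieces of advice so far
        self.given = 0         # pieces of advice given so far
        self.turns_waited = 0  # eligible turns since the last advice

    @property
    def phase(self):
        # Phase 1 until two pieces of advice have been accepted, then phase 2.
        return 1 if self.accepted < 2 else 2

    def wants_to_advise(self, sinking_ship):
        if sinking_ship or self.given >= self.TOTAL_ADVICE:
            return False
        self.turns_waited += 1
        # Phase 1: every eligible turn; phase 2: every second eligible turn.
        return self.phase == 1 or self.turns_waited >= 2

    def give_advice(self, ship_fields, empty_fields):
        self.given += 1
        self.turns_waited = 0
        # Phase 1 advice always points at a ship; phase 2 advice always misses.
        return ship_fields.pop() if self.phase == 1 else empty_fields.pop()

    def record_response(self, accepted):
        if accepted:
            self.accepted += 1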
Fig. 3. Schematic overview and image of the main study setup.
While the rst phase will establish trust in the robot’s capabilities, in the second phase, the
participants will experience the unreliability of the robot’s advice. Despite the robot applying a trust
repair strategy [7] by apologizing and providing a ctional reason for the wrong advice, doubt in
the robot’s reliability will be created. Consequently, the participants should reevaluate the robot’s
reliability over the past turns, which leads to less trust in the robot’s capabilities [24]. is process
is inuenced by the robot’s voice and enables to measure the eect of a natural-sounding voice in
contrast to a mechanical-sounding voice.
3.4 Experiment Setup
The Neuro-Inspired COmpanion (NICO) [53], a child-sized humanoid robot developed for
human-robot interaction studies, is used for the experiment. The participant is seated in front of a
table with a multi-touch interface. e NICO robot is placed behind the table, facing the participant.
Additionally, an experimenter who supervises the experiment is seated behind a partition wall.
Figure 3(a) provides a detailed illustration of the experiment setup, and Figure 3(b) shows the
interaction between a participant and the robot in the experiment.
At the onset of the experiment, the robot greets the participant and introduces the game and
functionality of the multi-touch table, which is used to display and interact with the Battleship
game. An initial start screen allows the participant to freely position the ships on the playing field.
During the participant's turn, only the opponent's playing field is displayed in the center of the
multi-touch table and a field can be selected. During the opponent's turn, both playing fields are
displayed next to each other, and the position that was selected by the opponent is highlighted.
Instructions and advice are provided by the NICO robot and displayed on the multi-touch table.
During the experiment, the NICO robot guides the participant by providing status updates and
describing the next step. Further, the robot will offer advice and apologize if the accepted advice is
incorrect. During these interactions, the robot uses a gesture and a facial expression and, depending
on the situation, randomly selects a suitable voice line. The robot conveys the next
action and emotions by using one of the 10 implemented gestures, such as pointing at the touch
table, shaking its head, or a thumbs-up gesture. e facial expressions used in the experiment are
neutral, angry, happy, and sad. Finally, there are a total of 76 sentences for the robot to use
in these interactions. Implementation details (Appendix C), examples of the utilized voice lines
(Appendix E), and a list of the implemented gestures (Appendix D) are provided in the Appendix.
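An interaction step could, for example, be organized as follows (a purely hypothetical sketch: the robot interface, the gesture and expression names, and the voice lines are placeholders and do not correspond to the actual NICO control code or the 76 sentences listed in Appendix E):

import random

VOICE_LINES = {  # placeholder lines; the study's voice lines are listed in Appendix E
    "advice":  ["I suggest a different field.", "My analysis points to another field."],
    "apology": ["I am sorry, my information seems to have been outdated."],
}
GESTURES = {"advice": "point_at_table", "apology": "shake_head"}
FACES = {"advice": "happy", "apology": "sad"}

def interaction_step(robot, situation):
    # Combine a gesture, a facial expression, and a randomly selected voice line.
    robot.play_gesture(GESTURES[situation])
    robot.set_facial_expression(FACES[situation])
    robot.say(random.choice(VOICE_LINES[situation]))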
A pilot study with eight participants was conducted before the main experiment. The participants
were recruited from the campus of the Informatics department. Quantitative results showed that
the participants perceived a difference in the naturalness of both voices. Further, interviewing the
participants confirmed that they understood the game rules and were able to follow the robot's
speech.
3.5 Questionnaires and Measurements
To evaluate the effect of the robot's voice, questionnaires are assessed and advice acceptance and
response time are recorded.
The Godspeed questionnaire [8] is used to capture the participants' overall perception of the
robot and its behavior and measures the perceived robot anthropomorphism, animacy, likeability,
intelligence, and safety. The Multi-Dimensional Measure of Trust (MDMT) [113] is used to
assess trust along the dimensions of performance trust (reliable, competent). The Mean Opinion
Score eXtended (MOS-X) scale [90] quantifies the participants' perception of the robot's voice
in terms of intelligibility, naturalness, social impression, and prosody. The prosody factor is omitted
since the prosody for both voices is identical. All the questionnaires were assessed on a 7-point
ranking scale.
As an objective measure of the participant’s trust in the robot, the decision to accept or reject
the robot's advice is recorded. The response time between the robot's advice and the participant
pressing the button on the multi-touch table measures the participant's compliance with the robot.
3.6 Procedure
Aer providing informed consent to participate in the experiment, the participants are randomly
assigned to one of the experiment conditions. e rules of the baleship game are explained by
the experimenter using an instruction sheet. Next, the participants are escorted to a neighboring
room, where they are seated in front of the touch table facing the NICO robot. e experiment lasts
until the NICO robot provides nine pieces of advice, which requires approximately 30 minutes.
Upon completion of the experiment, the participants are brought back to the initial room and are
presented with the questionnaires. Student participants were granted experiment participant hours.
3.7 Participants
Aer an evaluation by the Ethics Commission of the Department of Informatics at the University,
the participants for the pre-study were recruited online, and participants for the main study were
recruited through announcements at the university’s email lists and social channels. For estimation
of the pre-study sample size, a medium effect size of f = 0.30 was assumed [99], with α = .05
and a statistical power of .95. Simulation of these assumptions yielded an estimated sample size
of 56 participants. The experiment was completed by 62 participants. Two participants answered
the control question incorrectly and were removed from the analysis. As a control question, the
participants had to answer a question regarding the content of the provided audio sample. Of the
remaining 60 participants, 30 were male, 29 were female, and one participant preferred not to
disclose their gender. The participants' ages ranged from 18 to 62 years (M = 26.18, SD = 9.74). The
majority of the participants were students (52%) followed by those working part-time or full-time
(35%). Forty-ve participants had no prior experience with robots, 14 participants stated experience
with robots, and one participant worked with them regularly.
Previous studies [38,110] report a small effect size (φ = 0.16) of a robot's voice on trust. The
sample size for the main study was estimated with α = .05 and a statistical power of .95. The
estimated sample size was 30 participants for each experiment group. A total of 73 participants
completed the main study, and the participants from the pre-study were excluded from participating.
Due to technical issues, five participants had to be excluded, resulting in a total of 68 participants for
the data analysis. The participants were equally distributed between both experimental conditions.
Of these participants, 42 were male, 25 were female, and one participant did not wish to provide
information. The ages ranged from 18 to 64 years (M = 27.45, SD = 8.08). Most of the participants
were students (63%) followed by those working part-time or full-time (34%). Thirty-one participants
had no prior experience with robots, 29 participants had prior experience with robots, and 8
participants stated that they work with robots regularly.
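For illustration, a simulation-based sample size estimate of the kind mentioned above can be sketched as follows (an assumption-based example with a generic standardized effect size; it does not reproduce the study's design, its repeated-measures structure, or its reported sample sizes):

import numpy as np
from scipy.stats import mannwhitneyu

def simulated_power(n_per_group, effect_d=0.5, alpha=0.05, reps=2000, seed=0):
    # Estimate power for a two-group comparison by repeated simulation.
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_d, 1.0, n_per_group)  # shifted by the assumed effect
        if mannwhitneyu(control, treated).pvalue < alpha:
            significant += 1
    return significant / reps

# Increase n_per_group until the simulated power reaches the target (e.g., .95).
print(simulated_power(34))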
4 Results
The pre-study estimates expectations regarding robot voices and the most suitable natural and
mechanical voice for the robot in the main study. The results of the main study present the statistical
analysis of the difference between the natural and mechanical voice regarding the perception of the robot,
advice acceptance (H1), and response time (H2 and H3).
4.1 Pre-Study
Before the robot and the scenario of the main study were introduced, the participants' expectations
about a robot's voice were assessed. An illustration of the responses is provided in Figure 4.
Fig. 4. Expectations about a robot's voice.
The responses show that the participants did not expect a robot's voice to sound mechanical or
creepy, but instead expect a robot to sound human-like. Further, most participants (60%) strongly
disagree that a robot's voice should be male, and the responses indicate that a robot's voice is not expected
to sound genderless. The answers regarding the expected comfortableness of interaction reflect
uncertainty in robot voices, as illustrated by the spread of responses.
Aer assessing the expectations about a robot’s voice, a picture of the NICO robot was shown to
the participants. Further, the participants listened to an audio sample of each voice followed by a
questionnaire assessment of each voice’s perception. Finally, the choice of the most suitable natural
and mechanical voice for the main experiment is based on the participants’ suitability ranking
for the scenario of the main study. e suitability was assessed aer introducing the scenario and
showing the participants a video of the robot in the conditions of the main study. A Kruskal-Wallis
rank sum test of these suitability rankings indicates signicant dierences among the suitability of
the various voices (
𝜒2(5)=54.75
,
p<.001
). e assessed suitability ranking for each voice with
standard errors is illustrated in Figure 5.
It is apparent that in the group of natural voices, the low-pitch natural voice was rated as most
suitable for the study scenario, followed by the neutral-pitch voice in the mechanical voice group.
Fig. 5. Ranking of the voices regarding their suitability for the main study. Values indicate p-value, *p < .05, ***p < .001.
Table 1. Perceived Speech Difference between the Voices Utilized in the Main Study

Measure            Low natural    Neutral mechanical   p-value
                   Mean (SD)      Mean (SD)
Understandability  6.62 (0.87)    5.05 (1.62)          <.001***
Mechanic           2.63 (1.73)    6.33 (1.32)          <.001***
Expressiveness     4.88 (1.56)    3.12 (1.42)          <.001***
Appealing          4.95 (1.60)    2.68 (1.44)          <.001***
Intelligibility    5.07 (1.52)    3.80 (1.54)          <.001***
Credibility        5.02 (1.65)    3.82 (1.65)          <.001***
Suitability        4.02 (1.87)    3.55 (1.80)          .179
***p < .001.
Therefore, the specific effects of these two voices on trust and compliance are researched in the
main study. The natural voice could be considered distinctly male, with a calm speech pattern,
whereas the mechanical voice is reminiscent of a child's voice. Thus, the natural voice might be suitable
for the role of a captain, whereas the mechanical voice could be in line with the childlike appearance
of the robot. The mechanical low-pitch voice, which sounds robotic, and the mechanical high-pitch
voice, which is reminiscent of a cartoon voice, received the lowest suitability rankings. Further analysis
did not reveal a relationship between the participants' gender and pitch preference in the data
(Appendix B). A direct comparison of the low-pitch natural and the neutral-pitch mechanical voice
in the aspects of perception and distinguishability is shown in Table 1.
A Mann–Whitney U test shows that the selected voices significantly differ in all the assessed
aspects of perception and distinguishability, except for their suitability in the main study. A non-
significant difference in the suitability of the voices for the main study is preferable since it does not
provide the participants with an indication of the study's objective. Among the significant differences,
it is notable that the direct comparison reflects the participants' expectations about robot voices,
since the natural voice is perceived as more appealing (W = 563, p < .001) and attributed greater
credibility (W = 1077.5, p < .001). Further, the natural voice is better understood (W = 709.5, p < .001)
and more intelligible (W = 979.5, p < .001) than the mechanical voice. Finally, the mechanical voice
is perceived as more mechanical (W = 3356.5, p < .001), as intended by applying the phaser effect
to the natural voice.
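Such pairwise comparisons can be computed, for example, with SciPy (a sketch with placeholder rating vectors; the reported W and p values stem from the study data, which are not included here):

from scipy.stats import mannwhitneyu

# Each list would hold the participants' 1-7 ratings of one voice characteristic
# (e.g., "appealing"), one value per participant and voice.
ratings_natural = [5, 6, 4, 7, 5, 6]     # placeholder values
ratings_mechanical = [3, 2, 4, 3, 2, 3]  # placeholder values
w, p = mannwhitneyu(ratings_natural, ratings_mechanical, alternative="two-sided")
print(w, p)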
4.2 Main Study
The main experiment was conducted using the most suitable natural and mechanical voice deter-
mined in the pre-study. To estimate the difference in perception of the robot depending on its voice,
the Godspeed questionnaire was assessed. The results of the Godspeed questionnaire with mean
and standard error for both experiment groups are illustrated in Figure 6.
Fig. 6. Godspeed questionnaire for a robot with a natural and a mechanical voice. Values indicate p-value, **p < .01.
A Mann–Whitney U test suggests a significant difference (W = 344, p = .004) in the perceived
robot safety between the robot with the natural voice (M = 5.08, SD = 1.15) and the robot with the
mechanical voice (M = 4.37, SD = 0.97).
As a self-assessed measure of the robot’s competence and reliability in the experiment, the
MDMT questionnaire was assessed. An illustration of the average ratings is shown in Figure 7(a).
However, a Mann–Whitney U test does not reveal a significant difference between both robots.
Similarly, the MOS-X questionnaire assesses the difference regarding the voices, and the results
for both robots are shown in Figure 7(b). A Mann–Whitney U test reveals a significant difference
in the measures of intelligibility and naturalness. The intelligibility of the natural voice (M = 5.72,
SD = 1.26) is significantly higher (W = 199.5, p < .001) than that of the mechanical voice (M = 3.91,
SD = 1.40). Likewise, the perceived naturalness of the natural voice (M = 4.83, SD = 1.36)
significantly differs (W = 177.5, p < .001) from the mechanical voice (M = 2.88, SD = 1.29).
During the experiment, the robot provided advice to the participant, and the response time to
either reject or accept the advice was measured, as well as the response itself. The participants' advice
acceptance rate is shown in Figure 8(a). To evaluate whether participants accept more advice from the
robot with the natural voice (H1), a chi-square test is applied to compare the number of accepted
pieces of advice for the natural voice (N = 212) and the mechanical voice (N = 201). However, the
chi-square test does not indicate a significant difference (χ²(1, N = 612) = 0.745, p = .388).
Fig. 7. Assessed differences in the natural and mechanical voice. Values indicate p-value, ***p < .001.
Fig. 8. Probability of accepted advice and average response time for each piece of advice throughout the experiment.
Table 2. Spearman's Rank Correlation between the Advice Acceptance Rate and the Assessed Measures

                        ρ      p-value
Godspeed
  Animacy               .020   .871
  Likeability           .275   .023*
  Anthropomorphism      .192   .116
  Intelligence          .154   .209
  Safety                .079   .524
MOS-X
  Intelligibility       .210   .086
  Naturalness           .202   .098
  Social impression     .146   .236
MDMT
  Competent             .126   .304
  Reliable              .205   .094
Response time           .287   .018*
*p < .05.
The estimates of a mixed-effects model for advice acceptance, which shows the reduction in the probability of
accepting advice throughout the experiment, are provided in Appendix F.
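The reported statistic can be reproduced from the accepted-advice counts, assuming 34 participants per group with nine pieces of advice each (306 pieces of advice per group), so that the rejected counts follow from the accepted ones; a sketch with SciPy:

from scipy.stats import chi2_contingency

# Rows: natural voice, mechanical voice; columns: advice accepted, advice rejected.
table = [[212, 306 - 212],
         [201, 306 - 201]]
chi2, p, dof, expected = chi2_contingency(table)  # Yates correction applied for 2x2 tables
print(round(chi2, 3), round(p, 3))  # approximately 0.745 and 0.388, matching the reported values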
The average response time in seconds for the individual pieces of advice is shown in Figure 8(b). A
difference in response time between both groups refers to H2, and a Mann–Whitney U test shows that
the experiment group with the natural voice (M = 5.30, SD = 8.93) exhibited a significantly shorter
response time to the robot's advice over the course of the experiment (W = 42259, p = .037) than
the group with the mechanical voice (M = 6.38, SD = 2.85).
To evaluate H3, Spearman's rank correlation of the assessed questionnaires and the response time
with the probability of accepting the robot's advice is estimated and shown in Table 2.
The correlation analysis shows that a longer response time results in a higher probability of rejecting
the robot's advice (r(66) = .205, p = .018). Further, with an increased likability of the robot, the
participants are more inclined to follow the robot's advice (r(66) = .275, p = .023). The estimates
of the correlation between the response time and the questionnaire measures are provided in
Appendix G.
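The correlations in Table 2 correspond to calls of the following form (a sketch with placeholder per-participant values; the study data are not included here):

from scipy.stats import spearmanr

# One value per participant: mean response time (seconds) and share of accepted advice.
mean_response_time = [4.2, 7.9, 5.1, 6.4, 3.8]    # placeholder values
acceptance_rate = [0.78, 0.44, 0.67, 0.56, 0.89]  # placeholder values
rho, p = spearmanr(mean_response_time, acceptance_rate)
print(rho, p)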
5 Discussion
From the responses regarding expectations about robot voices in the pre-study, it can be noticed that
the participants did not expect a robot to sound mechanical but instead human-like. In terms of the
expected voice gender, the response pattern is inconclusive. Although a robot's voice should not
necessarily be male or genderless, it appears that a robot's gender might depend on the application
and the robot's task. Congruently, after introducing the main scenario, the participants rated the
suitability of the deeper male-sounding voice as higher than that of the high-pitched voice. This suggests
that for the introduced Battleship scenario, preconceived gender roles might have influenced their
choice [20,38,59].
The preference for natural voices [61] is further supported by the participants' suitability ranking
for the robot in the scenario, where the natural voices are considered more suitable than their
mechanical counterparts except for the neutral-pitch voice. The comparison of the voice charac-
teristics of the low natural voice and the neutral mechanical voice shows that the natural voice
is perceived as more expressive, appealing, and credible. For the scenario of the main study, the
participants might prefer a natural and low-pitched voice that expresses calmness and authority.
The Battleship scenario might be perceived as a task that bears uncertainty and responsibility,
which requires the robot to portray calmness and confidence. Additionally, natural voices can be
more easily understood [103], as shown by the difference in intelligibility. The suitability ranking does
not show a significant difference for the voices selected for the main study. Therefore, using the
neutral-pitch mechanical voice in the main experiment should not reveal the study objective to the
participants.
In the main study, the evaluation of the Godspeed questionnaire shows a difference in the
perceived safety of both robots. The robot in the experiment is perceived as safer when speaking in
a natural voice. However, the Godspeed questionnaire does not indicate a difference in perceived
anthropomorphism. The MDMT does not show a difference in the robot's competence and reliability.
The analysis of the perception of the voices shows that the participants noticed a clear distinction
between both robots' voices in the main experiment. The natural voice is easier to understand and
perceived as more natural than its mechanical counterpart, which matches the participants'
perception of the voices in the pre-study. Additionally, the results of the main study reveal that both
voices do not differ in their social impression. The results suggest that anthropomorphism might
be strongly affected by a robot's appearance and movements [32] as opposed to its voice. During
the interaction, personality and moral values might be further attributed to the robot, which affects
the robot's perceived anthropomorphism [63]. Therefore, the results do not show any difference in
the robot's perceived anthropomorphism and the robot's social impression.
To analyze the effect of a natural voice in comparison to a mechanical voice on trust, the number
of accepted pieces of advice in both groups was compared. From the acceptance rate over the course of the
experiment, it can be noticed that most participants were inclined to accept early advice from the
robot, which resulted in hitting an opponent's ship. However, since any advice after the second
accepted advice was incorrect and would result in a miss, trust in the robot declined throughout the
experiment. Hypothesis H1 suggests that the participants in the natural voice experiment group
are more likely to accept the advice. However, the statistical analysis does not show a difference between
both groups. Surprisingly, the estimated effect size (φ = .035) is tiny [37], which is in contrast to
the assumed small effect size stated in the literature. Although a mechanical voice can be perceived
as eerie, the influence of a robot's voice on trust appears smaller than previously reported or does
not directly affect trust [1].
H2 proposes that the participants in the experiment group with the natural voice robot are
more likely to rely on the robot's advice and are less likely to reconsider the outcome of the robot's
past advice, thus having a shorter response time. While at the beginning of the experiment, the
participants consider their options, the response time decreases throughout the experiment. This can
also be attributed to an increase in familiarity with the procedure. There is a significant difference in
the participants' response times between both experiment groups, exhibiting a small effect (r = .084).
This suggests that the participants might attribute more competence to the robot with the natural
voice and are more likely to comply with the robot's advice. However, the mechanical voice was
perceived as less intelligible, which might require more listening effort and could increase response
time. Since the procedure of following or rejecting advice was repetitive and textual instructions
were provided on the touch table, this effect on the response time could be minor. Additionally, the
correlation analysis did not suggest a significant relationship between the intelligibility of the voice
and following the advice. Furthermore, it might be that a voice's attributes indirectly influence
trust and reliance. Specifically, the attributes of a voice could affect the perceived competence [62,
101], which might influence trust and reliance.
The correlation analysis shows that the response time has a moderate (ρ = .287) negative
influence on the probability of accepting the robot's advice. As suggested in H3, the participants
who do not rely on the robot and who reevaluate its past performance after previously following
incorrect advice will doubt the robot's capabilities and reject further advice. In addition, the analysis
shows that an increased likeability of the robot inclines the participants to follow the robot's advice.
This emphasizes that the concept of trust might comprise many aspects and can be influenced
by the robot's likeability [15].
6 Limitations
Certain limitations in this study should be addressed in future research. Defining and effectively
measuring trust is uniquely challenging. This study focused on an advice-taking scenario and the
relationship between trust and compliance. However, various factors can influence trust in a robot,
as indicated by the study results. Further, the selection of the most suitable voice in the main study
focused on the fundamental frequency. Although the results show that a change in frequency
affects the perception of the robot, there are additional aspects to vocal interactions, such as social
and emotional aspects [36], which could serve as directions for future research. Finally, the study
was centered around a humanoid robot. Humanoid robots have a strong representation in the
media, which might shape the public's perception and assumed capabilities. For different robot
appearances, the expected voice and resulting effects might differ [97].
7 Conclusion
Prior research suggests that natural voices can increase the anthropomorphization of robots, which
might lead to attributing more capability and increased knowledge to a robot. Thus, the influence
of a robot with a natural voice in contrast to a mechanical voice on trust and compliance was
investigated. The study consisted of a pre-study to analyze the expectations about robots' voices
and determine the most suitable mechanical and natural voice for the robot in the main experiment.
In the main experiment, the participants were assisted by a robot in the game Battleship. The robot
presented itself as possessing exclusive knowledge, but all the advice provided past the second
piece of advice was incorrect. This required the participants to realize that the robot's advice did not
provide benefits and that instead they had to rely on themselves. The evaluation of the assessed questionnaires
shows a difference in the robot's perceived safety, where a robot with a natural voice is perceived
as safer than a robot with a mechanical voice.
A comparison of the number of accepted pieces of advice does not reveal any disparity in trust. However,
the participants who received assistance from the robot with the mechanical voice required more
time to decide whether to follow or reject the robot's advice. This indicates a difference in
compliance, where the participants in the experiment group with the natural voice rely more strongly
on the robot's assistance. The participants who received advice from a robot with a mechanical
voice might have evaluated the robot's past performance and second-guessed the robot's benefit,
thus leading to a longer response time.
The pre-study shows that people expect robots to sound natural and that interaction with them
should be comfortable. Further, a natural voice provides increased intelligibility, which can avoid
misunderstandings. The results of this study suggest that a robot's voice naturalness does not
directly affect trust but does reveal an effect on perceived safety and compliance. The advantages of a
natural voice have to be weighed against potential disadvantages, depending on the robot's use case. For
instance, for industrial and safety-relevant applications, over-reliance on robots should be avoided.
In summary, it appears that a natural and easily intelligible voice is well-suited for cooperative
tasks.
Robots’ voices play a crucial role in shaping people’s perception and expectations of robots. As
shown by this study, a robot’s voice suitability depends on the robot and the scenario the robot is
used in. The research conducted sheds new light on people's expectations about robots' voices and
provides evidence that a robot's voice affects trust and anthropomorphism less than previously
reported. These results can nurture future research on creating more effective voice interfaces for
robots and subsequently increasing their acceptance and adoption in society.
Acknowledgments
Many thanks to Shrey Dixit, Tassilo Hahm, Haruka Inoba, Katharina Meyer-Lüters, and Mai Nhi
Tran for their contribution to the project.
References
[1]
Amal Abdulrahman and Deborah Richards. 2022. Is natural necessary? Human voice versus synthetic voice for
intelligent virtual agents. Multimodal Technologies and Interaction 6, 7 (2022), 1–17.
DOI:
https://doi.org/10.3390/
mti6070051
[2]
Abdulaziz Abubshait and Eva Wiese. 2017. You look human, but act like a machine: Agent appearance and behavior
modulate different aspects of human-robot interaction. Frontiers in Psychology 8 (2017), 1–12.
DOI:
https://doi.org/
10.3389/fpsyg.2017.01393
[3]
Ruben Alonso, Emanuele Concas, and Diego Reforgiato Recupero. 2021. An abstraction Layer exploiting voice
assistant technologies for effective human–robot interaction. Applied Sciences (Switzerland) 11, 19 (2021), 1–18.
DOI:
https://doi.org/10.3390/app11199165
[4]
Sean Andrist, Micheline Ziadee, Halim Boukaram, Bilge Mutlu, and Majd Sakr. 2015. Effects of culture on the
credibility of robot speech: A comparison between English and Arabic. In Proceedings of the ACM/IEEE International
Conference on Human-Robot Interaction, Vol. 2015, 157–164. DOI: https://doi.org/10.1145/2696454.2696464
[5]
Alexander M. Aroyo, Jan De Bruyne, Orian Dheu, Eduard Fosch-Villaronga, Aleksei Gudkov, Holly Hoch, Steve
Jones, Christoph Lutz, Henrik Saetra, Mads Solberg, and Aurelia Tamò-Larrieux. 2021. Overtrusting robots: Setting a
research agenda to mitigate overtrust in automation. Paladyn, Journal of Behavioral Robotics 12, 1 (2021), 423–436.
DOI: https://doi.org/10.1515/pjbr-2021-0029
[6]
Mahew P. Ayle, Selina Jeanne Suon, and Yolanda Vazquez-Alvarez. 2019. e right kind of unnatural: Designing
a robot voice. In Proceedings of the ACM International Conference Proceeding Series, 5–6.
DOI:
https://doi.org/10.1145/
3342775.3342806
[7]
Anthony L. Baker, Elizabeth K. Phillips, Daniel Ullman, and Joseph R. Keebler. 2018. Toward an understanding of
trust repair in human-robot interaction: Current research and future directions. ACM Transactions on Interactive
Intelligent Systems 8, 4 (2018), 1–30. DOI: https://doi.org/10.1145/3181671
[8]
Christoph Bartneck, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement instruments for the
anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. International Journal
of Social Robotics 1 (2009), 71–81. DOI: https://doi.org/10.1007/s12369-008-0001-3
[9]
Christian Becker-Asano, Takayuki Kanda, Carlos Ishi, and Hiroshi Ishiguro. 2009. How about laughter? Perceived
naturalness of two laughing humanoid robots. In Proceedings of the 3rd International Conference on Affective Computing
and Intelligent Interaction and Workshops (ACII ’09).DOI: https://doi.org/10.1109/ACII.2009.5349371
[10]
Annika Boos, Olivia Herzog, Jakob Reinhardt, Klaus Bengler, and Markus Zimmermann. 2022. A compliance–
reactance framework for evaluating Human-robot interaction. Frontiers in Robotics and AI 9 (2022), 1–13.
DOI:
https://doi.org/10.3389/frobt.2022.733504
[11]
Gordon Briggs, Tom Williams, Ryan Blake Jackson, and Matthias Scheutz. 2022. Why and how robots should say
‘No’. International Journal of Social Robotics 14, 2 (2022), 323–339. DOI: https://doi.org/10.1007/s12369-021-00780-y
[12]
Natalia Calvo-Barajas, Giulia Perugia, and Ginevra Castellano. 2020. The effects of robot's facial expressions on
children's first impressions of trustworthiness. In Proceedings of the 29th IEEE International Conference on Robot and
Human Interactive Communication, 165–171. DOI: https://doi.org/10.1109/RO-MAN47096.2020.9223456
[13]
Julia Cambre and Chinmay Kulkarni. 2019. One voice fits all? Social implications and research challenges of
designing voices for smart devices. Proceedings of the ACM on Human-Computer Interaction 3 (2019), 1–19.
DOI:
https://doi.org/10.1145/3359325
[14]
David Cameron, Jonathan M Aitken, Emily C Collins, Luke Boorman, Adriel Chua, Samuel Fernando, Owen McAree,
Uriel Martinez-Hernandez, and James Law. 2015. Framing factors: The importance of context and the individual in
understanding trust in Human-robot interaction. In Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS).
[15]
David Cameron, Stevienna de Saille, Emily C. Collins, Jonathan M. Aitken, Hugo Cheung, Adriel Chua, Ee Jing Loh,
and James Law. 2021. The effect of social-cognitive recovery strategies on likability, capability and trust in social
robots. Computers in Human Behavior 114 ( Jan. 2021), 106561. DOI: https://doi.org/10.1016/j.chb.2020.106561
[16]
Eric T. Chancey, James P. Bliss, Yusuke Yamani, and Holly A. H. Handley. 2017. Trust and the compliance-reliance
paradigm: The effects of risk, error bias, and reliability on trust and dependence. Human Factors 59, 3 (2017), 333–345.
DOI: https://doi.org/10.1177/0018720816682648
[17]
Rebecca Cherng Shiow Chang, Hsi Peng Lu, and Peishan Yang. 2018. Stereotypes or Golden rules? Exploring likable
voice traits of social robots as active aging companions for tech-savvy baby Boomers in Taiwan. Computers in Human
Behavior 84 (2018), 194–210. DOI: https://doi.org/10.1016/j.chb.2018.02.025
[18]
Jessie Y.C. Chen, Michael J. Barnes, and Michelle Harper-Sciarini. 2011. Supervisory control of multiple robots:
Human-performance issues and user-interface design. IEEE Transactions on Systems, Man and Cybernetics Part C:
Applications and Reviews 41, 4 (2011), 435–454. DOI: https://doi.org/10.1109/TSMCC.2010.2056682
[19]
F. Cid, R. Cintas, L. J. Manso, L. Calderita, A. Sánchez, and P. Núñez. 2011. A real-time synchronization algorithm
between text-to-speech (TTS) system and robot mouth for social robotic applications. Proceedings of Workshop of
Physical Agents.
[20]
Charles R. Crowell, Matthias Scheutz, Paul Schermerhorn, and Michael Villano. 2009. Gendered voice and robot
entities: Perceptions and reactions of male and female subjects. In Proceedings of the IEEE/RSJ International Conference
on Intelligent Robots and Systems, IROS 2009. 3735–3741. DOI: https://doi.org/10.1109/IROS.2009.5354204
[21]
M. M. A. De Graaf, S. Ben Allouch, and J. A. G. M. Van Dijk. 2015. What makes robots social?: A user’s perspective
on characteristics for social Human-robot interaction. In Proceedings of the International conference on Social Robotics.
Lecture Notes in Computer Science (including subseries Lecture Notes in Articial Intelligence and Lecture Notes in
Bioinformatics), Vol. 9388, 184–193. DOI: https://doi.org/10.1007/978-3-319-25554-5_19
[22]
Ewart J. de Visser, Samuel S. Monfort, Ryan McKendrick, Melissa A.B. Smith, Patrick E. McKnight, Frank Krueger,
and Raja Parasuraman. 2016. Almost human: Anthropomorphism increases trust resilience in cognitive agents.
Journal of Experimental Psychology: Applied 22, 3 (2016), 331–349. DOI: https://doi.org/10.1037/xap0000092
[23]
Ewart J. de Visser, Marieke M.M. Peeters, Malte F. Jung, Spencer Kohn, Tyler H. Shaw, Richard Pak, and Mark A.
Neerincx. 2020. Towards a theory of longitudinal trust calibration in human–robot teams. International Journal of
Social Robotics 12, 2 (2020), 459–478. DOI: https://doi.org/10.1007/s12369-019-00596-x
[24]
Peter de Vries, Cees Midden, and Don Bouwhuis. 2003. e eects of errors on system trust, Self-condence, and the
allocation of control in route planning. International Journal of Human Computer Studies 58, 6 (Jun. 2003), 719–735.
DOI: https://doi.org/10.1016/S1071-5819(03)00039-9
[25]
Munjal Desai, Poornima Kaniarasu, Mikhail Medvedev, Aaron Steinfeld, and Holly Yanco. 2013. Impact of robot
failures and feedback on real-time trust. In Proceedings of the ACM/IEEE International Conference on Human-Robot
Interaction, 251–258. DOI: https://doi.org/10.1109/HRI.2013.6483596
[26] Munjal Desai, Mikhail Medvedev, Marynel Vázquez, Sean McSheehy, Sofia Gadea-Omelchenko, Christian Bruggeman, Aaron Steinfeld, and Holly Yanco. 2012. Effects of changing reliability on trust of robot systems. In Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '12). ACM, New York, NY, 73–80. DOI: https://doi.org/10.1145/2157689.2157702
[27] F. C. Donders. 1969. On the speed of mental processes. Acta Psychologica 30 (1969), 412–431. DOI: https://doi.org/10.1016/0001-6918(69)90065-1
[28] Xiao Dou, Li Yan, Kai Wu, and Jin Niu. 2022. Effects of voice and lighting color on the social perception of home healthcare robots. Applied Sciences (Switzerland) 12, 23 (2022), 1–14. DOI: https://doi.org/10.3390/app122312191
[29] Gölge Eren and the Coqui TTS team. 2021. Coqui TTS. DOI: https://doi.org/10.5281/zenodo.6472420
[30] Friederike Eyssel, Dieta Kuchenbrandt, Simon Bobinger, Laura De Ruiter, and Frank Hegel. 2012. 'If you sound like me, you must be more human': On the interplay of robot and user features on human-robot acceptance and anthropomorphism. In Proceedings of the 7th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '12), 125–126. DOI: https://doi.org/10.1145/2157689.2157717
[31] Neta Ezer, Arthur D. Fisk, and Wendy A. Rogers. 2008. Age-related differences in reliance behavior attributable to costs within a human-decision aid system. Human Factors 50, 6 (Dec. 2008), 853–863. DOI: https://doi.org/10.1518/001872008X375018
[32] Julia Fink. 2012. Anthropomorphism and human likeness in the design of robots and human-robot interaction. In Proceedings of the International Conference on Social Robotics. Lecture Notes in Computer Science, Vol. 7621, 199–208. DOI: https://doi.org/10.1007/978-3-642-34103-8_20
[33] Kerstin Fischer and Oliver Niebuhr. 2023. Which voice for which robot? Designing robot voices that indicate robot size. ACM Transactions on Human-Robot Interaction 12, 4 (2023), 1–24. DOI: https://doi.org/10.1145/3632124
[34] Kerstin Fischer, Oliver Niebuhr, Lars C. Jensen, and Leon Bodenhagen. 2020. Speech melody matters—How robots profit from using charismatic speech. ACM Transactions on Human-Robot Interaction 9, 1 (2020), 1–21. DOI: https://doi.org/10.1145/3344274
[35] Marlena R. Fraune. 2020. Our robots, our team: Robot anthropomorphism moderates group effects in human–robot teams. Frontiers in Psychology 11 (2020), 1–14. DOI: https://doi.org/10.3389/fpsyg.2020.01275
[36] Changzeng Fu, Qi Deng, Jingcheng Shen, Hamed Mahzoon, and Hiroshi Ishiguro. 2022. A preliminary study on realizing human–robot mental comforting dialogue via sharing experience emotionally. Sensors 22, 3 (2022), 1–15. DOI: https://doi.org/10.3390/s22030991
[37] David C. Funder and Daniel J. Ozer. 2019. Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science 2, 2 (2019), 156–168. DOI: https://doi.org/10.1177/2515245919847202
[38] Darci Gallimore, Joseph B. Lyons, Thy Vo, Sean Mahoney, and Kevin T. Wynne. 2019. Trusting Robocop: Gender-based effects on trust of an autonomous robot. Frontiers in Psychology 10 (2019), 1–9. DOI: https://doi.org/10.3389/fpsyg.2019.00482
[39] Norina Gasteiger, Jong Yoon Lim, Mehdi Hellou, Bruce A. MacDonald, and Ho Seok Ahn. 2022. A scoping review of the literature on prosodic elements related to emotional speech in human-robot interaction. International Journal of Social Robotics 16 (2022), 659–670. DOI: https://doi.org/10.1007/s12369-022-00913-x
[40] Ioanna Giorgi, Aniello Minutolo, Francesca Tirotto, Oksana Hagen, Massimo Esposito, Mario Gianni, Marco Palomino, and Giovanni L. Masala. 2023. I am robot, your health adviser for older adults: Do you trust my advice? International Journal of Social Robotics 12 (2023), 1981–1991. DOI: https://doi.org/10.1007/s12369-023-01019-8
[41] Jennifer Goetz, Sara Kiesler, and Aaron Powers. 2003. Matching robot appearance and behavior to tasks to improve human-robot cooperation. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, 55–60. DOI: https://doi.org/10.1109/ROMAN.2003.1251796
[42] Dale L. Goodhue. 1995. Understanding user evaluations of information systems. Management Science 41, 12 (1995), 1827–1844. DOI: https://doi.org/10.1287/mnsc.41.12.1827
[43] Dale L. Goodhue and Ronald L. Thompson. 1995. Task-technology fit and individual performance. MIS Quarterly: Management Information Systems 19, 2 (1995), 213–233. DOI: https://doi.org/10.2307/249689
[44] Peter A. Hancock, Deborah R. Billings, Kristin E. Schaefer, Jessie Y. C. Chen, Ewart J. De Visser, and Raja Parasuraman. 2011. A meta-analysis of factors affecting trust in human-robot interaction. Human Factors 53, 5 (2011), 517–527. DOI: https://doi.org/10.1177/0018720811417254
[45] Corey Hannum, Rui Li, and Weitian Wang. 2023. A trust-assist framework for human–robot co-carry tasks. Robotics 12, 2 (Feb. 2023), 30. DOI: https://doi.org/10.3390/robotics12020030
[46] Caroline E. Harriott and Julie A. Adams. 2017. Towards reaction and response time metrics for real-world human-robot interaction. In Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN '17), 799–804. DOI: https://doi.org/10.1109/ROMAN.2017.8172394
[47] Frank Hegel. 2012. Effects of a robot's aesthetic design on the attribution of social capabilities. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, 469–475. DOI: https://doi.org/10.1109/ROMAN.2012.6343796
[48] Nicholas Hertz and Eva Wiese. 2019. Good advice is beyond all price, but what if it comes from a machine? Journal of Experimental Psychology: Applied 25, 3 (2019), 386–395. DOI: https://doi.org/10.1037/xap0000205
[49] Geoffrey Ho, Dana Wheatley, and Charles T. Scialfa. 2005. Age differences in trust and reliance of a medication management system. Interacting with Computers 17, 6 (Dec. 2005), 690–710. DOI: https://doi.org/10.1016/j.intcom.2005.09.007
[50] Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors 57, 3 (2015), 407–434. DOI: https://doi.org/10.1177/0018720814547570
[51] Ryan Blake Jackson, Tom Williams, and Nicole Smith. 2020. Exploring the role of gender in perceptions of robotic noncompliance. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 559–567. DOI: https://doi.org/10.1145/3319502.3374831
[52] A. D. Kaplan, T. T. Kessler, T. L. Sanders, J. Cruit, J. C. Brill, and P. A. Hancock. 2021. A time to trust: Trust as a function of time in human-robot interaction. In Trust in Human-Robot Interaction. Elsevier, 143–157. DOI: https://doi.org/10.1016/B978-0-12-819472-0.00006-X
[53] Matthias Kerzel, Erik Strahl, Sven Magg, Nicolás Navarro-Guerrero, Stefan Heinrich, and Stefan Wermter. 2017. NICO—neuro-inspired companion: A developmental humanoid robot platform for multimodal interaction. In Proceedings of the 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 113–120.
[54] Zahra Rezaei Khavas. 2021. A review on trust in human-robot interaction. arXiv:2105.10045. Retrieved from http://arxiv.org/abs/2105.10045
[55] Sara Kiesler. 2005. Fostering common ground in human-robot interaction. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication. IEEE, 729–734. DOI: https://doi.org/10.1109/ROMAN.2005.1513866
[56] Jaehyeon Kim, Jungil Kong, and Juhee Son. 2021. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. arXiv:2106.06103. Retrieved from http://arxiv.org/abs/2106.06103
[57] Spencer C. Kohn, Ewart J. de Visser, Eva Wiese, Yi Ching Lee, and Tyler H. Shaw. 2021. Measurement of trust in automation: A narrative review and reference guide. Frontiers in Psychology 12 (2021), 1–23. DOI: https://doi.org/10.3389/fpsyg.2021.604977
[58] Bing Cai Kok and Harold Soh. 2020. Trust in robots: Challenges and opportunities. Current Robotics Reports 1, 4 (2020), 297–309. DOI: https://doi.org/10.1007/s43154-020-00029-y
[59] Matthias Kraus, Johannes Kraus, Martin Baumann, and Wolfgang Minker. 2019. Effects of gender stereotypes on trust and likability in spoken human-robot interaction. In LREC 2018 - Proceedings of the 11th International Conference on Language Resources and Evaluation, 112–118.
[60] Robert M. Krauss, Robin Freyberg, and Ezequiel Morsella. 2002. Inferring speakers' physical attributes from their voices. Journal of Experimental Social Psychology 38, 6 (2002), 618–625. DOI: https://doi.org/10.1016/S0022-1031(02)00510-3
[61] Katharina Kühne, Martin H. Fischer, and Yuefang Zhou. 2020. The human takes it all: Humanlike synthesized voices are perceived as less eerie and more likable. Evidence from a subjective ratings study. Frontiers in Neurorobotics 14 (2020), 1–15. DOI: https://doi.org/10.3389/fnbot.2020.593732
[62] Katharina Kühne, Erika Herbold, Oliver Bendel, Yuefang Zhou, and Martin H. Fischer. 2023. "Ick bin een Berlina": Dialect proficiency impacts a robot's trustworthiness and competence evaluation. Frontiers in Robotics and AI 10 (2023), 1–15. DOI: https://doi.org/10.3389/frobt.2023.1241519
[63] Rinaldo Kühne and Jochen Peter. 2023. Anthropomorphism in human–robot interactions: A multidimensional conceptualization. Communication Theory 33, 1 (2023), 42–52. DOI: https://doi.org/10.1093/ct/qtac020
[64] John D. Lee and Katrina A. See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors 46, 1 (2004), 50–80. DOI: https://doi.org/10.1518/hfes.46.1.50_30392
[65] Stephen C. Levinson. 2020. Natural forms of purposeful interaction among humans: What makes interaction effective? In Interactive Task Learning. DOI: https://doi.org/10.7551/mitpress/11956.003.0012
[66] Michael Lewis, Katia Sycara, and Phillip Walker. 2018. The Role of Trust in Human-Robot Interaction. Springer International Publishing, Cham, 135–159. DOI: https://doi.org/10.1007/978-3-319-64816-3_8
[67] Mingming Li, Fu Guo, Xueshuang Wang, Jiahao Chen, and Jaap Ham. 2023. Effects of robot gaze and voice human-likeness on users' subjective perception, visual attention, and cerebral activity in voice conversations. Computers in Human Behavior 141 (Apr. 2023), 107645. DOI: https://doi.org/10.1016/j.chb.2022.107645
[68] Yuanchao Li and Catherine Lai. 2022. Robotic Speech Synthesis: Perspectives on Interactions, Scenarios, and Ethics. Vol. 1. ACM, New York, NY.
[69] LimeSurvey Project Team/Carsten Schmitz. 2012. LimeSurvey: An Open Source Survey Tool. LimeSurvey Project, Hamburg, Germany. Retrieved from http://www.limesurvey.org
[70] Rui Liu and Xiaoli Zhang. 2019. A review of methodologies for natural-language-facilitated human–robot cooperation. International Journal of Advanced Robotic Systems 16 (2019), 1–17. DOI: https://doi.org/10.1177/1729881419851402
[71] Christoph Lutz and Aurelia Tamò-Larrieux. 2021. Do privacy concerns about social robots affect use intentions? Evidence from an experimental vignette study. Frontiers in Robotics and AI 8 (2021). DOI: https://doi.org/10.3389/frobt.2021.627958
[72] P. Madhavan and D. A. Wiegmann. 2007. Similarities and differences between human–human and human–automation trust: An integrative review. Theoretical Issues in Ergonomics Science 8, 4 (2007), 277–301. DOI: https://doi.org/10.1080/14639220500337708
[73] Bertram Malle, Kerstin Fischer, James Young, AJung Moon, and Emily Collins. 2020. Trust and the Discrepancy between Expectations and Actual Capabilities of Social Robots. Cambridge Scholars Press, 1–23.
[74] Joseph H. Manson, Gregory A. Bryant, Matthew M. Gervais, and Michelle A. Kline. 2013. Convergence of speech rate in conversation predicts cooperation. Evolution and Human Behavior 34, 6 (2013), 419–426. DOI: https://doi.org/10.1016/j.evolhumbehav.2013.08.001
[75] Alessandro Marin Vargas, Lorenzo Cominelli, Felice Dell'Orletta, and Enzo Pasquale Scilingo. 2021. Verbal communication in robotics: A study on salient terms, research fields and trends in the last decades based on a computational linguistic analysis. Frontiers of Computer Science 2 (2021), 1–12. DOI: https://doi.org/10.3389/fcomp.2020.591164
[76] Fernando Alonso Martin, María Malfaz, Álvaro Castro-González, José Carlos Castillo, and Miguel Ángel Salichs. 2020. Four-features evaluation of text to speech systems for three social robots. Electronics (Switzerland) 9, 2 (2020), 1–23. DOI: https://doi.org/10.3390/electronics9020267
[77] Nikolaos Mavridis. 2015. A review of verbal and non-verbal human-robot interactive communication. Robotics and Autonomous Systems 63 (2015), 22–35. DOI: https://doi.org/10.1016/j.robot.2014.09.031
[78] Phil McAleer, Alexander Todorov, and Pascal Belin. 2014. How do you say 'hello'? Personality impressions from brief novel voices. PLoS ONE 9, 3 (2014), 1–9. DOI: https://doi.org/10.1371/journal.pone.0090779
[79] Conor McGinn and Ilaria Torre. 2019. Can you tell the robot by the voice? An exploratory study on the role of voice in the perception of robots. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. IEEE, 211–221. DOI: https://doi.org/10.1109/HRI.2019.8673305
[80] Joachim Meyer and John D. Lee. 2013. Trust, Reliance, and Compliance. Oxford University Press, 1–28. DOI: https://doi.org/10.1093/oxfordhb/9780199757183.013.0007
[81] Clifford Nass and Kwan Min Lee. 2001. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied 7, 3 (2001), 171–181. DOI: https://doi.org/10.1037/1076-898X.7.3.171
[82] Manisha Natarajan and Matthew Gombolay. 2020. Effects of anthropomorphism and accountability on trust in human robot interaction. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 33–42. DOI: https://doi.org/10.1145/3319502.3374839
[83] Aidan Naughton and Tom Williams. 2021. How to tune your draggin': Can body language mitigate face threat in robotic noncompliance? In Proceedings of the International Conference on Social Robotics. Lecture Notes in Computer Science, Vol. 13086, 247–256. DOI: https://doi.org/10.1007/978-3-030-90525-5_21
[84] Andreea Niculescu, Betsy van Dijk, Anton Nijholt, Haizhou Li, and Swee Lan See. 2013. Making social robots more attractive: The effects of voice pitch, humor and empathy. International Journal of Social Robotics 5, 2 (2013), 171–191. DOI: https://doi.org/10.1007/s12369-012-0171-x
[85] Oliver Niebuhr and Jan Michalsky. 2019. Computer-generated speaker charisma and its effects on human actions in a car-navigation system experiment - or how Steve Jobs' tone of voice can take you anywhere. In Proceedings of the International Conference on Computational Science and Its Applications (ICCSA '19). Lecture Notes in Computer Science, Vol. 11620, 375–390. DOI: https://doi.org/10.1007/978-3-030-24296-1_31
[86] Shuichi Nishio, Kohei Ogawa, Yasuhiro Kanakogi, Shoji Itakura, and Hiroshi Ishiguro. 2012. Do robot appearance and speech affect people's attitude? Evaluation through the ultimatum game. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, 809–814. DOI: https://doi.org/10.1109/ROMAN.2012.6343851
[87] European Organisation for the Safety of Air Navigation. 2003. Guidelines for Trust in Future ATM Systems: A Literature Review. European Air Traffic Management Programme, 70.
[88] Richard Pak, Nicole Fink, Margaux Price, Brock Bass, and Lindsay Sturre. 2012. Decision support aids with anthropomorphic characteristics influence trust and performance in younger and older adults. Ergonomics 55, 9 (2012), 1059–1072. DOI: https://doi.org/10.1080/00140139.2012.691554
[89] Raja Parasuraman and Victor Riley. 1997. Humans and automation: Use, misuse, disuse, abuse. Human Factors 39, 2 (Jun. 1997), 230–253. DOI: https://doi.org/10.1518/001872097778543886
[90] Melanie D. Polkosky and James R. Lewis. 2003. Expanding the MOS: Development and psychometric evaluation of the MOS-R and MOS-X. International Journal of Speech Technology 6, 2 (2003), 161–182. DOI: https://doi.org/10.1023/A:1022390615396
[91] Aaron Powers, Adam D. I. Kramer, Shirlene Lim, Jean Kuo, Sau Lai Lee, and Sara Kiesler. 2005. Eliciting information from people with a gendered humanoid robot. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication, 158–163. DOI: https://doi.org/10.1109/ROMAN.2005.1513773
[92] Diogo Rato, Filipa Correia, André Pereira, and Rui Prada. 2023. Robots in games. International Journal of Social Robotics 15, 1 (2023), 37–57. DOI: https://doi.org/10.1007/s12369-022-00944-4
[93] Paul Robinette, Ayanna M. Howard, and Alan R. Wagner. 2017. Effect of robot performance on human-robot trust in time-critical situations. IEEE Transactions on Human-Machine Systems 47, 4 (2017), 425–436. DOI: https://doi.org/10.1109/THMS.2017.2648849
[94] Paul Robinette, Wenchen Li, Robert Allen, Ayanna M. Howard, and Alan R. Wagner. 2016. Overtrust of robots in emergency evacuation scenarios. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 101–108. DOI: https://doi.org/10.1109/HRI.2016.7451740
[95] Julia Rosén, Erik Lagerstedt, and Maurice Lamb. 2022. Is human-like speech in robots deception? In Proceedings of Human-Robot Interaction (HRI '22 Workshop). Vol. 1. ACM, New York, NY.
[96] Henrik Skaug Saetra. 2021. Social robot deception and the culture of trust. Paladyn 12, 1 (2021), 276–286. DOI: https://doi.org/10.1515/pjbr-2021-0021
[97] Busra Sarigul and Burcu A. Urgen. 2023. Audio–visual predictive processing in the perception of humans and robots. International Journal of Social Robotics 15, 5 (2023), 855–865. DOI: https://doi.org/10.1007/s12369-023-00990-6
[98] Nina Savela, Tuuli Turja, Rita Latikka, and Atte Oksanen. 2021. Media effects on the perceptions of robots. Human Behavior and Emerging Technologies 3, 5 (2021), 989–1003. DOI: https://doi.org/10.1002/hbe2.296
[99] Simon Schreibelmayr and Martina Mara. 2022. Robot voices in daily life: Vocal human-likeness and application context as determinants of user acceptance. Frontiers in Psychology 13 (2022), 1–17. DOI: https://doi.org/10.3389/fpsyg.2022.787499
[100] Katie Seaborn, Norihisa P. Miyake, Peter Pennefather, and Mihoko Otake-Matsuura. 2021. Voice in human-agent interaction: A survey. ACM Computing Surveys 54, 4 (2021), 1–43. DOI: https://doi.org/10.1145/3386867
[101] Michihiro Shimada and Takayuki Kanda. 2012. What is the appropriate speech rate for a communication robot? Interaction Studies. Social Behaviour and Communication in Biological and Artificial Systems 13, 3 (2012), 408–435. DOI: https://doi.org/10.1075/is.13.3.05shi
[102] Georgios Sideridis and Maisaa Taleb S. Alahmadi. 2022. The role of response times on the measurement of mental ability. Frontiers in Psychology 13 (2022), 1–10. DOI: https://doi.org/10.3389/fpsyg.2022.892317
[103] Olympia Simantiraki, Martin Cooke, and Simon King. 2018. Impact of different speech types on listening effort. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), 2267–2271. DOI: https://doi.org/10.21437/Interspeech.2018-1358
[104] Valerie K. Sims, Matthew G. Chin, Heather C. Lum, Linda Upham-Ellis, Tatjana Ballion, and Nicholas C. Lagattuta. 2009. Robots' auditory cues are subject to anthropomorphism. In Proceedings of the Human Factors and Ergonomics Society, Vol. 3, 1418–1421. DOI: https://doi.org/10.1518/107118109x12524444079352
[105] Melissa A. Smith, M. Mowafak Allaham, and Eva Wiese. 2016. Trust in automated agents is modulated by the combined influence of agent and task type. In Proceedings of the Human Factors and Ergonomics Society, 206–210. DOI: https://doi.org/10.1177/1541931213601046
[106] Yao Song, Da Tao, and Yan Luximon. 2023. In robot we trust? The effect of emotional expressions and contextual cues on anthropomorphic trustworthiness. Applied Ergonomics 109 (May 2023), 103967. DOI: https://doi.org/10.1016/j.apergo.2023.103967
[107] Hang Su, Wen Qi, Jiahao Chen, Chenguang Yang, Juan Sandoval, and Med Amine Laribi. 2023. Recent advancements in multimodal human–robot interaction. Frontiers in Neurorobotics 17 (2023), 1–21. DOI: https://doi.org/10.3389/fnbot.2023.1084000
[108] Rie Tamagawa, Catherine I. Watson, I. Han Kuo, Bruce A. MacDonald, and Elizabeth Broadbent. 2011. The effects of synthesized voice accents on user perceptions of robots. International Journal of Social Robotics 3, 3 (2011), 253–262. DOI: https://doi.org/10.1007/s12369-011-0100-4
[109] Xu Tan, Tao Qin, Frank Soong, and Tie-Yan Liu. 2021. A survey on neural speech synthesis. arXiv:2106.15561. Retrieved from http://arxiv.org/abs/2106.15561
[110] Ilaria Torre, Jeremy Goslin, Laurence White, and Debora Zanatto. 2018. Trust in artificial voices: A "congruency effect" of first impressions and behavioural experience. In Proceedings of the ACM International Conference Proceeding Series. DOI: https://doi.org/10.1145/3183654.3183691
[111] Ilaria Torre and Laurence White. 2021. Trust in vocal human–robot interaction: Implications for robot voice design. In Voice Attractiveness. Springer, Singapore, 299–316. DOI: https://doi.org/10.1007/978-981-15-6627-1_16
[112] Daniel Ullman, Iolanda Leite, Jonathan Phillips, Julia Kim-Cohen, and Brian Scassellati. 2014. Smart human, smarter robot: How cheating affects perceptions of social agency. In Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci '14), 2996–3001.
[113] Daniel Ullman and Bertram F. Malle. 2019. Measuring gains and losses in human-robot trust: Evidence for differentiable components of trust. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. IEEE, 618–619. DOI: https://doi.org/10.1109/HRI.2019.8673154
[114] Daniel Ullrich, Andreas Butz, and Sarah Diefenbach. 2021. The development of overtrust: An empirical simulation and psychological analysis in the context of human–robot interaction. Frontiers in Robotics and AI 8 (2021), 1–15. DOI: https://doi.org/10.3389/frobt.2021.554578
[115] Jacqueline Urakami and Katie Seaborn. 2023. Nonverbal cues in human–robot interaction: A communication studies perspective. ACM Transactions on Human-Robot Interaction 12, 2 (2023), 1–21. DOI: https://doi.org/10.1145/3570169
[116] Ella Velner, Paul P. G. Boersma, and Maartje M. A. De Graaf. 2020. Intonation in robot speech: Does it work the same as with people? In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 569–578. DOI: https://doi.org/10.1145/3319502.3374801
[117] Lorenzo Vianello, Luigi Penco, Waldez Gomes, Yang You, Salvatore Maria Anzalone, Pauline Maurice, Vincent Thomas, and Serena Ivaldi. 2021. Human-humanoid interaction and cooperation: A review. Current Robotics Reports 2, 4 (2021), 441–454. DOI: https://doi.org/10.1007/s43154-021-00068-z
[118] M. L. Walters, D. S. Syrdal, K. L. Koay, K. Dautenhahn, and R. Te Boekhorst. 2008. Human approach distances to a mechanical-looking robot with different robot voice styles. In Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 707–712. DOI: https://doi.org/10.1109/ROMAN.2008.4600750
[119] Claire Whang and Hyunjoo Im. 2021. "I like your suggestion!" The role of humanlikeness and parasocial relationship on the website versus voice shopper's perception of recommendations. Psychology and Marketing 38, 4 (2021), 581–595. DOI: https://doi.org/10.1002/mar.21437
[120] Christopher D. Wickens. 1981. Processing Resources in Attention. Academic Press, New York, 63–102.
[121] Herbert Woodrow. 1911. Reaction times. Psychological Bulletin 8, 11 (Nov. 1911), 387–390. DOI: https://doi.org/10.1037/h0070885
[122] Min Xin and Ehud Sharlin. 2007. Playing games with robots: A method for evaluating human-robot interaction. In Human Robot Interaction. DOI: https://doi.org/10.5772/5208
[123] Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. 2019. CSTR VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92). [Sound]. University of Edinburgh, The Centre for Speech Technology Research (CSTR). DOI: https://doi.org/10.7488/ds/2645
[124] Jakub Złotowski, Hidenobu Sumioka, Shuichi Nishio, Dylan F. Glas, Christoph Bartneck, and Hiroshi Ishiguro. 2016. Appearance of a robot affects the impact of its behaviour on perceived trustworthiness and empathy. Paladyn 7, 1 (2016), 55–66. DOI: https://doi.org/10.1515/pjbr-2016-0005
[125] Joshua Zonca, Anna Folsø, and Alessandra Sciutti. 2021. The role of reciprocity in human-robot social influence. iScience 24, 12 (2021), 103424. DOI: https://doi.org/10.1016/j.isci.2021.103424
Appendices
A Details on Selected Voices
For the selection of suitable voices for the robot evaluated in the pre-study, the voices of the CSTR VCTK Corpus were analyzed. The CSTR VCTK Corpus consists of speech data uttered by 110 English speakers with various accents. Metadata for all speakers was collected, and an audio sample of each speaker was subjectively evaluated for the following speaker attributes:
fundamental frequency (𝑓0)
noise (breath sounds etc.)
pitch (high, bit high, neutral, bit low, low)
perceived gender (male, female, neutral)
speed (fast, neutral, slow)
rhythm (monotonic, neutral, expressive)
The collected information for all speakers was utilized for the voice selection: specifically, to provide a wide variety of speech patterns while offering a distinctive pitch and ensuring the understandability of the generated voice. Table A1 shows the collected and evaluated attributes of the voices in the pre-study. For the main study, voice p286 was utilized for the natural voice, which can be described as a distinctly male voice with a calm speech pattern. The voice p336 was utilized for the mechanical voice in the main study; after applying the phaser effect, the resulting voice is reminiscent of a male child's voice.
Table A1. Details and Estimates of the Selected Voices
ID Age Gender 𝑓0 (Hz) Accent Region Noise Pitch Perceived gender Speed Rhythm
p336 18 F 205 English Surrey No Neutral F Neutral Monotonic
p243 22 M 270 American Iowa No High F Neutral Neutral
p286 23 M 59 American Ohio No Bit low M Neutral Neutral
p285 21 M 96 American New York No Very low M Slow Neutral
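For reference, the 𝑓0 values above can in principle be re-estimated from the corpus audio. The following is a minimal sketch, assuming the librosa library and locally available WAV files (the file path and naming scheme are illustrative, not prescribed here), of how a speaker's median fundamental frequency in Hz might be computed:

```python
import librosa
import numpy as np

def estimate_median_f0(wav_path: str) -> float:
    """Estimate a speaker's median fundamental frequency (Hz) from one audio sample."""
    y, sr = librosa.load(wav_path, sr=None)       # keep the file's original sampling rate
    f0, voiced_flag, _ = librosa.pyin(y, fmin=50.0, fmax=500.0, sr=sr)  # probabilistic YIN tracker
    return float(np.nanmedian(f0[voiced_flag]))   # median over voiced frames only

# Hypothetical file name following the VCTK naming scheme:
# print(estimate_median_f0("VCTK-Corpus/wav48/p286/p286_001.wav"))
```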
B Pre-Study Pitch Preference
To analyze if the pre-study participants’ gender aects the preferred pitch, a repeated measures
proportional odds logistic regression was utilized. Each participant ranks the voices according to
their perceived suitability in the main study. For model estimation, the voices are grouped according
to their pitch (low, neutral, high), and the rank is the dependent variable. e interaction between
pitch and gender is the independent variable. e model estimates are shown in Table B1. e
results do not indicate a relationship between a participant’s gender and a preference for the robot’s
voice pitch.
Table B1. Repeated Measures Proportional Odds Logistic Regression Model for Preference of Voice Pitch Depending on a Participant's Gender
Estimate Std. Error z value Pr(>|z|)
Male 0.123 0.282 0.438 0.662
Neutral-pitch 0.600 0.383 1.566 0.117
High-pitch 0.408 0.358 1.140 0.254
Male : Neutral-pitch 0.531 0.487 1.091 0.275
Male : High-pitch 0.047 0.462 0.102 0.919
cuts 1|2 −1.685 0.219 −7.699 <0.001
cuts 2|3 −0.750 0.213 −3.521 <0.001
cuts 3|4 −0.043 0.209 −0.208 0.836
cuts 4|5 0.660 0.207 3.195 0.001
cuts 5|6 1.587 0.202 7.858 <0.001
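For illustration, a simplified version of such an ordinal model could be fit in Python with statsmodels. The sketch below pools all observations and therefore ignores the repeated-measures structure that the reported model does account for; the file and column names are assumptions about how the ranking data might be organized:

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical long-format data: one row per participant x voice,
# with the rank the participant assigned to that voice (1 = most suitable).
df = pd.read_csv("prestudy_rankings.csv")   # assumed columns: participant, gender, pitch, rank

# Treatment-code gender and pitch by hand and build the interaction terms.
df["male"] = (df["gender"] == "male").astype(float)
df["neutral_pitch"] = (df["pitch"] == "neutral").astype(float)
df["high_pitch"] = (df["pitch"] == "high").astype(float)
df["male_x_neutral"] = df["male"] * df["neutral_pitch"]
df["male_x_high"] = df["male"] * df["high_pitch"]

exog = df[["male", "neutral_pitch", "high_pitch", "male_x_neutral", "male_x_high"]]
rank = df["rank"].astype(pd.CategoricalDtype(ordered=True))      # ordered outcome

model = OrderedModel(rank, exog, distr="logit")   # cumulative-logit (proportional odds) model
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```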
C Implementation Details
The implementation is separated into frontend, backend, and ROS services. An illustration is provided in Figure C1. The experiment's frontend is implemented as a React.js application leveraging the Three.js WebGL 3D engine. The frontend implementation communicates with the backend via
an HTTP API. Every action by the participant corresponds to an HTTP call to the Django backend. Additionally, the frontend is connected to the Django backend's event stream, which allows the frontend to react instantly to changes. The event stream is implemented on top of the JavaScript EventSource specification, which allows transmitting asynchronous events to a browser via a persistent HTTP connection. The experiment is implemented in the backend as a state machine that calls the individual ROS services and updates the frontend and the opponent. Thus, the opponent knows the locations of the participant's ships. When the player accepts correct advice in the first experiment phase, a ship of the opponent is placed at the accepted location for the participant to hit. Further, the backend utilizes a PostgreSQL database to store the participant's interactions in the experiment.
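The event-stream mechanism can be pictured as follows. This is a rough, illustrative sketch of a server-sent-events endpoint in Django, not the actual backend code; the endpoint, queue, and event format are assumptions:

```python
# Illustrative sketch of a server-sent-events endpoint in Django (not the actual backend code).
import json
import queue

from django.http import StreamingHttpResponse

game_events: "queue.Queue[dict]" = queue.Queue()   # filled by the experiment state machine

def event_stream(request):
    def generate():
        while True:
            event = game_events.get()                # block until the state machine emits an event
            yield f"data: {json.dumps(event)}\n\n"   # SSE framing: one 'data:' block per event
    return StreamingHttpResponse(generate(), content_type="text/event-stream")

# Browser side, for reference: new EventSource("/api/events/").onmessage = (e) => { ... };
```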
The robot is controlled via ROS services, where each of the robot's actions corresponds to a ROS service. Specifically, ROS is used to control the robot's voice lines, gestures, and facial expressions. Each of these ROS-controlled actions has a variety of voice lines that are accompanied by gestures and facial expressions. Depending on whether the advice was accepted or rejected, the robot displays different gestures and facial expressions. Details on the utilized gestures are provided in Appendix D and on the variety of voice lines in Appendix E.
Fig. C1. Module diagram of the experiment implementation.
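The service-based robot control can likewise be sketched with a minimal ROS 1 client. The service names below are hypothetical, and the standard Trigger type stands in for the robot's actual service interfaces, which are not specified here:

```python
# Illustrative ROS 1 client; the service names are hypothetical and std_srvs/Trigger
# stands in for the robot's actual service interfaces.
import rospy
from std_srvs.srv import Trigger

rospy.init_node("experiment_backend_client")

rospy.wait_for_service("/robot/give_advice")
give_advice = rospy.ServiceProxy("/robot/give_advice", Trigger)

rospy.wait_for_service("/robot/gesture_hint")
gesture_hint = rospy.ServiceProxy("/robot/gesture_hint", Trigger)

# Called by the backend state machine when the robot should provide a hint:
response = give_advice()      # robot speaks one of its advice voice lines
if response.success:
    gesture_hint()            # accompany the utterance with the matching gesture
```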
D Implemented Robot Gestures
The voice lines used by the robot are accompanied by gestures. These gestures increase the perceived liveliness of the robot. An overview of the implemented gestures is provided in Table D1.
Table D1. List of Robot’s Gestures
Gesture Comment
Point at the touch table Positioning the ships, choosing a target
Shaking its head The opponent misses a shot
Waving Farewell to the participant
Thumbs up gesture The player hits an opponent's ship
Move both arms in front of the body The robot provides a hint
Move hands in front of face The opponent hits a participant's ship
Hands to head The participant misses their shot
Arms up The participant sinks an opponent's ship
Head down The opponent wins
Greeting gesture First interaction with the participant
E Examples of Voice Lines
The robot in the experiment utilizes a variety of voice lines, which are pseudo-randomly selected, to increase the robot's liveliness and participant engagement. During the experiment, the robot guides the participant and informs them about the next steps. For example, the robot will say: "The Opponent is now firing their cannons.", or provide advice: "We just detected a signal at position A1. I think there might be a ship. Should we change our target to that position?". The coordinates, in this example A1, are dynamically generated during the experiment. These voice lines are accompanied by gestures and facial expressions. Table E1 shows examples of the utilized voice lines.
Table E1. Examples of the Robot’s Voice Lines
Action Facial expression Voice lines
Accept advice Happiness
"Thanks for trusting my advice. Changing target location to:"
“Let’s hope this intel is correct. Changing target location to:”
“Fingers crossed that my spies got the right information. Changing target location to:”
“I hope I decoded this message correctly. Changing target location to:”
“Let’s see if we can count on this spy. Changing target location to:”
Opponent sinks a ship Angry
“Oh no, that was my favorite ship!”
"Ah man, that was a tough one. We lost a good ship there."
"Well, that's not how I wanted that round to go."
"It's disappointing, but we still have a chance to turn things around."
Reject advice Sadness
"Alright, you're the captain."
"I respect your decision, captain. I'm here to support you no matter what."
"I trust your judgment, captain."
"Alright, I'm here to serve you, captain."
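The pseudo-random selection of voice lines with dynamically inserted coordinates can be illustrated with the following sketch. The template strings follow Table E1; the selection policy (avoiding an immediate repeat) and all identifiers are assumptions:

```python
import random

ACCEPT_ADVICE_LINES = [
    "Thanks for trusting my advice. Changing target location to: {coord}",
    "Let's hope this intel is correct. Changing target location to: {coord}",
    "Fingers crossed that my spies got the right information. Changing target location to: {coord}",
    "I hope I decoded this message correctly. Changing target location to: {coord}",
    "Let's see if we can count on this spy. Changing target location to: {coord}",
]

_last_line: str | None = None

def pick_accept_line(coord: str) -> str:
    """Pick a voice line pseudo-randomly (avoiding an immediate repeat) and insert the coordinate."""
    global _last_line
    candidates = [line for line in ACCEPT_ADVICE_LINES if line != _last_line]
    _last_line = random.choice(candidates)
    return _last_line.format(coord=coord)

print(pick_accept_line("A1"))
```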
F Mixed-Eect Model for the Accepted Advice
To account for the repeated measures of the participants, a mixed-effects model for advice acceptance was estimated. The utilized model has an uncorrelated individual intercept and slope for each participant. This represents that each participant might have a different initial trust level, which decreases individually. These individual estimates are kept uncorrelated so that a participant with high initial trust may lose that trust either slowly or quickly, independent of the initial trust in the robot's advice. The results of the mixed-effects model are shown in Table F1.
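Written as a formula (notation introduced here for clarity), the described model is a logistic mixed-effects regression of the form $\operatorname{logit} P(\text{accept}_{ij} = 1) = \beta_0 + \beta_1\,\text{natural}_i + \beta_2\,\text{advice}_{ij} + u_{0i} + u_{1i}\,\text{advice}_{ij}$, where $i$ indexes participants and $j$ the successive pieces of advice, with independent random effects $u_{0i} \sim \mathcal{N}(0, \sigma_0^2)$ and $u_{1i} \sim \mathcal{N}(0, \sigma_1^2)$, $\operatorname{Cov}(u_{0i}, u_{1i}) = 0$.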
Table F1. Mixed-Effects Model for Advice Acceptance
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.863 0.263 7.070 <0.001∗∗∗
Natural voice 0.156 0.239 0.650 0.516
Advice −0.222 0.041 −5.442 <0.001∗∗∗
Random effects:
Groups Name Variance Std. Dev.
Participant (Intercept) 0.135 0.367
Participant (Advice) 0.009 0.093
Correlation of Fixed Effects:
Intercept Group
Group 0.400
Advice 0.771 0.047
∗∗∗ p < .001.
The estimated model suggests a negative relationship between advice acceptance and the number of pieces of advice received. This shows that the participants lost trust in the robot's advice throughout the experiment and were less likely to accept later advice. The random effects suggest that this loss of trust was consistent among the participants, whereas the initial trust in the robot's advice varied among the participants.
G Correlation between Response Time and the Questionnaire Measures
To infer potential influences of the assessed questionnaire measures on the response time, the estimated correlations are provided in Table G1. From the data, no significant effect of the robot's perception on the response time can be estimated.
Table G1. Spearman's Rank Correlation between the Average Response Time and the Assessed Measures
Scale Item 𝜌 p-value
Godspeed Animacy .006 .963
Godspeed Likeability .046 .707
Godspeed Anthropomorphism .053 .665
Godspeed Intelligence .170 .166
Godspeed Safety .033 .786
MOS-X Intelligibility .056 .649
MOS-X Naturalness .021 .864
MOS-X Social impression .100 .418
MDMT Competent .124 .314
MDMT Reliable .161 .190
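These values are plain Spearman rank correlations between the per-participant average response time and each questionnaire score; they can be computed, for example, with SciPy (a sketch; the data file and column names are assumptions):

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical per-participant data: average response time plus questionnaire scores.
df = pd.read_csv("main_study_measures.csv")

measures = ["animacy", "likeability", "anthropomorphism", "intelligence", "safety",
            "intelligibility", "naturalness", "social_impression", "competent", "reliable"]

for measure in measures:
    rho, p = spearmanr(df["avg_response_time"], df[measure])
    print(f"{measure}: rho = {rho:.3f}, p = {p:.3f}")
```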
Received 27 February 2024; revised 19 August 2024; accepted 13 November 2024