Eran Raveh's research while affiliated with Universität des Saarlandes and other places

Publications (17)

Article
We present a Wizard-of-Oz experiment examining phonetic accommodation of human interlocutors in the context of human-computer interaction. Forty-two native speakers of German engaged in dynamic spoken interaction with a simulated virtual tutor for learning the German language called Mirabella. Mirabella was controlled by the experimenter and used e...
Article
Full-text available
The present study investigates whether native speakers of German phonetically accommodate to natural and synthetic voices in a shadowing experiment. We aim to determine whether this phenomenon, which is frequently found in HHI, also occurs in HCI involving synthetic speech. The examined features pertain to different phonetic domains: allophonic var...
Conference Paper
Full-text available
The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing 'happiness' (0 %, 33 %, 66 % 'hap...
Conference Paper
Full-text available
The present paper compares phonetic accommodation of L1 French speakers in interaction with the simulated virtual language learning tutor for German, Mirabella, to that of L1 German speakers from a previous study. In a question-and-answer exchange, the L1 French speakers adapted the intonation contours of wh-questions as falling or rising according...
Poster
Full-text available
Results of a Wizard-of-Oz experiment examining phonetic accommodation in human-computer interaction. The name of our simulated system is Mirabella. She is a tutor for German as a foreign language. We are looking for accommodation of question intonation and allophonic variation in German.
Conference Paper
Full-text available
This study examines how the presence of other speakers affects the interaction with a spoken dialogue system. We analyze participants’ speech regarding several phonetic features, viz., fundamental frequency, intensity, and articulation rate, in two conditions: with and without additional speech input from a human confederate as a third interlocutor...
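The abstract above mentions analyzing fundamental frequency, intensity, and articulation rate. As an illustrative sketch only (not the toolchain used in the paper), the first two features can be estimated from a signal with plain NumPy: f0 via frame-level autocorrelation and intensity via RMS energy in dB. All function names and thresholds here are assumptions for demonstration.

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=75, fmax=400):
    # Illustrative f0 estimate: find the autocorrelation peak within
    # the lag range corresponding to plausible voice pitch (fmin..fmax Hz).
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def rms_db(frame):
    # Frame intensity as RMS energy in dB relative to full scale.
    return 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)

# Synthetic "voice": a 120 Hz sine at amplitude 0.5, sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 120 * t)

print(f0_autocorr(signal[:1024], sr))  # close to 120 Hz
print(rms_db(signal))                  # close to -9 dB
```

Articulation rate would additionally require syllable- or segment-level annotation, which is beyond a signal-only sketch like this.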
Conference Paper
Full-text available
This paper discusses phonetic accommodation of 20 native German speakers interacting with the simulated spoken dialogue system Mirabella in a Wizard-of-Oz experiment. The study examines intonation of wh-questions and pronunciation of allophonic contrasts in German. In a question-and-answer exchange with the system, the users produce predominantly f...
Conference Paper
Full-text available
This paper presents a Wizard-of-Oz experiment designed to study phonetic accommodation in human-computer interaction. The experiment comprises a dynamic exchange of information between a human interlocutor and a supposedly intelligent system, while allowing for planned manipulation of the system's speech output on the level of phonetic detail. In t...
Chapter
Full-text available
This paper presents a study that examines the difference of certain phonetic features between human-directed speech (HDS) and device-directed speech (DDS) in human-human-computer interactions. The corpus used consists of tasks in which participants interact with both a confederate and a computer, and it is used for the analyses. This includes distributiona...
Chapter
Full-text available
This paper presents a study on mutual speech variation influences in a human-computer setting. The study highlights behavioral patterns in data collected as part of a shadowing experiment, and is performed using a novel end-to-end platform for studying phonetic variation in dialogue. It includes a spoken dialogue system capable of detecting and tra...
Conference Paper
Full-text available
In this paper we present (1) a processing architecture used to collect multi-modal sensor data, both for corpora collection and real-time processing, (2) an open-source implementation thereof and (3) a use-case where we deploy the architecture in a multi-party deception game, featuring six human players and one robot. The architecture is agnostic t...

Citations

... Thus, we expected them to be less prone to convergence than in the case of a human interlocutor. However, we are aware that increasing evidence of prosodic accommodation to a synthetic or artificially modified voice in human-computer interaction has been reported in the literature (Bell et al., 2003; Gessinger et al., 2021; Raveh et al., 2019; Suzuki & Katagiri, 2007). ...
... Speech level is a variation of language determined by the speaker's attitude toward the interlocutor or the third person being spoken to [31], [32]. Differences in age, social level, and degree of intimacy between the speaker and the interlocutor determine the variation of the language chosen. ...
... Alternatively, our second assumption may have counteracted the latter, namely that the participants probably did not perceive the SDS as fully linguistically flexible and therefore again assumed that it could likely benefit from convergence. Extending the present study to L1-L2 communication by having non-native speakers of German interact with Mirabella is a next step to further investigate these dynamics (see Gessinger, Möbius, Andreeva, Raveh, & Steiner, 2020 for native speakers of French). ...
... Furthermore, the extent to which individual variation by humans' social and cognitive characteristics shapes speech adaptation to voice-AI is a promising area for future research. Prior work has shown variation in how people perceive and personify technological agents, such as robots (Hinz et al., 2019) and voice-AI (Cohn, Raveh, et al., 2020; Etzrodt & Engesser, 2021). Recently, some work has revealed differences in speech alignment toward voice-AI by speaker age (e.g., older vs. college-age adults in Zellou, Cohn, & Ferenc Segedin, 2021) and cognitive processing style (e.g., autistic-like traits in Snyder et al., 2019), suggesting these differences could shape voice-AI speech adaptation as well. ...
... Together with the modeling information and the techniques from Part III, these components are utilized in the system introduced in Chapter 10. The extended architecture, usage possibilities, and graphical visualizations of this system are presented in Eran Raveh et al. (May 2020). "Prosodic Alignments in Shadowed Singing of Familiar and Novel Music". In: Speech Prosody. Tokyo, Japan, pp. ...
... However, the situations in which more than one person is involved with the artificial agent are becoming increasingly important, especially when technologies integrate into the users' everyday environment. Recent studies on robots (e.g., Diederich et al., 2019; Thompson and Gillan, 2010; Fortunati et al., 2020) and commercial VPAs (Etzrodt and Engesser, 2021; Lopatovska and Williams, 2018; Porcheron et al., 2018; Purington et al., 2017; Raveh et al., 2019) indicate that the social situation regarding how many people interact with the agent alters the interaction with and the perception of the agent as well as how people might relate to it. However, their results remain vague and are primarily collateral findings. ...
... The WOz experiment constitutes Chapter 4 of this thesis. This chapter is based on the following published peer-reviewed conference and journal articles: Gessinger et al. (2019b), Gessinger et al. (2019a), Gessinger et al. (2020), and Gessinger et al. (2021a). ...
... The present study aims at a better understanding of human behavior during human-robot interactions by exploring the extent to which humans adapt their speaking style to the listener in unstructured human-human-robot interactions. A similar study was conducted by [11,12]. However, their experiment used a voice-based device (Amazon Alexa) as the computer interlocutor. ...
... This way, no discontinuities are created and the same number of datapoints can be extracted for both speakers to create a better temporal representation of the conversation. Feature extraction was done using the system described in Chapter 10 (and see Raveh et al., 2018). The values ...
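The snippet above describes extracting the same number of datapoints for both speakers to build a temporal representation of the conversation. As a minimal sketch (not the system described in Chapter 10 or Raveh et al., 2018), this can be done by linearly interpolating each speaker's variable-length feature track onto a fixed-length time grid; all names and values below are assumptions for illustration.

```python
import numpy as np

def fixed_length_track(values, n_points=100):
    # Resample a variable-length feature track (e.g. per-frame f0 values)
    # to exactly n_points via linear interpolation on a normalized 0..1
    # time axis, so tracks from both speakers align temporally.
    values = np.asarray(values, dtype=float)
    old_grid = np.linspace(0.0, 1.0, num=len(values))
    new_grid = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_grid, old_grid, values)

# Hypothetical f0 tracks of different lengths for two speakers.
speaker_a = [110, 115, 120, 118]       # 4 frames
speaker_b = [200, 210, 205, 207, 203]  # 5 frames

a = fixed_length_track(speaker_a, 8)
b = fixed_length_track(speaker_b, 8)
print(len(a), len(b))  # 8 8
```

Because interpolation preserves the endpoints and produces no jumps between grid points, this avoids the discontinuities the quoted passage mentions while yielding equal-length representations for both speakers.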