Conference Paper

Offscreen and in the Chair Next to Your: Conversational Agents Speaking Through Actual Human Bodies

Abstract

This paper demonstrates how to interact with a conversational agent that speaks through an actual human body face-to-face and in person (i.e., offscreen). This is made possible by the cyranoid method: a technique involving a human person speech shadowing for a remote third-party (i.e., receiving their words via a covert audio-relay apparatus and repeating them aloud in real-time). When a person shadows for an artificial conversational agent source, we call the resulting hybrid an “echoborg.” We report a study in which people encountered conversational agents either through a human shadower face-to-face or via a text interface under conditions where they assumed their interlocutor to be an actual person. Our results show that the perception of a conversational agent is dramatically altered when the agent is voiced by an actual, tangible person. We discuss the potential implications this methodology has for the development of conversational agents and general person perception research.

... However, given the hybrid nature of the (M)EB (mind of a machine, body of a human), the prior work does not allow for direct predictions in this regard. In previous work on EBs, the non-EB condition featured textual interfaces rather than alternative (artificial) embodiments [2,5,26], and as such does not provide insights on how an (M)EB might perform when compared to other embodied agents. For our present work, we compare two conversational agent embodiments, each with a representation of a real or virtual body, bringing the compared conditions closer together. ...
... From the point of view of the methodology, we referred to Corti et al. [26] as a benchmark. They analysed the adjectives participants attributed to the respective conversational partner. ...
Article
In this paper we present a Multimodal Echoborg interface to explore the effect of different embodiments of an Embodied Conversational Agent (ECA) in an interaction. We compared an interaction where the ECA was embodied as a virtual human (VH) with one where it was embodied as an Echoborg, i.e., a person whose actions are covertly controlled by a dialogue system. The Echoborg in our study shadowed not only the speech output of the dialogue system but also its non-verbal actions. The interactions were structured as a debate between three participants on an ethical dilemma. First, we collected a corpus of debate sessions with three human debaters, which we used as a baseline to design and implement our ECAs. For the experiment, we designed two debate conditions. In one, the participant interacted with two ECAs, both embodied by virtual humans. In the other, the participant interacted with one ECA embodied by a VH and another embodied by an Echoborg. Our results show that a human embodiment of the ECA overall scores better on perceived social attributes of the ECA. In many other respects, the Echoborg scores as poorly as the VH, except on copresence.
... Instead, it turns its attention to the optimal future of the relationship between humans and AI. The notion of an echoborg was invented by social psychologists Kevin Corti and Alex Gillespie for a series of experiments on conversational agent development (Corti and Gillespie, 2015). I am beyond the instrumental prototype deployed in their experiments. ...
Article
This essay considers the nature and stakes of creative making with computational automation technologies. I will argue that Bernard Stiegler’s organological approach to the human as “technical life” takes care of the question of the nature of creative making, and the pharmacological critical practice that it mandates takes care of the question of the stakes. I say “takes care” to emphasise that Stiegler’s theoretical enterprise is dedicated to a “therapeutics” of contemporary technocultural transformation, because culture is best understood as a taking care of the technical pharmakon – both poison and cure – that is our irreducible technical supplementarity. After providing an assessment of Stiegler’s thinking on organology and pharmacological critique, I will discuss the work of some creative makers I have worked with or was able to interview as part of the South West Creative Technologies Network’s Automation Fellowship programme in 2019-2020. The goal is to interpret their work pharmacologically and so to elaborate and extend Stiegler’s work on contemporary technocultural becoming. Digital automation and AI are powerful drivers of the so-called Silicon Valley era of disruptive “creative destruction”. This means that the stakes of creative making and its possibilities for taking care of the future cannot be higher today.
Article
In The Voice in the Machine, Roberto Pieraccini takes the reader on a journey through the history of speech technologies, from the speaking machine of von Kempelen in 1804 through the current uses of statistical machine learning that make possible speech recognition, artificial speech production, speaker recognition, and dialog management. Although this is more of a popular science book about the history of speech technologies than a book dedicated to voice user interface (VUI) design, HCI researchers and VUI designers will very likely find the information accessible and valuable. The fundamental goal of human factors engineering is to strive for optimal allocation of tasks to machines and humans, and you can’t do that if you don’t have a good understanding of machine and human capability. Without getting into the complex mathematics of machine learning, Pieraccini does a superb job of communicating how machines work with speech and provides enough background about human speech to aid an understanding of the capabilities and limitations of current systems.
Article
The current article argues that researcher-as-subject self-experimentation can provide valuable insight and systematic knowledge to social psychologists. This approach, the modus operandi of experimental psychology when the field was in its infancy, has been largely eclipsed by an almost exclusive focus on participant-as-subject other-experimentation. Drawing from the non-experimental first-person traditions of autoethnography, participant observation, and phenomenology, we argue that participating as both observer and subject within one’s own social psychological experiment affords researchers at least three potential benefits: (1) access to “social qualia,” that is, the subjective experience of social phenomena; (2) improved mental models of social phenomena, potentially stimulating new research questions; and (3) an enhanced ability to be reflexive about the given experiment. To support our position, we provide first-person self-reflections from researchers who have self-experimented with transformed social interactions involving Milgram’s cyranoid method. We close by offering guidelines on how one might approach self-experimentation, and discuss a variety of first-person perspective ethnographic technologies that can be incorporated into the practice.
Conference Paper
One aim of participatory innovation is to find new ways of engaging people in various situations that result in ideas and suggestions. For a socio-technical system – whether the design process or its implemented result – to work smoothly (aesthetically) as a specific social order, we should understand just how it is practically accomplished. In this paper, the just-thisness of an experimental teaching situation is explored in which students were acting as cyranoids for their teacher who was located elsewhere. The situation gives us empirical materials about a relatively simple task (listening through headphones to words to repeat and instructions for action), though under extraordinary circumstances (mediating teaching which would normally have been relayed through video/audio). We show how the surrogate has to orient to the on-going lecture as intelligible, to show their understanding of the situation to the absent teacher, and to recipient design the delivery to the fellow students. They have to become a porous membrane that mediates between the two sites, exactly like a user-friendly interface or any fitting part of a socio-technical system would. Thus the data shows what seen but unnoticed interactional work is required to participate, rather than to purely mimic or mediate.
Article
In two studies based on Stanley Milgram's original pilots, we present the first systematic examination of cyranoids as social psychological research tools. A cyranoid is created by cooperatively joining in real-time the body of one person with speech generated by another via covert speech shadowing. The resulting hybrid persona can subsequently interact with third parties face-to-face. We show that naïve interlocutors perceive a cyranoid to be a unified, autonomously communicating person, evidence for a phenomenon Milgram termed the "cyranic illusion." We also show that creating cyranoids composed of contrasting identities (a child speaking adult-generated words and vice versa) can be used to study how stereotyping and person perception are mediated by inner (dispositional) vs. outer (physical) identity. Our results establish the cyranoid method as a unique means of obtaining experimental control over inner and outer identities within social interactions rich in mundane realism.
Article
The current study investigates the influence of lexical factors on phonetic convergence and explores the relationship between acoustic and perceptual measures of convergence. A set of talkers produced baseline and shadowed tokens of target words that varied in frequency and phonological neighbor density independently. Experiment 1 demonstrated the impact of lexical factors on vowel dispersion in speech production. In Experiment 2, separate listeners judged the relative similarity of shadowed to model tokens in an AXB perceptual test of phonetic convergence. Acoustic measures of inter-talker distances in duration, fundamental frequency, and vowel formants for baseline and shadowed speech were compared to the perceptual measures. A mixed-effects regression model using a combination of acoustic convergence measures predicted perceived phonetic convergence better than lexical factors or individual acoustic attributes alone. These findings have important methodological and theoretical implications for understanding the complexities of phonetic convergence. Studies of convergence should consider examining acoustic and perceptual measures in tandem. Lexical factors impact speech production and perception, but their effects appear to be independent of those that evoke phonetic convergence.
Article
This paper reports upon an ongoing investigation exploring a provoking concept in interpersonal interaction. The origins of the concept of human conduits or cyranoids as a tool of deception are outlined. Informal exploration of the technique in social settings is described. It was discovered that a participatory unveiling of the illusion might accelerate the formation of positive new interpersonal relationships. A follow-up trial in a workplace setting probed whether the technique had potential as a medium of business communication. Reflections upon the difficulties of accurately relaying emotions through a human conduit conclude the paper.
Article
2 studies tested the hypothesis that attending to a particular individual in a social situation leads to regarding that individual as the causal agent in the situation. Both studies, using a total of 92 male and female college students, furnished strong support for the hypothesis. The effect did not extend to perceptions of individual behavior as more dispositionally based. The effect was not dependent on differential retention of information about the dependent on perceptual salience. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The development of robots that closely resemble human beings can contribute to cognitive research. An android provides an experimental apparatus that has the potential to be controlled more precisely than any human actor. However, preliminary results indicate that only very humanlike devices can elicit the broad range of responses that people typically direct toward each other. Conversely, to build androids capable of emulating human behavior, it is necessary to investigate social activity in detail and to develop models of the cognitive mechanisms that support this activity. Because of the reciprocal relationship between android development and the exploration of social mechanisms, it is necessary to establish the field of android science. Androids could be a key testing ground for social, cognitive, and neuroscientific theories as well as a platform for their eventual unification. Nevertheless, subtle flaws in appearance and movement can be more apparent and eerie in very humanlike robots. This uncanny phenomenon may be symptomatic of entities that elicit our model of human other but do not measure up to it. If so, very humanlike robots may provide the best means of pinpointing what kinds of behavior are perceived as human, since deviations from human norms are more obvious in them than in more mechanical-looking robots. In pursuing this line of inquiry, it is essential to identify the mechanisms involved in evaluations of human likeness. One hypothesis is that, by playing on an innate fear of death, an uncanny robot elicits culturally-supported defense responses for coping with death’s inevitability. An experiment, which borrows from methods used in terror management research, was performed to test this hypothesis. [Thomson Reuters Essential Science Indicators: Fast Breaking Paper in Social Sciences, May 2008]
Article
Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by current text-to-speech systems.
Article
Pioneering research by Chistovich and her colleagues used speech shadowing to study the mechanisms of immediate speech processing, and in doing so exploited the phenomenon of close shadowing, where the delay between hearing a speech stimulus and repeating it is reduced to 250 msec or less. The research summarised here began with an extension of Chistovich's findings to the close shadowing of connected prose. Twenty-five percent of the women tested were able to accurately shadow connected prose at mean delays ranging from 250 to 300 msec. The other women, and all the men tested, were only able to do so at longer latencies, averaging over 500 msec. These are called distant shadowers. A second series of experiments established that close, just as much as distant shadowers, were syntactically and semantically analysing the material as they repeated it. This was reflected in the ways their spontaneous errors were constrained, and in their sensitivity to disruptions of the syntactic and semantic structure of the materials they were shadowing. A third series of experiments showed that the difference between close and distant shadowers was in their output strategy. Close shadowers are able to use the products of on-line speech analysis to drive their articulatory apparatus before they are fully aware of what these products are. This means that close shadowing not only provides a continuous reflection of the outcome of the process of language comprehension, but also does so relatively unaffected by post-perceptual processes. In this sense, therefore, close shadowing provides us with uniquely privileged access to the properties of the system.
Article
Speech shadowing is an experimental task in which the subject is required to repeat (shadow) speech as he hears it. When the shadower is presented with a sentence, he will start to repeat it before he has heard all of it. The response latency to each word of a sentence can therefore be measured.
Article
In this article the author proposes an episodic theory of spoken word representation, perception, and production. By most theories, idiosyncratic aspects of speech (voice details, ambient noise, etc.) are considered noise and are filtered in perception. However, episodic theories suggest that perceptual details are stored in memory and are integral to later perception. In this research the author tested an episodic model (MINERVA 2; D. L. Hintzman, 1986) against speech production data from a word-shadowing task. The model predicted the shadowing-response-time patterns, and it correctly predicted a tendency for shadowers to spontaneously imitate the acoustic patterns of words and nonwords. It also correctly predicted imitation strength as a function of "abstract" stimulus properties, such as word frequency. Taken together, the data and theory suggest that detailed episodes constitute the basic substrate of the mental lexicon.
Book
An examination of more than sixty years of successes and failures in developing technologies that allow computers to understand human spoken language. Stanley Kubrick's 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to “say or press 1”? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer? In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech may not have HAL's capacity for conversation, they have capabilities that make them usable in many applications today and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding processes from waveform methods to artificial intelligence approaches to statistical learning and modeling of human speech based on a rigorous mathematical model—specifically, Hidden Markov Models (HMM). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?
Article
Understanding contexts is an important challenge that is made harder for designers by the increasing speed at which contexts change. To assist designers, three types of contextual dynamism are distinguished: physical, ontological and social. To inform the understanding of ontological dynamism and social dynamism, "social contraptions", a form of socially interactive design experimentation, are proposed. This paper focuses on cyranic social contraptions in which unseen users interact through a human surrogate that they guide via radio transmissions. Observations from initial trials are reported along with a discussion of themes arising for design and an appraisal of this approach's potential as a design tool.
Article
What is the hallmark of success in human-agent interaction? In animation and robotics, many have concentrated on the looks of the agent — whether the appearance is realistic or lifelike. We present an alternative benchmark that lies in the dyad and not the agent alone: Does the agent's behavior evoke intersubjectivity from the user? That is, in both conscious and unconscious communication, do users react to behaviorally realistic agents in the same way they react to other humans? Do users appear to attribute similar thoughts and actions? We discuss why we distinguish between appearance and behavior, why we use the benchmark of intersubjectivity, our methodology for applying this benchmark to embodied conversational agents (ECAs), and why we believe this benchmark should be applied to human-robot interaction.
Article
This study investigated impression formation while a speaker presented either spontaneous or “shadowed” messages to observers. The spontaneous messages were two-minute, impromptu, self-originated speeches. The shadowed messages were two-minute speeches transmitted to the speaker, which he repeated as he heard them through a small, inconspicuous earphone. Ss were 42 male and female undergraduates who rated the speeches and the speaker on such characteristics as fluency, sincerity, truthfulness, and attractiveness. Both spontaneous and shadowed speeches were rated positively, but spontaneous speeches were rated higher than shadowed speeches. This appeared to reflect an interaction between speech content and the rated characteristics. It was tentatively concluded that the speaker was able, while shadowing, to present consistent verbal and nonverbal cues to Ss which allowed positive impression formation.
Article
Intersubjectivity refers to the variety of possible relations between perspectives. It is indispensable for understanding human social behaviour. While theoretical work on intersubjectivity is relatively sophisticated, methodological approaches to studying intersubjectivity lag behind. Most methodologies assume that individuals are the unit of analysis. In order to research intersubjectivity, however, methodologies are needed that take relationships as the unit of analysis. The first aim of this article is to review existing methodologies for studying intersubjectivity. Four methodological approaches are reviewed: comparative self-report, observing behaviour, analysing talk and ethnographic engagement. The second aim of the article is to introduce and contribute to the development of a dialogical method of analysis. The dialogical approach enables the study of intersubjectivity at different levels, as both implicit and explicit, and both within and between individuals and groups. The article concludes with suggestions for using the proposed method for researching intersubjectivity both within individuals and between individuals and groups.
Article
Roboticists believe that people will have an unpleasant impression of a humanoid robot that has an almost, but not perfectly, realistic human appearance. This is called the uncanny valley, and is not limited to robots, but is also applicable to any type of human-like object, such as dolls, masks, facial caricatures, avatars in virtual reality, and characters in computer graphics movies. The present study investigated the uncanny valley by measuring observers' impressions of facial images whose degree of realism was manipulated by morphing between artificial and real human faces. Facial images yielded the most unpleasant impressions when they were highly realistic, supporting the hypothesis of the uncanny valley. However, the uncanny valley was confirmed only when morphed faces had abnormal features such as bizarre eyes. These results suggest that to have an almost perfectly realistic human appearance is a necessary but not a sufficient condition for the uncanny valley. The uncanny valley emerges only when there is also an abnormal feature.
The St. Unicorn’s Trust
  • L Pawlak
Cleverbot [Computer Program]
  • R Carpenter
Mitsuku [Computer Program]
  • S Worswick
The Man Who Shocked the World: The Life and Legacy of Stanley Milgram
  • T Blass
Cyranic contraptions: using personality surrogates to explore ontologically and socially dynamic contexts
  • R Mitchell
  • A Gillespie
  • B Neill