Social Talk: Making Conversation with People and Machine
Emer Gilmartin
Trinity College Dublin
Ireland
gilmare@tcd.ie
Marine Collery
Grenoble INP Phelma
France
marine.collery@grenoble-inp.org
Ketong Su
Trinity College Dublin
Ireland
kesu@tcd.ie
Yuyun Huang
Trinity College Dublin
Ireland
huangyu@tcd.ie
Christy Elias
Trinity College Dublin
Ireland
eliasc@tcd.ie
Benjamin R. Cowan
Trinity College Dublin / University College Dublin
Ireland
benjamin.cowan@ucd.ie
Nick Campbell
Trinity College Dublin
Ireland
nick@tcd.ie
ABSTRACT
Social or interactive talk diers from task-based or instrumental
interactions in many ways. Quantitative knowledge of these dier-
ences will aid the design of convincing human-machine interfaces
for applications requiring machines to take on roles including so-
cial companions, healthcare providers, or tutors. We briey review
accounts of social talk from the literature. We outline a three part
data collection of human-human, human-woz and human-machine
dialogs incorporating light social talk and a guessing game. We
nally describe our ongoing experiments on the corpus collected.
CCS CONCEPTS
• Human-centered computing → Natural language interfaces; Interaction design theory, concepts and paradigms; • Computing methodologies → Discourse, dialogue and pragmatics;
KEYWORDS
HCI, casual conversation, intelligent agents
ACM Reference Format:
Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell. 2017. Social Talk: Making Conversation with People and Machine. In Proceedings of 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17). ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3139491.3139494
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ISIAA'17, November 13, 2017, Glasgow, UK
© 2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5558-2/17/11...$15.00
https://doi.org/10.1145/3139491.3139494
1 CASUAL SOCIAL CONVERSATION
Casual social conversation, ‘talking just for the sake of talking’ [4], includes small talk, gossip, and conversational narrative, and has been described as the most basic use of speech [7]. Researchers theorize that such talk functions to build social bonds and avoid unfriendly silence, rather than simply to exchange linguistic information or express thought.
Early work focused on conversational openings and closings, where small talk performs a transitional function from silence to the ‘meat’ of interaction [6]. Casual conversation has distinct phases [9]. Schneider noted that casual talk did not seem to follow Gricean maxims and contains idling sequences of repetitions of agreeing tails which keep the conversation going rather than add any new information [8]. Eggins and Slade viewed casual conversation as the space in which people form and refine their social reality [4]. Task-based conversations are bounded by task completion and tend to be short, while casual conversation can go on indefinitely. Syntactic, lexical, and discourse features of (casual) conversation are described in Biber and Leech's work on the Longman Corpus of Spoken and Written English (LSWE) and in Schneider's work [2, 8].
As the goal of casual conversation is to ‘pass the time’ rather than simply to exchange information, the success of such interactions may depend more on para- or extra-linguistic factors. For example, a short gap may make the conversation seem rushed or uncomfortable, whereas an overlong silence might signal boredom. It has been theorised that the timing and ‘feel’ of a conversation are vital to the success of social talk [1, 5].
We concentrate on the architecture of casual conversation in terms of chat and chunk phases, as this will greatly affect the ‘dosage’ of talk in artificial systems. We are also interested in pauses, gaps, and overlap in social talk, which have implications for endpointing and turn-taking protocols. We have built a spoken dialog system and collected a corpus of dyadic conversations, as described below.
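Concretely, gaps, pauses, and overlaps can be derived from segmented turn timings: silence at a change of speaker is a gap, silence within one speaker's talk is a pause, and a negative interval between turns is an overlap. The sketch below is a minimal illustration under our own naming (Turn, classify_intervals); it is not code from the project.

```python
# Illustrative sketch: classifying between-turn intervals as gap, pause,
# or overlap from segmented turn timings. Names are hypothetical and do
# not come from the CARA system.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float

def classify_intervals(turns):
    """Yield (label, duration) for each adjacent pair of turns.

    gap     -> silence between turns by different speakers (turn switch)
    pause   -> silence within one speaker's talk
    overlap -> the next turn starts before the current one ends
    """
    turns = sorted(turns, key=lambda t: t.start)
    for prev, nxt in zip(turns, turns[1:]):
        interval = nxt.start - prev.end
        if interval < 0:
            yield "overlap", -interval
        elif prev.speaker == nxt.speaker:
            yield "pause", interval
        else:
            yield "gap", interval

turns = [Turn("A", 0.0, 2.1), Turn("B", 2.6, 4.0), Turn("A", 3.8, 5.0)]
for label, dur in classify_intervals(turns):
    print(label, round(dur, 2))  # gap 0.5, then overlap 0.2
```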
2 CARA DIALOGUE SYSTEM
CARA is a custom-built Java-based spoken dialog system based
on the architectural principles used in SEMAINE and in the IrisTK
toolkit. The latest iteration uses CMU's Sphinx ASR and Cereproc's Caitlin Irish-accented voice, but is configurable to use other ASR and TTS applications. A WOZ facility has been integrated which allows a WOZ user interface to be generated automatically for any dialogue flow. Both the automatic and WOZ systems are browser-based. Using this system, we have collected a corpus of human-machine, human-WOZ and human-human dialogues.
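To make the shared-flow idea concrete, the sketch below shows one way a single dialogue flow, declared as data, can drive both an automatic controller and a wizard interface in which the operator controls only when each utterance is delivered, not what is said (matching the WHEN-not-WHAT design described in Section 3). This is a minimal sketch under our own naming (DIALOGUE_FLOW, run_automatic, run_woz); it is not the CARA implementation, which is Java- and browser-based.

```python
# Minimal sketch (not the CARA implementation): a dialogue flow declared
# as data, usable both by an automatic controller and by a WOZ interface
# where the wizard controls only WHEN each utterance is delivered.
import time

DIALOGUE_FLOW = [
    {"state": "greet", "prompt": "Hi there! Do you like cooking?"},
    {"state": "joke",  "prompt": "I only eat chips. I'm a computer."},
    {"state": "game",  "prompt": "Try to guess my favourite dish!"},
]

def run_automatic(flow, tts, pause_s=1.0):
    """Automatic mode: the system decides when to speak."""
    for step in flow:
        tts(step["prompt"])
        time.sleep(pause_s)  # fixed, system-controlled timing

def run_woz(flow, tts):
    """WOZ mode: a wizard releases each utterance manually, so a human
    controls the timing but not the content."""
    for step in flow:
        input(f"[wizard] press Enter to say: {step['prompt']!r} ")
        tts(step["prompt"])

if __name__ == "__main__":
    run_woz(DIALOGUE_FLOW, tts=print)
```

Because both modes read the same flow, a WOZ control panel can be generated mechanically from the flow definition, which is the property the system description above relies on.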
3 CORPUS DESIGN AND DATA COLLECTION
The domain for the system dialogue was dyadic social talk about food in two phases. The first phase was a ‘joshing’ stage where the system engaged the user in a short chat about themselves and about food, while producing puns and teasing. The second phase was a guessing game, where the user attempted to guess the system's favourite dish. Each participant was recorded in the human-machine, WOZ, and human-human conditions in both audio and video. The same dialogue was performed in two separate sessions: one with the system running in automatic mode, and another in WOZ mode where a human chose WHEN to make the next utterance but not WHAT to say. The order of the WOZ and automatic conditions was balanced to prevent any confounding order effects, and the sessions were held a week apart in order to reduce any priming effects. Participants filled out questionnaires on sense of humour and assessments of dialogue quality after each interaction with the system (automatic or WOZ), as described in [3]. The content of the human-human sessions was designed to be as similar as possible to the human-machine sessions, with pairs of subjects instructed to chat together and then to play ‘Guess my favourite food’.
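For illustration, one simple way to balance the order of the automatic and WOZ sessions is alternating assignment by participant number. The sketch below is hypothetical; the study states only that order was balanced, not the exact scheme used.

```python
# Hypothetical counterbalancing sketch: even-numbered participants meet
# the automatic system first, odd-numbered participants the WOZ first.
def session_order(participant_id: int) -> list[str]:
    return (["automatic", "woz"] if participant_id % 2 == 0
            else ["woz", "automatic"])

for pid in range(4):
    print(pid, session_order(pid))
# 0 ['automatic', 'woz']
# 1 ['woz', 'automatic'] ...
```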
There were 22 participants (50% female) ranging in age from 18
to 40, all native English speakers with no connection to speech and
language technology. The recordings were made in a quiet room at
the Speech Communication Lab, Trinity College Dublin.
For the human-machine conditions, the subject was seated at a table opposite a screen showing an image of a robot. This face-to-face configuration is not totally natural but was necessary in order to collect video of the subject's face suitable for analysis. The subject was asked to wait until the system spoke and then to respond naturally. The interaction was controlled from an adjacent room. In the WOZ case, the same experimenter controlled the timing for all participants.
For the human-human recordings, pairs of subjects sat opposite each other, with HD video cameras facing each of them as in Fig. 1. The cameras were mounted together on a single tripod between the speakers at around chest height to allow both participants a clear view of one another's faces.

Figure 1: Human-human setup
Both subjects were asked to chat for a few minutes, but no instructions were given on what to talk about. Care was taken to ensure that all participants understood that they were free to talk or not as the mood took them. After the chat phase, subjects played the guessing game together. The recordings have been segmented, synchronized, and are currently being transcribed. They have also been annotated for engagement and for dialog acts using the ISO standard.

4 EXPERIMENTS
The data collected are being used for recognition of engagement in conversation, to contrast timing in human-human and human-machine talk, and to explore the architecture of casual talk in terms of chat and chunk phases. We have discovered significant differences in the structure of chat and chunk phases in multiparty casual conversation in terms of the timing and amount of speech and silence among participants, and are currently applying the same methodology to dyadic conversations using the human-human recordings from the CARA data collection. Results will be available at the workshop.
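As an illustration of the kind of measurement involved, the proportion of a phase occupied by speech can be computed by clipping per-speaker speech intervals to the annotated phase boundaries and merging overlaps. The sketch below uses hypothetical names and data (speech_proportion, invented phase boundaries); it is not the project's analysis code.

```python
# Illustrative sketch: proportion of speech vs silence inside annotated
# chat/chunk phases, given per-speaker speech intervals. All names and
# figures are hypothetical, not taken from the CARA analysis pipeline.
def speech_proportion(phase, intervals):
    """phase: (start, end); intervals: list of (start, end) speech spans.
    Returns the fraction of the phase covered by at least one speaker."""
    p_start, p_end = phase
    # Clip intervals to the phase, sort, then merge overlapping spans.
    clipped = sorted((max(s, p_start), min(e, p_end))
                     for s, e in intervals if e > p_start and s < p_end)
    covered, cursor = 0.0, p_start
    for s, e in clipped:
        s = max(s, cursor)
        if e > s:
            covered += e - s
            cursor = e
    return covered / (p_end - p_start)

phases = {"chat": (0.0, 60.0), "chunk": (60.0, 180.0)}
speech = [(1.0, 55.0), (58.0, 120.0), (118.0, 170.0)]
for name, span in phases.items():
    print(name, round(speech_proportion(span, speech), 2))
```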
5 CONCLUSIONS
We have described the rationale for, and the design and implementation of, a data collection for use in studies of social talk between humans and machines. It is hoped that knowledge gained in this work will inform the design of social talk in human-machine applications.
ACKNOWLEDGMENTS
This work is supported by the CHISTERA-JOKER project and by Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre at Trinity College Dublin.
REFERENCES
[1] David Abercrombie. 1956. Problems and principles: Studies in the Teaching of English as a Second Language. Longmans, Green.
[2] Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, Edward Finegan, and Randolph Quirk. 1999. Longman grammar of spoken and written English. Vol. 2. Longman, London.
[3] L. Devillers, S. Rosset, G. D. Duplessis, M. A. Sehili, L. Béchade, A. Delaborde, C. Gossart, V. Letard, F. Yang, Y. Yemez, B. B. Türker, M. Sezgin, K. E. Haddad, S. Dupont, D. Luzzati, Y. Esteve, E. Gilmartin, and N. Campbell. 2015. Multimodal data collection of human-robot humorous interactions in the Joker project. In 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 348–354. https://doi.org/10.1109/ACII.2015.7344594
[4] S. Eggins and D. Slade. 2004. Analysing casual conversation. Equinox Publishing Ltd.
[5] Samuel Ichiyé Hayakawa. 1990. Language in thought and action. Houghton Mifflin Harcourt.
[6] John Laver. 1975. Communicative functions of phatic communion. Organization of behavior in face-to-face interaction (1975), 215–238.
[7] B. Malinowski. 1923. The problem of meaning in primitive languages. Supplementary in The Meaning of Meaning (1923), 1–84.
[8] Klaus P. Schneider. 1988. Small talk: Analysing phatic discourse. Vol. 1. Hitzeroth, Marburg. http://www.getcited.org/pub/102832247
[9] Eija Ventola. 1979. The structure of casual conversation in English. Journal of Pragmatics 3, 3 (1979), 267–298.