Social Talk: Making Conversation with People and Machine
Emer Gilmartin
Trinity College Dublin
Ireland
gilmare@tcd.ie
Marine Collery
Grenoble INP Phelma
France
marine.collery@grenoble-inp.org
Ketong Su
Trinity College Dublin
Ireland
kesu@tcd.ie
Yuyun Huang
Trinity College Dublin
Ireland
huangyu@tcd.ie
Christy Elias
Trinity College Dublin
Ireland
eliasc@tcd.ie
Benjamin R. Cowan
Trinity College Dublin / University College Dublin
Ireland
benjamin.cowan@ucd.ie
Nick Campbell
Trinity College Dublin
Ireland
nick@tcd.ie
ABSTRACT
Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-WOZ and human-machine dialogs incorporating light social talk and a guessing game. We finally describe our ongoing experiments on the corpus collected.
CCS CONCEPTS
• Human-centered computing → Natural language interfaces; Interaction design theory, concepts and paradigms; • Computing methodologies → Discourse, dialogue and pragmatics.
KEYWORDS
HCI, casual conversation, intelligent agents
ACM Reference Format:
Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias, Benjamin R. Cowan, and Nick Campbell. 2017. Social Talk: Making Conversation with People and Machine. In Proceedings of 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17). ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3139491.3139494
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ISIAA'17, November 13, 2017, Glasgow, UK
© 2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5558-2/17/11...$15.00
https://doi.org/10.1145/3139491.3139494
1 CASUAL SOCIAL CONVERSATION
Casual social conversation, ‘talking just for the sake of talking’ [4], includes smalltalk, gossip, and conversational narrative, and has been described as the most basic use of speech [7]. Researchers theorize that such talk functions to build social bonds and avoid unfriendly silence, rather than simply to exchange linguistic information or express thought.
Early work focused on conversational openings and closings, where small talk performs a transitional function from silence to the ‘meat’ of interaction [6]. Casual conversation has distinct phases [9]. Schneider noted that casual talk did not seem to follow Gricean maxims and contained idling sequences of repetitions of agreeing tails, which keep the conversation going rather than adding new information [8]. Eggins and Slade viewed casual conversation as the space in which people form and refine their social reality [4]. Task-based conversations are bounded by task completion and tend to be short, whereas casual conversation can go on indefinitely. Syntactic, lexical, and discourse features of (casual) conversation are described in Biber and Leech's work on the Longman Corpus of Spoken and Written English (LSWE) and in Schneider's work [2, 8].
As the goal of casual conversation is to ‘pass the time’ rather than simply to exchange information, the success of such interactions may depend more on para- or extra-linguistic factors than on linguistic content. For example, a short gap between turns may make the conversation seem rushed or uncomfortable, whereas an overlong silence might signal boredom. It has been theorised that the timing and ‘feel’ of an interaction are vital to the success of social conversation [1, 5].
We concentrate on the architecture of casual conversation in terms of chat and chunk phases, as this will greatly affect the ‘dosage’ of talk in artificial systems. We are also interested in pauses, gaps, and overlap in social talk, which have implications for endpointing and turn-taking protocols. We have built a spoken dialog system and collected a corpus of dyadic conversations, as described below.
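As a concrete illustration of the gap and overlap measures just mentioned, the following minimal Python sketch derives between-speaker gaps and overlaps from segment-level annotations. It is a sketch under assumed conventions: the (start, end) tuple format, the speaker labels, and the function name are ours for illustration, not the annotation scheme of the corpus.

# Minimal sketch: between-speaker gaps (>0) and overlaps (<0) at each
# change of speaker, from per-speaker lists of (start, end) talk spurts.
# The data format here is an illustrative assumption.
def floor_transfers(segs_a, segs_b):
    timeline = sorted([(s, e, "A") for s, e in segs_a] +
                      [(s, e, "B") for s, e in segs_b])
    transfers = []
    for (s1, e1, spk1), (s2, e2, spk2) in zip(timeline, timeline[1:]):
        if spk1 != spk2:                 # speaker change
            transfers.append(s2 - e1)    # positive gap, negative overlap
    return transfers

if __name__ == "__main__":
    a = [(0.0, 1.8), (3.1, 4.0)]
    b = [(2.2, 3.0), (3.9, 5.5)]
    print(floor_transfers(a, b))         # approx. [0.4, 0.1, -0.1]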
2 CARA DIALOGUE SYSTEM
CARA is a custom-built Java-based spoken dialog system based
on the architectural principles used in SEMAINE and in the IrisTK
toolkit. The latest iteration uses CMU's Sphinx ASR and Cereproc's Caitlin Irish-accented voice, but is configurable to use other ASR and TTS applications. A WOZ component has been integrated which allows a WOZ user interface to be generated automatically for any dialogue flow. Both the automatic and WOZ systems are browser-based. Using this system, we have collected a corpus of human-machine, human-WOZ and human-human dialogues.
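The sketch below illustrates the general idea of driving both modes from a single declarative dialogue flow; it is not the CARA implementation (which is Java-based and browser-based), and the data structure, function names and prompts are hypothetical.

# Illustrative sketch only, not the CARA codebase: one declarative flow
# from which an automatic controller and a WOZ panel can both be built.
DIALOGUE_FLOW = [
    {"id": "greet", "prompt": "Hi, I'm Cara. What's your name?"},
    {"id": "chat", "prompt": "Nice to meet you. Do you enjoy cooking?"},
    {"id": "game", "prompt": "Let's play: try to guess my favourite dish."},
    {"id": "close", "prompt": "That was fun. Thanks for chatting!"},
]

def run_automatic(flow, tts, asr):
    # Automatic mode: the system itself decides when to speak next.
    for step in flow:
        tts.say(step["prompt"])
        asr.listen()

def build_woz_panel(flow):
    # WOZ mode: one button per utterance, so the wizard controls WHEN
    # each prompt is spoken but not WHAT is said.
    return [{"button": step["id"], "utterance": step["prompt"]} for step in flow]

print([b["button"] for b in build_woz_panel(DIALOGUE_FLOW)])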
3 CORPUS DESIGN AND DATA COLLECTION
The domain for the system dialogue was dyadic social talk about food, in two phases. The first phase was a ‘joshing’ stage where the system engaged the user in a short chat about themselves and about food, while producing puns and teasing. The second phase was a guessing game, where the user attempted to guess the system's favourite dish. Each participant was recorded in the human-machine, human-WOZ, and human-human conditions in both audio and video. The same dialogue was performed in two separate sessions: one with the system running in automatic mode, and another in WOZ mode where a human chose WHEN to make the next utterance but not WHAT to say. The order of the WOZ and automatic conditions was balanced to prevent any confounding order effects, and the two sessions were held a week apart in order to reduce any priming effects. Participants filled out questionnaires on sense of humour and assessments of dialogue quality after each interaction with the system (automatic or WOZ), as described in [3]. The content of the human-human sessions was designed to be as similar as possible to the human-machine sessions, with pairs of subjects instructed to chat together and then to play ‘Guess my favourite food’.
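The counterbalancing described above can be sketched as a simple alternating assignment of condition order; the participant IDs and the assignment rule below are illustrative, not the procedure actually used.

# Sketch of the counterbalanced design: half of the participants meet
# the automatic system first and half meet the WOZ system first, with
# the second session held a week later (scheduling not shown).
def counterbalance(participants):
    plan = {}
    for i, p in enumerate(participants):
        plan[p] = ("automatic", "WOZ") if i % 2 == 0 else ("WOZ", "automatic")
    return plan

print(counterbalance(["P01", "P02", "P03", "P04"]))
# {'P01': ('automatic', 'WOZ'), 'P02': ('WOZ', 'automatic'), ...}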
There were 22 participants (50% female) ranging in age from 18
to 40, all native English speakers with no connection to speech and
language technology. The recordings were made in a quiet room at
the Speech Communication Lab, Trinity College Dublin.
For the human-machine conditions, the subject was seated at
a table opposite a screen showing an image of a robot. This face-to-face configuration is not totally natural but was necessary in order to collect video of the subject's face suitable for analysis.
The subject was asked to wait until the system spoke and then to
respond naturally. The interaction was controlled from an adjacent
room. In the WOZ case, the same experimenter controlled the
timing for all participants.
For the human-human recordings, pairs of subjects sat opposite
each other, with HD video cameras facing each of them as in Fig. 1.
The cameras were mounted together on a single tripod between
the speakers at around chest height to allow both participants a
clear view of one another’s faces.
Both subjects were asked to chat for a few minutes but no in-
structions were given on what to talk about. Care was taken to
ensure that all participants understood that they were free to talk
or not as the mood took them. After the chat phase, subjects played
the guessing game together. The recordings have been segmented,
synchronized, and are currently being transcribed. They have also
been annotated for engagement and for dialog acts using the ISO
standard.

Figure 1: Human-Human Setup

4 EXPERIMENTS
The data collected are being used in recognition of engagement in conversation, to contrast timing in human-human and human-machine talk, and to explore the architecture of casual talk in terms of chat and chunk phases. We have discovered significant differences in the structure of chat and chunk phases in multiparty casual conversation in terms of the timing and amount of speech and silence among participants, and are currently applying the same methodology to dyadic conversations using the human-human recordings from the CARA data collection (a sketch of the per-phase timing summary appears below). Results will be available at the workshop.
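The kind of per-phase timing summary used in these comparisons can be sketched as follows. The phase labels, boundaries, and interval representation are assumptions for illustration; in real data, overlapping speech from the two speakers would need to be merged before summation.

# Sketch: proportion of speech and silence within annotated chat and
# chunk phases. All values below are illustrative, not corpus data.
def phase_stats(phase):
    speech = sum(end - start for start, end in phase["speech"])
    duration = phase["end"] - phase["start"]
    return {"phase": phase["label"],
            "speech_ratio": round(speech / duration, 3),
            "silence_ratio": round(1 - speech / duration, 3)}

chat = {"label": "chat", "start": 0.0, "end": 60.0,
        "speech": [(1.0, 20.0), (25.0, 50.0)]}
chunk = {"label": "chunk", "start": 60.0, "end": 150.0,
         "speech": [(61.0, 148.0)]}

for p in (chat, chunk):
    print(phase_stats(p))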
5 CONCLUSIONS
We have described the rationale for, and the design and implementation of, a data collection for use in studies of social talk between humans and machines. It is hoped that the knowledge gained in this work will inform the design of social talk in human-machine applications.
ACKNOWLEDGMENTS
This work is supported by the CHISTERA-JOKER project and by
Science Foundation Ireland through the CNGL Programme (Grant
12/CE/I2267) in the ADAPT Centre at Trinity College Dublin.
REFERENCES
[1] David Abercrombie. 1956. Problems and principles: Studies in the Teaching of English as a Second Language. Longmans, Green.
[2] Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, Edward Finegan, and Randolph Quirk. 1999. Longman grammar of spoken and written English. Vol. 2. Longman, London.
[3] L. Devillers, S. Rosset, G. D. Duplessis, M. A. Sehili, L. Béchade, A. Delaborde, C. Gossart, V. Letard, F. Yang, Y. Yemez, B. B. Türker, M. Sezgin, K. E. Haddad, S. Dupont, D. Luzzati, Y. Esteve, E. Gilmartin, and N. Campbell. 2015. Multimodal data collection of human-robot humorous interactions in the Joker project. In 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 348–354. https://doi.org/10.1109/ACII.2015.7344594
[4] S. Eggins and D. Slade. 2004. Analysing casual conversation. Equinox Publishing Ltd.
[5] Samuel Ichiyé Hayakawa. 1990. Language in thought and action. Houghton Mifflin Harcourt.
[6] John Laver. 1975. Communicative functions of phatic communion. Organization of behavior in face-to-face interaction (1975), 215–238.
[7] B. Malinowski. 1923. The problem of meaning in primitive languages. Supplement to The Meaning of Meaning (1923), 1–84.
[8] Klaus P. Schneider. 1988. Small talk: Analysing phatic discourse. Vol. 1. Hitzeroth, Marburg. http://www.getcited.org/pub/102832247
[9] Eija Ventola. 1979. The structure of casual conversation in English. Journal of Pragmatics 3, 3 (1979), 267–298.