Social Talk: Making Conversation with People and Machine
Emer Gilmartin, Trinity College Dublin
Marine Collery, Grenoble INP Phelma
Ketong Su, Trinity College Dublin
Yuyun Huang, Trinity College Dublin
Christy Elias, Trinity College Dublin
Benjamin R. Cowan, Trinity College Dublin/University
Nick Campbell, Trinity College Dublin
Social or interactive talk differs from task-based or instrumental
interactions in many ways. Quantitative knowledge of these differences
will aid the design of convincing human-machine interfaces
for applications requiring machines to take on roles such as social
companions, healthcare providers, or tutors. We briefly review
accounts of social talk from the literature. We outline a three-part
data collection of human-human, human-WOZ, and human-machine
dialogs incorporating light social talk and a guessing game. Finally,
we describe our ongoing experiments on the collected corpus.
CCS Concepts: • Human-centered computing → Natural language interfaces; Interaction design theory, concepts and paradigms; • Computing methodologies → Discourse, dialogue and pragmatics
Keywords: HCI, casual conversation, intelligent agents
ACM Reference Format:
Emer Gilmartin, Marine Collery, Ketong Su, Yuyun Huang, Christy Elias,
Benjamin R. Cowan, and Nick Campbell. 2017. Social Talk: Making Con-
versation with People and Machine. In Proceedings of 1st ACM SIGCHI
International Workshop on Investigating Social Interactions with Artificial
Agents (ISIAA’17). ACM, New York, NY, USA, 2 pages. https://doi.org/10.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
ISIAA’17, November 13, 2017, Glasgow, UK
©2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5558-2/17/11...$15.00
1 CASUAL SOCIAL CONVERSATION
Casual social conversation, ‘talking just for the sake of talking’ […],
includes small talk, gossip, and conversational narrative, and has
been described as the most basic use of speech […]. Some accounts […]
theorize that such talk functions to build social bonds and avoid
unfriendly silence, rather than simply to exchange linguistic information
or express thought.
Early work focused on conversational openings and closings,
where small talk performs a transitional function from silence to
the ‘meat’ of the interaction […]. Casual conversation has distinct
phases […]. Schneider noted that casual talk does not seem to follow
Gricean maxims and contains idling sequences of repetitions of agreeing
tails, which keep the conversation going rather than adding anything
new […]. Eggins and Slade viewed casual conversation as the space in
which people form and refine their social reality […]. Whereas
task-based conversations are bounded by task completion and tend
to be short, casual conversation can go on indefinitely. Syntactic,
lexical, and discourse features of (casual) conversation are described
in Biber and Leech’s work on the Longman Corpus of Spoken and
Written English (LSWE) and in Schneider’s work [2,8].
As the goal of casual conversation is to ‘pass the time’ rather
than simply exchange information, the success of such interactions
may depend more on para- or extra-linguistic factors. For example,
a short gap may make the conversation seem rushed or uncom-
fortable, whereas an overlong silence might signal boredom. It has
been theorised that the timing and the ‘feel’ of a conversation are
vital to the success of social conversation [1,5].
We concentrate on the architecture of casual conversation in
terms of chat and chunk phases (chunks being stretches where one
speaker holds the floor, for example to tell a story, and chat being
the highly interactive stretches between them), as this will greatly
affect the ‘dosage’ of talk in artificial systems. We are also interested
in pauses, gaps, and overlap in social talk, which have implications for
endpointing and turn-taking protocols. We have built a spoken dialog
system and collected a corpus of dyadic conversations, as described below.
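The pause, gap, and overlap categories mentioned above can be made concrete with a short sketch. It assumes talkspurt annotations of the form (speaker, start, end) in seconds, which the text does not specify, and uses definitions common in the turn-taking literature: a pause is silence between two talkspurts of the same speaker, a gap is silence between talkspurts of different speakers, and an overlap is the stretch where an incoming speaker starts before the current one has finished. The names `Talkspurt` and `classify_intervals` are hypothetical; this is an illustration, not the analysis pipeline used in this work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Talkspurt:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def classify_intervals(spurts: List[Talkspurt]) -> List[Tuple[str, float]]:
    """Label silences and overlaps between talkspurts in a dyadic recording.

    pause   = silence between talkspurts of the same speaker
    gap     = silence between talkspurts of different speakers
    overlap = time an incoming speaker talks before the floor holder stops
    """
    events: List[Tuple[str, float]] = []
    ordered = sorted(spurts, key=lambda s: s.start)
    if not ordered:
        return events
    # Track who currently holds the floor and when their speech ends.
    last_end = ordered[0].end
    last_speaker = ordered[0].speaker
    for s in ordered[1:]:
        if s.start > last_end:
            kind = "pause" if s.speaker == last_speaker else "gap"
            events.append((kind, round(s.start - last_end, 3)))
        elif s.speaker != last_speaker:
            events.append(("overlap", round(min(s.end, last_end) - s.start, 3)))
        # Only a talkspurt that outlasts the current one takes the floor.
        if s.end > last_end:
            last_end, last_speaker = s.end, s.speaker
    return events


turns = [Talkspurt("A", 0.0, 2.0), Talkspurt("B", 2.5, 4.0),
         Talkspurt("B", 4.3, 5.0), Talkspurt("A", 4.8, 6.0)]
print(classify_intervals(turns))  # [('gap', 0.5), ('pause', 0.3), ('overlap', 0.2)]
```

In this simplification, a backchannel that falls entirely inside another speaker's talkspurt is counted as an overlap but does not take the floor, so the silence that follows is measured from the end of the longer talkspurt.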
2 CARA DIALOGUE SYSTEM
CARA is a custom Java-based spoken dialog system that follows
the architectural principles used in SEMAINE and in the IrisTK
toolkit. The latest iteration uses CMU’s Sphinx ASR and Cereproc’s
Caitlin Irish-accented voice, but is configurable to use other ASR
and TTS applications. A Wizard-of-Oz (WOZ) component has been
integrated into the system, allowing a WOZ user interface to be generated
automatically for any dialogue flow. Both the automatic and WOZ systems
are browser-based. Using this system, we have collected a corpus
of human-machine, human-WOZ, and human-human dialogues.
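The text does not detail how CARA derives a WOZ interface from a dialogue flow, so the following is only a hypothetical Python sketch of one way to do it: the flow fixes what the system says, a pluggable timing policy decides when each utterance is released, and the wizard's control panel is generated from the flow itself. `FlowStep`, `woz_buttons`, `run_flow`, the example utterances, and the fixed 0.8 s automatic delay are all illustrative assumptions; CARA itself is written in Java.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FlowStep:
    step_id: str
    system_utterance: str  # WHAT is said: fixed by the flow in both modes


# A timing policy returns the silence (seconds) to wait before a step's
# utterance, i.e. it decides WHEN the utterance is released.
TimingPolicy = Callable[[FlowStep], float]


def woz_buttons(flow: List[FlowStep]) -> List[str]:
    """Derive one control label per step, so any flow gets a WOZ panel."""
    return [f"Next: {step.step_id}" for step in flow]


def run_flow(flow: List[FlowStep], wait_before: TimingPolicy) -> List[Tuple[float, str]]:
    """Play the flow on a simulated clock, returning (time, utterance) pairs."""
    clock = 0.0
    log: List[Tuple[float, str]] = []
    for step in flow:
        clock += wait_before(step)
        log.append((round(clock, 3), step.system_utterance))
    return log


# Illustrative flow; these are not CARA's actual prompts.
flow = [
    FlowStep("greet", "Hello there! Do you like your food?"),
    FlowStep("guess", "Can you guess my favourite dish?"),
]
automatic = run_flow(flow, lambda step: 0.8)   # hypothetical fixed endpointing delay
wizard_delays = {"greet": 0.5, "guess": 1.7}   # hypothetical delays chosen by a wizard
woz = run_flow(flow, lambda step: wizard_delays[step.step_id])
print(woz_buttons(flow))  # ['Next: greet', 'Next: guess']
```

Keeping content and timing separate means the automatic and WOZ runs produce the same utterances in the same order and differ only in their timestamps, which is the property the WOZ condition relies on.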
3 CORPUS DESIGN AND DATA COLLECTION
The domain for the system dialogue was dyadic social talk about
food, in two phases. The first phase was a ‘joshing’ stage in which
the system engaged the user in a short chat about themselves and
about food, while producing puns and teasing. The second phase
was a guessing game, in which the user attempted to guess the
system’s favourite dish. Each participant was recorded in the
human-machine, WOZ, and human-human conditions, in both audio and
video. The same dialogue was performed in two separate sessions:
one with the system running in automatic mode, and another in
WOZ mode, where a human chose WHEN to make the next utterance
but not WHAT to say. The order of the WOZ and automatic conditions
was counterbalanced to prevent confounding order effects, and the two
sessions were held a week apart to reduce priming effects. Participants
filled out questionnaires on their sense of humour and on the quality
of the dialogue after each interaction with the system (automatic
or WOZ), as described in […]. The content of the human-human
sessions was designed to be as similar as possible to the human-
machine sessions, with pairs of subjects instructed to chat together
and then to play ‘Guess my favourite food’.
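The counterbalancing of session order described above can be sketched as a random split of participants into WOZ-first and automatic-first groups, with the second system session scheduled a week after the first. The participant IDs, the seed, and the function name `counterbalance` are hypothetical; the text does not state how participants were actually assigned to orders.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

INTER_SESSION_DAYS = 7  # the two system sessions were held a week apart


def counterbalance(participants: List[str], seed: int = 0) -> Dict[str, List[Tuple[int, str]]]:
    """Assign each participant a [(day offset, condition), ...] schedule.

    Participants are shuffled and split in half: the first half does WOZ
    first, the second half does automatic first. With an odd count, the
    automatic-first group receives the extra participant.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule: Dict[str, List[Tuple[int, str]]] = {}
    for i, pid in enumerate(shuffled):
        first, second = ("WOZ", "automatic") if i < half else ("automatic", "WOZ")
        schedule[pid] = [(0, first), (INTER_SESSION_DAYS, second)]
    return schedule


ids = [f"P{n:02d}" for n in range(1, 23)]  # 22 hypothetical participant IDs
plan = counterbalance(ids)
# 11 participants start with WOZ and 11 start with the automatic system.
print(Counter(sessions[0][1] for sessions in plan.values()))
```

Fixing the seed makes the assignment reproducible, which is useful when the schedule has to be regenerated or audited later.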
There were 22 participants (50% female) ranging in age from 18
to 40, all native English speakers with no connection to speech and
language technology. The recordings were made in a quiet room at
the Speech Communication Lab, Trinity College Dublin.
For the human-machine conditions, the subject was seated at
a table opposite a screen showing an image of a robot. This face-to-face
configuration is not entirely natural, but it was necessary in
order to collect video of the subject’s face suitable for analysis.
The subject was asked to wait until the system spoke and then to
respond naturally. The interaction was controlled from an adjacent
room. In the WOZ case, the same experimenter controlled the
timing for all participants.
For the human-human recordings, pairs of subjects sat opposite
each other, with HD video cameras facing each of them as in Fig. 1.
The cameras were mounted together on a single tripod between
the speakers at around chest height to allow both participants a
clear view of one another’s faces.
Both subjects were asked to chat for a few minutes, but no
instructions were given on what to talk about. Care was taken to
ensure that all participants understood that they were free to talk
or not as the mood took them. After the chat phase, subjects played
the guessing game together. The recordings have been segmented
and synchronized, and are currently being transcribed. They have also
been annotated for engagement and for dialog acts using the ISO
standard.

Figure 1: Human-human setup

4 EXPERIMENTS
The data collected are being used in work on recognizing engagement
in conversation, in contrasting timing in human-human and
human-machine talk, and in exploring the architecture of casual talk
in terms of chat and chunk phases. We have discovered significant
differences in the structure of chat and chunk phases in multiparty
casual conversation, in terms of the timing and amount of speech and
silence among participants, and are currently applying the same
methodology to dyadic conversations using the human-human recordings
from the CARA data collection. Results will be available at the workshop.
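As an illustration of measuring the timing and amount of speech and silence within a single chat or chunk segment, the sketch below computes, for one annotated segment of a dyadic recording, the fraction of time spent in silence, in single-speaker speech, and in overlap, plus each speaker's share of speech. It assumes talkspurt annotations and segment boundaries are already available; the function `segment_profile` and its output keys are hypothetical and are not the methodology used in the experiments.

```python
from typing import Dict, List, Tuple

Spurt = Tuple[str, float, float]  # (speaker, start, end) in seconds


def segment_profile(spurts: List[Spurt], seg_start: float, seg_end: float) -> Dict[str, float]:
    """Return fractions of [seg_start, seg_end) spent in silence, solo speech,
    and overlap, plus each speaker's share of the segment ("speech:<id>").
    Assumes seg_end > seg_start."""
    # Keep only talkspurts that intersect the segment, clipped to its bounds.
    clipped = [
        (spk, max(s, seg_start), min(e, seg_end))
        for spk, s, e in spurts
        if e > seg_start and s < seg_end
    ]
    # Every talkspurt boundary starts a new elementary interval.
    points = sorted({seg_start, seg_end, *(t for _, s, e in clipped for t in (s, e))})
    totals = {"silence": 0.0, "solo": 0.0, "overlap": 0.0}
    per_speaker: Dict[str, float] = {}
    for a, b in zip(points, points[1:]):
        mid = (a + b) / 2  # speaker activity is constant inside (a, b)
        active = {spk for spk, s, e in clipped if s <= mid < e}
        key = "silence" if not active else "solo" if len(active) == 1 else "overlap"
        totals[key] += b - a
        for spk in active:
            per_speaker[spk] = per_speaker.get(spk, 0.0) + (b - a)
    length = seg_end - seg_start
    profile = {k: round(v / length, 3) for k, v in totals.items()}
    profile.update({f"speech:{spk}": round(v / length, 3) for spk, v in per_speaker.items()})
    return profile


example = [("A", 0.0, 4.0), ("B", 3.5, 6.0), ("A", 8.0, 10.0)]
print(segment_profile(example, 0.0, 10.0))
# {'silence': 0.2, 'solo': 0.75, 'overlap': 0.05, 'speech:A': 0.6, 'speech:B': 0.25}
```

Computing the same profile separately for segments labelled as chat and as chunk gives directly comparable distributions across the two phase types.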
We have described the rationale, design, and implementation of
a data collection for use in studies of social talk between humans
and machines. It is hoped that knowledge gained in this work will
inform the design of social talk in human-machine applications.
ACKNOWLEDGMENTS
This work is supported by the CHISTERA-JOKER project and by
Science Foundation Ireland through the CNGL Programme (Grant
12/CE/I2267) in the ADAPT Centre at Trinity College Dublin.
REFERENCES
[1] David Abercrombie. 1956. Problems and principles: Studies in the Teaching of English as a Second Language. Longmans, Green.
[2] Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, Edward Finegan, and Randolph Quirk. 1999. Longman grammar of spoken and written English. Vol. 2. Longman, London.
[3] L. Devillers, S. Rosset, G. D. Duplessis, M. A. Sehili, L. Béchade, A. Delaborde, C. Gossart, V. Letard, F. Yang, Y. Yemez, B. B. Türker, M. Sezgin, K. E. Haddad, S. Dupont, D. Luzzati, Y. Esteve, E. Gilmartin, and N. Campbell. 2015. Multimodal data collection of human-robot humorous interactions in the Joker project. In 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 348–354. https://doi.org/10.1109/ACII.2015.7344594
[4] S. Eggins and D. Slade. 2004. Analysing casual conversation. Equinox Publishing.
[5] Samuel Ichiyé Hayakawa. 1990. Language in thought and action. Houghton
[6] John Laver. 1975. Communicative functions of phatic communion. Organization of behavior in face-to-face interaction (1975), 215–238.
[7] B. Malinowski. 1923. The problem of meaning in primitive languages. Supplementary in the Meaning of Meaning (1923), 1–84.
[8] Klaus P. Schneider. 1988. Small talk: Analysing phatic discourse. Vol. 1. Hitzeroth.
[9] Eija Ventola. 1979. The structure of casual conversation in English. Journal of Pragmatics 3, 3 (1979), 267–298.