Article

The role of beliefs in lexical alignment: Evidence from dialogs with humans and computers


Abstract

Five experiments examined the extent to which speakers' alignment (i.e., convergence) on words in dialog is mediated by beliefs about their interlocutor. To do this, we told participants that they were interacting with another person or a computer in a task in which they alternated between selecting pictures that matched their 'partner's' descriptions and naming pictures themselves (though in reality all responses were scripted). In both text- and speech-based dialog, participants tended to repeat their partner's choice of referring expression. However, they showed a stronger tendency to align with 'computer' than with 'human' partners, and with computers that were presented as less capable than with computers that were presented as more capable. The tendency to align therefore appears to be mediated by beliefs, with the relevant beliefs relating to an interlocutor's perceived communicative capacity.
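As a concrete illustration of the measure at stake in this paradigm: lexical alignment can be scored as the proportion of naming trials on which the participant reuses the partner's earlier referring expression for the same picture. The sketch below is illustrative only, with hypothetical trial data and column names, not the authors' analysis code.

```python
# Minimal sketch: score lexical alignment as the proportion of naming trials
# on which the participant reuses the 'partner's' earlier referring expression.
# Trial data and column names are hypothetical.
import pandas as pd

trials = pd.DataFrame({
    "picture":          ["seat", "seat", "rabbit", "rabbit"],
    "partner_term":     ["bench", "bench", "bunny", "bunny"],
    "participant_term": ["bench", "seat", "bunny", "bunny"],
})

trials["aligned"] = trials["partner_term"] == trials["participant_term"]
alignment_rate = trials["aligned"].mean()  # proportion of aligned trials
print(f"Alignment rate: {alignment_rate:.2f}")  # 0.75 for this toy data
```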


... Voice user interfaces (VUIs), like Google Assistant, Apple's Siri, and Amazon Alexa, are now commonplace. Much effort has gone into identifying and realizing ideal forms of engagement and user experience (UX) with these technologies [17,63], whilst also understanding people's perceptions of VUIs as dialogue partners [21,24], termed partner models [13,14,21,24]. This refers to the cognitive representation of a dialogue partner's communicative competence and social relevance [13,20]. Recent work [24] suggests three underlying dimensions for VUIs: competence and dependability, human-likeness, and communicative flexibility. ...
... Partner models can affect UX, impacting trust in the system [48] and knowledge expectations [20]. They are also used to guide speakers toward appropriate speech and language choices for a given dialogue partner, increasing the chances of communicative success [13,25]. ...
Preprint
Full-text available
Recent research has begun to assess people's perceptions of voice user interfaces (VUIs) as dialogue partners, termed partner models. Current self-report measures are only available in English, limiting research to English-speaking users. To improve the diversity of user samples and contexts that inform partner modelling research, we translated, localized, and evaluated the Partner Modelling Questionnaire (PMQ) for non-English-speaking Western (German, n=185) and East Asian (Japanese, n=198) cohorts where VUI use is popular. Through confirmatory factor analysis (CFA), we find that the scale produces equivalent levels of goodness-of-fit for both our German and Japanese translations, confirming its cross-cultural validity. Still, the structure of the communicative flexibility factor did not replicate directly across Western and East Asian cohorts. We discuss how our translations can open up critical research on cultural similarities and differences in partner model use and design, whilst highlighting the challenges of ensuring accurate translation across cultural contexts.
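For illustration, a CFA of this kind specifies the hypothesized three-factor structure in advance and checks its goodness-of-fit separately in each cohort. Below is a minimal sketch using the Python semopy package, assuming lavaan-style model syntax; the item names, factor indicators, and data files are hypothetical placeholders, not the PMQ's actual items.

```python
# Hedged sketch: fit the same hypothesized three-factor CFA to two cohorts and
# compare goodness-of-fit, as in a cross-cultural validation. Item names
# (cd1..fl3) and CSV paths are placeholders.
import pandas as pd
import semopy

model_desc = """
CompetenceDependability  =~ cd1 + cd2 + cd3
HumanLikeness            =~ hl1 + hl2 + hl3
CommunicativeFlexibility =~ fl1 + fl2 + fl3
"""

for cohort, path in [("German", "pmq_de.csv"), ("Japanese", "pmq_ja.csv")]:
    data = pd.read_csv(path)
    model = semopy.Model(model_desc)
    model.fit(data)
    stats = semopy.calc_stats(model)  # fit indices such as CFI and RMSEA
    print(cohort, stats[["CFI", "RMSEA"]])
```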
... When these interactions break down, users tend to adapt their language, using strategies such as hyperarticulating, changing the command structure, or altering their accent [48,53,58] to raise the chance that the VA will recognise the user's intended input [60]. This adaptation, as well as the selection of more simplistic utterances, is thought to be due to users seeing VAs as being at risk of communicative failure [14,53]. Similar to the concepts of recipient or audience design [7,50], this perception leads users to adapt their speech based on these perceived limitations so as to be more likely to communicate successfully with the system [14,22]. ...
... Following on from Shneiderman, recent work on the nature of conversational interaction with speech agents suggests that, even when emulating human capabilities such as context awareness, a dialogue with a system is a different genre of dialogue with its own norms and rules [20,58,60], with computers being seen as stereotypically more inflexible in their capabilities compared to human interlocutors [14,21]. Previous work on user partner models of speech interfaces [25] suggests that the perceived flexibility of a system, along with its human-likeness and its perceived knowledge, scaffolds user perceptions of a VA's competence as a conversational partner. ...
Conference Paper
Full-text available
Voice Agents (VAs) are touted as being able to help users in complex tasks such as cooking, interacting as a conversational partner to provide information and advice while the task is ongoing. Through conversation analysis of 7 cooking sessions with a commercial VA, we identify challenges caused by a lack of contextual awareness, leading to irrelevant responses, misinterpretation of requests, and information overload. Informed by this, we evaluated 16 cooking sessions with a wizard-led context-aware VA. We observed more fluent interaction between humans and agents, including more complex requests, explicit grounding within utterances, and complex social responses. We discuss reasons for this, the potential for personalisation, and the division of labour in VA communication and proactivity. We then discuss recent advances in generative models and the interaction challenges of VAs. We propose limited context awareness in VAs as a step toward explainable, explorable conversational interfaces.
... The basic tenet of partner modelling is that people form a mental representation of their dialogue partner as a communicative and social entity [13,30]. Originating in psycholinguistics, the concept proposes that this mental representation informs what people say to a given interlocutor, how they say it, and the types of tasks someone might entrust their partner to carry out [13,15]. Hence, partner models might also be understood as a heuristic account of a partner's communicative ability and social relevance that guides a speaker toward interaction and language behaviours that are appropriate for a given interlocutor. ...
... Recent work aimed at cultivating this type of knowledge has focused on a concept known as 'partner modelling' [15,33]. Partner models are said to reflect perceptions of a dialogue partner's communicative ability, and have been shown to influence language production in both human-human dialogue (HHD) and human-machine dialogue (HMD) [13,28], with people adapting their speech and language behaviours based on their partner model for both human and machine dialogue partners [13,30]. ...
Preprint
Full-text available
Recent work has looked to understand user perceptions of speech agent capabilities as dialogue partners (termed partner models), and how this affects user interaction. Yet, currently partner model effects are inferred from language production as no metrics are available to quantify these subjective perceptions more directly. Through three studies, we develop and validate the Partner Modelling Questionnaire (PMQ): an 18-item self-report semantic differential scale designed to reliably measure people's partner models of non-embodied speech interfaces. Through principal component analysis and confirmatory factor analysis, we show that the PMQ scale consists of three factors: communicative competence and dependability, human-likeness in communication, and communicative flexibility. Our studies show that the measure consistently demonstrates good internal reliability, strong test-retest reliability over 12 and 4-week intervals, and predictable convergent/divergent validity. Based on our findings we discuss the multidimensional nature of partner models, whilst identifying key future research avenues that the development of the PMQ facilitates. Notably, this includes the need to identify the activation, sensitivity, and dynamism of partner models in speech interface interaction.
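One of the reliability properties reported here, internal consistency, is conventionally quantified with Cronbach's alpha. A self-contained sketch of the computation on simulated item responses (not the PMQ data) follows.

```python
# Cronbach's alpha computed from first principles on simulated item responses:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(size=200)
# Six noisy indicators of one construct -> alpha should come out high (~0.9).
items = pd.DataFrame({f"item{i}": latent + rng.normal(scale=0.8, size=200)
                      for i in range(6)})
print(f"alpha = {cronbach_alpha(items):.2f}")
```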
... For instance, IAM predicts that alignment is primarily the result of automatic activation of linguistic representations (Pickering & Garrod, 2004). Recent research, however, has been increasingly devoted to the moderating effects of extralinguistic factors on linguistic alignment, showing that the degree of lexical alignment was modulated by speakers' perceptions about their interlocutor and thereby bringing to the fore an audience design component in alignment (Branigan, Pickering, Pearson, McLean, & Brown, 2011;Brennan & Clark, 1996;Suffill, Kutasi, Pickering, & Branigan, 2021). More specifically, the audience design account of alignment assumes that interlocutors converge on lexical use to facilitate mutual understanding with a consideration of communicative success, and this process involves belief-based judgment about what would be the most intelligible for the conversational partner. ...
... Some research has approached this issue by investigating whether the audience design account operates in HCI as well as in HHI, but the findings are still mixed and inconclusive. There is evidence in support of speakers' reliance on audience design when adapting their language choices to those of a computer interlocutor (Bergmann et al., 2015; Branigan et al., 2011). For instance, Branigan et al. (2011) observed that people aligned their lexical choices to a computer interlocutor more than to a human interlocutor, indicating that audience design is especially marked in HCI. Their explanation of the finding is that, driven by the belief that the computer interlocutor had limited capabilities, speakers tended to reuse the words produced by the computer in order to increase the probability of being understood. ...
Article
Over the past decade, people's language behaviors towards computer partners have attracted growing interest with the prevalence of interaction with dialogue systems. However, it remains controversial whether people perceive and respond to computer and human partners in the same way. To address this issue, the current study investigated whether speakers converge with their conversational partner on lexical choices (i.e., lexical alignment) to the same extent when they believed the partner was a human (i.e., human-human interaction, HHI) and when they believed the partner was a computer (i.e., human-computer interaction, HCI), and whether the strength of lexical alignment is moderated by individuals' social skills in the same fashion in HHI and HCI. A speech-based picture naming and matching task was adopted to measure participants' lexical alignment towards their conversational partner, while participants' social skills were assessed using the Chinese University-students Social Skill Inventory (ChUSSI). Results indicated that lexical alignment in HCI was stronger than that in HHI (79.5% vs. 58.6%). In addition, participants' social skills score, in particular the score on protecting the partner's Mianzi (i.e., dignity and prestige) in the ChUSSI, significantly predicted participants' propensity for lexical alignment in HHI but did not in HCI. More specifically, participants who were evaluated to be more concerned with others' social standing were significantly more likely to align with their partner in the HHI context (β = 0.896, Z = 2.847, p < 0.001), but this correlation did not hold in HCI (β = −0.333, Z = −1.241, p = 0.214). These findings shed light on the potential boundary between speakers' representations of human and computer interlocutors.
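The core contrast reported here (79.5% vs. 58.6% alignment) amounts to a logistic regression of trial-level alignment on partner type. The sketch below uses simulated data and statsmodels; the published analysis also includes social-skill predictors, which are omitted here.

```python
# Hedged sketch: trial-level lexical alignment (0/1) regressed on partner type
# (human vs. computer). Data are simulated near the reported alignment rates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
partner = rng.choice(["human", "computer"], size=n)
p = np.where(partner == "computer", 0.795, 0.586)  # reported rates
aligned = rng.binomial(1, p)

df = pd.DataFrame({"partner": partner, "aligned": aligned})
fit = smf.logit("aligned ~ C(partner, Treatment('human'))", data=df).fit()
print(fit.summary())  # positive coefficient -> more alignment with computers
```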
... Research into human conversation suggests that people adapt their speech and language choices in dialogue based on the perceived communicative competence and social relevance of their dialogue partner [3,11,12,15]. Essentially, it is argued that people make informal hypotheses about their partner's communicative abilities and tendencies based on stereotypical assumptions drawn from superficial cues (i.e., age, ethnicity, gender, etc.) [11,12]. Then, following exposure to a given partner, they develop a more nuanced heuristic or mental model of their interlocutor's capabilities [12]. ...
... In psycholinguistics these are referred to as partner models, which are said to support successful communication by guiding a speaker toward appropriate language choices for a given partner [11,12]. As such, partner modelling is closely associated with concepts like perspective taking and audience design [7], whereby speakers adopt an allocentric and empathetic stance, tailoring their speech for a specific audience (person or group). ...
Preprint
Full-text available
People form impressions of their dialogue partners, be they other people or machines, based on cues drawn from their communicative style. Recent work has suggested that the gulf between people's expectations and the reality of interaction with conversational user interfaces (CUIs) widens when these impressions are misaligned with the system's actual capabilities. This has led some to rally against a perceived overriding concern for naturalness, calling instead for more representative, or appropriate, communicative cues. Indeed, some have argued for a move away from naturalness as a goal for CUI design and communication. We contend that naturalness need not be abandoned if we instead aim for ecologically grounded design. We also suggest a way this might be achieved and call on CUI designers to embrace incompetence! By letting CUIs express uncertainty and embarrassment through ecologically valid and appropriate cues that are ubiquitous in human communication, CUI designers can achieve more appropriate communication without turning away from naturalness entirely.
... A growing body of work has shown that people mirror linguistic patterns produced by technology, as well. For example, people adopt the words and syntactic structures produced by a computer system [9,10] and the pronunciation patterns of text-to-speech (TTS) voices presented across a variety of forms [14,17,19,24,30,49,54,56,57]. However, the magnitude of mirroring often differs when making direct comparisons between a human and technological interlocutor. ...
... As seen in Figure 2.B., results show that for individual layers, mirroring is relatively stronger during the early layers (1-5). For each individual layer, we found that the early layers (1-5) had a positive fixed effect, the middle layers (6-7) had effects around zero, and the later layers (8-12) had negative effects. There were no interactions between Layer and Prosody. ...
... Although most prior work on alignment has investigated human-human interactions, there is evidence that people engage in lexical alignment with non-human partners as well, including conversational agents [13], [14]. In fact, alignment is stronger when interacting with an automated partner as compared to another human, and stronger still when interacting with a computer that is allegedly "basic" versus "advanced" [15]. In general, people align more to partners that they believe are less linguistically competent, to make themselves more easily understood by their interlocutor [16]-[18]. ...
... Users aligned their word choice to the agent seemingly by default, even without feedback that it did not understand non-aligned words. This is consistent with prior work showing greater alignment to less linguistically competent partners [15]-[18], and thus, the mere knowledge that one's interlocutor is a non-human conversational agent may be enough to influence users' lexical production. ...
... The first research chapter, Chapter 4, considers the validity of this approach, demonstrating that language can be horizontally transmitted and inherited, with behaviour and interactions leading to, and predicting, changes in language. This extends previous small-sample work that did not exclude imitation (Branigan et al., 2011; Christopherson, 2011; Danescu-Niculescu-Mizil et al., 2011; De Looze et al., 2011; Hemphill and Otterbacher, 2012; Steinhauser et al., 2011). ...
... Within pairwise interactions, during that interaction, people's language becomes more similar to their partner's (Bonin et al., 2013;Danescu-Niculescu-Mizil et al., 2011;Ward, 2007). People imitate one another syntactically (Hemphill and Otterbacher, 2012) and semantically (Branigan et al., 2011;Mitchell, 2012). This imitation is known in linguistics as "Communication Accommodation Theory" (CAT). ...
Thesis
Full-text available
Twitter, social media and big data promise much in terms of terrorist signals amenable to analysis. As, however, these signals are noisy, subjectively ambiguous and new, this thesis addresses four questions that are key to reliably ‘tuning in’ to these signals. Each chapter uses big data to investigate patterns too subtle to have been amenable to prior study, with the importance of controlling for the noise associated with big data a central theme running through the thesis. Chapter 1 introduces the work, Chapter 2 reviews the relevant literature and Chapter 3 introduces and discusses the overarching methodology. Chapter 4 considers the validity of inferring information about users from their Twitter language and tweets. I demonstrate that language can be horizontally transmitted and inherited, with behaviour and interactions leading to, and predicting, changes in language. This extends previous small-sample work that did not exclude imitation. In Chapter 5, I characterise jihadist-linked accounts that resurge back from suspension—as identified with novel methods. I show that suspension is less disruptive than previous case studies implied, but that pseudoreplication has been underestimated (Wright, 2016). Having demonstrated the scale of resurgence, Chapter 6 tests whether automated machine methods can improve identification. I develop a text-similarity-based model and validate it against human-annotated data. The final research chapter, Chapter 7, tackles noise in big data when inferring information about events in the offline world. Extending similar work, I evaluate computational and human-coded predictions of how positive geopolitical events are for Daesh. I demonstrate that while the Baqiya family tweets differently on different types of day, most patterns emerge as easily by chance in the negative control data. The work is novel as, although some attempts have been made to address the questions in this thesis—or similar ones—using case studies, small samples and laboratory studies, all of these suffer limitations. Some studies have not asked the exact same question, some conclusions have been insufficiently supported with evidence and others have simply been beyond the reach of existing methods. Together, the pieces of work in this thesis show that computational analysis of big data enables tuning in to subtle signals and sometimes reveals conclusions that contradict less developed research. Control noise, however, often contains as many patterns and thus, future studies should pay particular attention to their methodologies when using noisy, subjective, social media data. [Full text: https://pure.royalholloway.ac.uk/ws/portalfiles/portal/28018827/Shaun_Wright_Doctoral_Thesis.pdf]
... (See Cohn et al., 2022 for a review of device-DS findings). Greater articulatory effort when talking to a device indicates that speakers assume there is a larger communicative barrier to overcome in HCI relative to human listeners (Branigan et al., 2011; Cowan et al., 2015). Thus, device-directed speech patterns suggest that people conceptualize technology as a less communicatively competent spoken-language comprehender than human listeners (Cohn et al., 2022). ...
Article
This article reviews recent literature investigating speech variation in production and comprehension during spoken language communication between humans and devices. Human speech patterns toward voice-AI present a test of our scientific understanding of speech communication and language use. First, work exploring how human-AI interactions are similar to, or different from, human-human interactions in the realm of speech variation is reviewed. In particular, we focus on studies examining how users adapt their speech when resolving linguistic misunderstandings by computers and when accommodating their speech toward devices. Next, we consider work that investigates how top-down factors in the interaction can influence users' linguistic interpretations of speech produced by technological agents, and how the ways in which speech is generated (via text-to-speech synthesis, TTS) and recognized (using automatic speech recognition technology, ASR) affect communication. Throughout this review, we aim to bridge HCI frameworks and theoretical linguistic models accounting for variation in human speech. We also highlight findings in this growing area that can provide insight into the cognitive and social representations underlying linguistic communication more broadly. Additionally, we touch on the implications of this line of work for addressing major societal issues in speech technology.
... Speech in challenging conditions, i.e., in noisy environments or towards less-proficient interlocutors, is frequently observed to be produced with increased vocal effort, characterised by increased fundamental frequency (f0) mean and range [e.g., 1-7]. The introduction of voice-activated artificially intelligent (voice-AI) assistants has created a new kind of challenging interlocutor, which speakers assume to require more effortful talk [7-11]. However, human interactions with these systems have thus far only been explored using speech recognition technologies. ...
Conference Paper
Full-text available
Speech adaptations occur frequently in the presence of perceived communication barriers. Modern technological advancements have brought with them new interlocutors for human speakers with the introduction of voice-AI assistants. Findings have shown that voice-AI-directed speech is characterised by an increase in vocal effort resulting from the presumed capabilities of these systems for understanding speech. However, studies focus solely on voice-AI assistants which perform speech recognition. In this study, we present an acoustic analysis of speaker interactions with two voice-AI systems with different goals (speech interpretation vs. speaker verification). Using f0 mean and range as acoustic correlates of vocal effort, we found that speakers show some evidence of increased vocal effort towards voice-AI systems regardless of final task; however, this is enhanced by speech intelligibility goals. This finding is interpreted to suggest that voice-AI-directed speech globally exhibits increased vocal effort, but task plays a clear role in the extent of this.
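As an illustration of the acoustic measures used in this line of work, f0 mean and range can be extracted with Praat's pitch tracker via the parselmouth Python package. This is a generic sketch rather than the authors' pipeline; the file name is a placeholder and the default pitch settings would need tuning per speaker.

```python
# Hedged sketch: extract f0 mean and range (vocal-effort correlates) from a
# recording using Praat's pitch tracker via parselmouth. File name is a
# placeholder; pitch floor/ceiling are left at defaults.
import numpy as np
import parselmouth

snd = parselmouth.Sound("utterance.wav")
pitch = snd.to_pitch()                       # default autocorrelation method
f0 = pitch.selected_array["frequency"]       # f0 in Hz per analysis frame
f0 = f0[f0 > 0]                              # drop unvoiced frames (coded 0)

f0_mean = f0.mean()
f0_range_st = 12 * np.log2(f0.max() / f0.min())  # range in semitones
print(f"f0 mean: {f0_mean:.1f} Hz, range: {f0_range_st:.1f} st")
```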
... The patterns of entrainment in speech and language characteristics have been extensively analysed in native oral interactions, bringing insights into the cognitive aspects of inter-personal entrainment on various linguistic and para-linguistic levels, such as semantics [1], syntax [2,3,4], prosody, and phonetic realizations [5,6,7], or focusing on diverse social and psychological factors underlying entrainment [e.g. 8,9]. These studies show that entrainment is a complex phenomenon occurring on multiple levels of communication, depending on diverse underlying factors whose character is still not well understood. ...
Conference Paper
Full-text available
Speech entrainment (also known as alignment or accommodation) has been documented on various linguistic and paralinguistic levels but often with complex and sometimes conflicting outcomes. Hence, the understanding of the mechanism(s) underlying this complex behaviour in dyadic conversations is still limited. In an effort to increase this understanding, the current study tests the effect of language (L1 Slovak vs. L2 English) and task complexity (easy, medium, difficult) on local and global entrainment in f0 and intensity in a within-subject design. Pairs of undergraduates played a collaborative game of giving directions structured into three levels gradually rising in complexity. The results corroborate other recent findings in yielding complex and unexpected patterns, particularly failing to show the assumed link between cognitively easier tasks and greater entrainment.
... Linguistic alignment is observed during human-computer interaction as well. A growing body of work has shown evidence that people also align their speech and language patterns toward technology, such as convergence toward pronunciation patterns of text-to-speech (TTS) voices (Lubold and Pon-Barry, 2014; Gessinger et al., 2017, 2021; Beňuš et al., 2018; Cohn et al., 2021a, 2021b; Zellou et al., 2021a, 2021b), as well as a computer's word choice (e.g., Branigan et al., 2011) or syntactic structure (e.g., Branigan et al., 2003; Cowan et al., 2015). ...
... For example, it has been found that, over the course of some types of interactions, individuals entrain their neural oscillations (Montague et al., 2002;Konvalinka & Roepstorff, 2012), their postural sway (Shockley et al., 2003) and other movements (Richardson et al., 2007;Schmidt & O'Brien, 1997). Individuals also align their speech patterns at the level of pronunciation (phonetic convergence; Pardo, 2006), word choice (lexical alignment; Branigan et al. 2011), grammatical structure (syntactic alignment; Branigan et al., 2007), and speaking rate (Manson et al. 2013), as well as coordinate the position of their gaze (Richardson & Dale, 2005). Furthermore, human individuals may imitate each other's facial expressions (McIntosh, 2006), postures, gestures, etc. (Chartrand & Van Baaren, 2009). ...
Article
Full-text available
Collective intelligence, broadly conceived, refers to the adaptive behavior achieved by groups through the interactions of their members, often involving phenomena such as consensus building, cooperation, and competition. The standard view of collective intelligence is that it is a distinct phenomenon from supposed individual intelligence. In this position piece, we argue that a more parsimonious stance is to consider all intelligent adaptive behavior as being driven by similar abstract principles of collective dynamics. To illustrate this point, we highlight how similar principles are at work in the intelligent behavior of groups of non-human animals, multicellular organisms, brains, small groups of humans, cultures, and even evolution itself. If intelligent behavior in all of these systems is best understood as the emergent result of collective interactions, we ask what is left to be called "individual intelligence"? We believe that viewing all intelligence as collective intelligence offers greater explanatory power and generality, and may promote fruitful cross-disciplinary exchange in the study of intelligent adaptive behavior.
... More cognitively oriented accounts draw on extensive lab-based research, which has shown that alignment in L1 processing emerges naturally during authentic interaction and affects all communication levels (phonology, lexicon, morphosyntax, gestures, and eye-gaze); see reviews by Pickering and Ferreira (2008) and Raissi et al. (2020), special issues by Dell and Ferreira (2016) and Pickering and Branigan (2019), and a recent comprehensive theoretical account in Pickering and Garrod (2021). Psycholinguistic experiments suggest that L1 alignment draws on largely automatic and resource-free processes outside a speaker's awareness, that is, it is largely implicit in nature (Pickering & Branigan, 1999; Dell & Ferreira, 2016), although beliefs about the interlocutor's communicative abilities and intentions might influence the extent of alignment (cf. Branigan et al., 2010; Branigan et al., 2011; Schoot et al., 2019). A prominent view on priming builds on the idea of residual lexicalist activation (Pickering & Branigan, 1998), roughly explained as: items of recent discourse are more active in explicit memory and thus processed faster and more easily than alternatives (Pickering & Ferreira, 2008). ...
... The variation in laughter alignment and contingent multimodal explicit responses over time in mothers might therefore be one of the features of caregivers' adaptation to the communicative development of their children, similarly to the well-known characteristics of child-directed speech [105,90,57]. The data presented also match results from other studies suggesting that when interacting with simpler systems, e.g., virtual agents or robots, human behavioural alignment is particularly marked [13,12]. The same seems to apply also to very young children, partly motivated by the will to be at the same level and partly (even unconsciously) aiming to reinforce behaviour, offer explicit feedback and contingent responses, and help scaffold functional communication development. ...
Chapter
In the current work, a brief overview of some studies conducted on laughter from a multidisciplinary perspective will be presented. The integration of analyses of corpus data, theoretical and formal insights, behavioural experiments, machine learning methods, and developmental data has proved fruitful for gaining insight into laughter behaviour and how its production contributes to our conversations. A crucial claim emerging from the studies presented is that laughter conveys propositional meaning interacting with other modalities, in a manner akin to other content-bearing words. The implications of such results for implementing spoken dialogue systems that are more competent from a semantic and pragmatic perspective will be outlined. In particular, the qualitative and quantitative analysis of developmental data will offer the basis for proposing some specific applications.
... In extract 4, "." indicates a brief pause, "()" indicates contextual comments, "*" indicates paired instances of simultaneous talk, while more than one "-" stands for a combination of pauses. Extract 5 was manually transcribed by the authors. 2 Behaviour matching is not necessarily caused by automatic alignment and is sometimes used strategically to make the conversation smoother or easier to follow for interlocutors [10,11], but we will not focus on such cases here. 3 In the current and following sections, we use the term alignment to refer to what we defined as situation model alignment, unless otherwise specified. ...
Article
Full-text available
In dialogue, speakers process a great deal of information, take and give the floor to each other, and plan and adjust their contributions on the fly. Despite the level of coordination and control that it requires, dialogue is the easiest way speakers possess to come to similar conceptualizations of the world. In this paper, we show how speakers align with each other by mutually controlling the flow of the dialogue and constantly monitoring their own and their interlocutors' way of representing information. Through examples of conversation, we introduce the notions of shared control, meta-representations of alignment and commentaries on alignment, and show how they support mutual understanding and the collaborative creation of abstract concepts. Indeed, whereas speakers can share similar representations of concrete concepts just by mutually attending to a tangible referent or by recalling it, they are likely to need more negotiation and mutual monitoring to build similar representations of abstract concepts. This article is part of the theme issue ‘Concepts in interaction: social engagement and inner experiences’.
... We think it is striking that participants aligned no more, or even less, to non-native speakers' grammatical structures, given that the opposite may occur in speech production. Work in speech production has shown that interlocutors align with each other at different linguistic levels (e.g., temporal phonetic, lexical, syntactic; [70,71]), and that this alignment is mediated by social factors, such as perceptions of both one's own language proficiency and that of one's interlocutor [30,72,73], as well as how socially similar they are to one another [74]. This may reflect an explicit attempt to accommodate non-native speakers to achieve linguistic alignment while engaged in dialogue [75] and/or to serve social purposes, such as demonstrating liking of an interlocutor (see Communication Accommodation Theory: [76,77]). ...
Article
Full-text available
Comprehenders frequently need to adapt to linguistic variability between talkers and dialects. Previous research has shown that, given repeated exposure to quasi-grammatical structures, comprehenders begin to perceive them as more grammatical (Luka & Barsalou, 2005; Luka & Choi, 2012). We examined whether grammatical acceptability judgements differ for native versus non-native speech. In an exposure phase, native English speakers listened to, retyped, and rated the grammaticality of quasi-grammatical sentences (e.g., What Emily is thankful for is that she is here) spoken by a native or non-native speaker. In a subsequent test phase, participants rated additional sentences, some of which had the same structure as exposure sentences. Participants rated native-accented sentences as more grammatical, demonstrating a role for talker identity in perceptions of grammaticality. Furthermore, structures previously heard during the exposure phase were rated as more grammatical than novel unprimed structures, but only for the native speaker. Subset analyses suggest this effect is driven by speaker intelligibility, which holds implications for communication between native and non-native speakers.
... Many challenges reported in this review echo those discussed in related CA work. ASR errors are long-standing concerns with speech-based CAs [66] that can lead users to alter their speech patterns to increase comprehension [67]. Difficulties in fostering user engagement have been highlighted when CAs cease to perform their intended utility [68]. ...
Article
Full-text available
Background: Health care and well-being are 2 main interconnected application areas of conversational agents (CAs). There has been a significant increase in research, development, and commercial implementations in this area. In parallel to the increasing interest, new challenges in designing and evaluating CAs have emerged. Objective: This study aims to identify key design, development, and evaluation challenges of CAs in health care and well-being research. The focus is on very recent projects and their emerging challenges. Methods: A review study was conducted with 17 invited studies, most of which were presented at the ACM (Association for Computing Machinery) CHI 2020 conference workshop on CAs for health and well-being. Eligibility criteria required the studies to involve a CA applied to a health or well-being project (ongoing or recently finished). The participating studies were asked to report on their projects' design and evaluation challenges. We used thematic analysis to review the studies. Results: The findings include a range of topics from primary care to caring for older adults to health coaching. We identified 4 major themes: (1) Domain Information and Integration, (2) User-System Interaction and Partnership, (3) Evaluation, and (4) Conversational Competence. Conclusions: CAs proved their worth during the pandemic as health screening tools and are expected to stay and further support various health care domains, especially personal health care. Growth in investment in CAs also shows their value as personal assistants. Our study shows that while some challenges are shared with other CA application areas, safety and privacy remain the major challenges in the health care and well-being domains. An increased level of collaboration across different institutions and entities may be a promising direction for addressing some of the major challenges that would otherwise be too complex to be addressed by projects with their limited scope and budget.
... The variation in laughter mimicry and contingent multimodal explicit responses over time in mothers might therefore be one of the features of caregivers' adaptation to the communicative development of their children, similarly to the well-known characteristics of child-directed speech (Saxton, 2009). Our data also match results from other studies suggesting that when interacting with simpler systems, e.g., virtual agents or robots, human behavioral alignment is particularly marked (Branigan et al., 2010, 2011). The same seems to also apply to very young children, partly motivated by the will to be at the same level and partly (even unconsciously) aiming to reinforce behavior, offer explicit feedback and contingent responses, and help to scaffold functional communication development. ...
Article
Full-text available
Laughter is a valuable means for communicating and engaging in interaction since the earliest months of life. Nevertheless, there is a dearth of work on how its use develops in early interactions—given its putative reflexive nature, it has often been disregarded in studies of pre-linguistic vocalizations. We provide a longitudinal characterization of laughter use, analyzing interactions of 4 babies with their mothers at five time-points (12, 18, 24, 30, and 36 months). We show how child laughter is very distinct from mothers' (and adults' generally), in terms of frequency, duration, level of arousal displayed, overlap with speech, and responsiveness to others' laughter. Notably, contrary to what might be expected, we observed that children laugh significantly less than their mothers, especially at the first time-points analyzed. We indeed observe an increasing developmental trajectory in the production of laughter overall and in the contingent multimodal response to mothers' laughter, showing the child's increasing attunement to the social environment, interest in others' appraisals and mental states, and awareness of laughter's communicative value. We also show how mothers' contingent responses to child laughter change over time, going from high-frequency mimicry to a lower rate of diversified multimodal responses, in line with the child's neuro-psychological development. Our data support a dynamic view of dialogue where interactants influence each other bidirectionally, and they emphasize the crucial communicative value of laughter. When language is not fully developed, laughter might be an early means, in its already fully available expressiveness, to hold the conversational turn and enable meaningful vocal contribution in interaction at the same level as the interlocutor. Our study aims to provide a benchmark for typical laughter development, since we believe laughter can be an early means, along with other commonly analyzed behaviors (e.g., smiling, gazing, pointing, etc.), to gain insight into early child neuro-psychological development.
... Impaired ability to develop and sustain social connections and fluent social communicative interactions is a hallmark of ASD, a genetically-based neurodevelopmental disorder characterized by the presence of repetitive behaviors and restricted interests [7], as well as impairments in communication and distinct language domains, including prosody (e.g., intonation modulation [8], volume modulation [9-11], speech rhythm [12] and rate [13]), lexico-semantics (i.e., word choice and meaning), and syntax (i.e., grammar) [7]. Entrainment across each of these language domains plays an important role in supporting the fluidity of social interactions and communication [14-20], and when impaired can contribute to pervasive troubles in these areas (see Fig. 1 for schematic). ...
Article
Full-text available
Entrainment, the unconscious process leading to coordination between communication partners, is an important dynamic human behavior that helps us connect with one another. Difficulty developing and sustaining social connections is a hallmark of autism spectrum disorder (ASD). Subtle differences in social behaviors have also been noted in first-degree relatives of autistic individuals and may express underlying genetic liability to ASD. In-depth examination of verbal entrainment was conducted to examine disruptions to entrainment as a contributing factor to the language phenotype in ASD. Results revealed distinct patterns of prosodic and lexical entrainment in individuals with ASD. Notably, subtler entrainment differences in prosodic and syntactic entrainment were identified in parents of autistic individuals. Findings point towards entrainment, particularly prosodic entrainment, as a key process linked to social communication difficulties in ASD and reflective of genetic liability to ASD.
... Twenty-four participants (5 men; age: M = 22.25 years, SD = 2.9, range = 18-30 years) participated in the study. Our sample size was based on previous studies of alignment for infrequent names and syntactic structures (e.g., Branigan et al., 2011;Suffill et al., 2021). All participants were native speakers of French with normal or corrected-to-normal vision. ...
Article
Full-text available
In this study we investigated whether people conceptually align when performing a language task together with a robot. In a joint picture-naming task, 24 French native speakers took turns with a robot in naming images of objects belonging to fifteen different semantic categories. For a subset of those semantic categories, the robot was programmed to produce the superordinate, semantic category name (e.g., fruit) instead of the more typical basic-level name associated with an object (e.g., pear). Importantly, while semantic categories were shared between the participant and the robot (e.g., fruits), different objects were assigned to each of them (e.g., ‘a pear’ for the robot and ‘an apple’ for the participant). Logistic regression models on participants' responses revealed that they aligned with the conceptual choices of the robot, producing over the course of the experiment more superordinate names (e.g., saying ‘fruit’ for the picture of an ‘apple’) for objects belonging to the semantic categories for which the robot had produced a superordinate name (e.g., saying ‘fruit’ for the picture of a ‘pear’). These results provide evidence for conceptual alignment affecting speakers' word choices as a result of adaptation to the partner, even when the partner is a robot.
... Speakers align by imitating each other's pronunciation if they express positive views about each other (Babel, 2010, 2012), and speakers who imitate their interlocutors also show greater comprehension and more favorable attitudes toward them (Adank et al., 2010, 2013). Third, linguistic alignment is often interlocutor-centered, such that speakers reuse language forms based on the linguistic background, communicative need, and prior knowledge of their interaction partners (Bortfeld & Brennan, 1997; Branigan et al., 2011; Loy et al., 2020). ...
Article
Full-text available
Conversation is a co-constructed social activity, and interlocutors, including second language (L2) speakers, frequently align in their linguistic and nonlinguistic behaviors to create shared understanding. Given that L2 speakers make assumptions about their interlocutors, this exploratory study examines whether their perceptions about linguistic, socio-affective, and behavioral dimensions of interaction align. It also explores whether such alignment is related to their agreement about the success of the conversation. Eighty-four pairs of L2 English university students completed a 10-minute academic discussion task, subsequently rating each other's comprehensibility, fluency, anxiety, motivation, and collaboration. At the end of a 30-minute session, they also assessed the communicative success of their conversational experience. Speakers were generally aligned in their evaluations of each other and in their perception of communicative success, with alignment operationalized as the difference between the partners' scores. Although alignment in all dimensions of interaction was associated with perceived communicative success, collaboration had the strongest relationship (.40 or 16% shared variance). The findings provide preliminary evidence that L2 speakers' alignment in perceived dimensions of interaction, particularly collaboration, is associated with their perceived communicative success.
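The operationalization described here, alignment as the difference between the partners' scores, is straightforward to compute; note that the reported correlation of .40 corresponds to .40 squared, or roughly 16%, shared variance. A toy sketch with hypothetical column names:

```python
# Hedged sketch: per-pair alignment as the absolute difference between the two
# partners' ratings, correlated with perceived communicative success.
# Data and column names are hypothetical (random, so r will be near zero here;
# the study reports r = .40, i.e., 16% shared variance).
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_pairs = 84
df = pd.DataFrame({
    "collab_A": rng.integers(1, 8, n_pairs),  # partner A's collaboration rating
    "collab_B": rng.integers(1, 8, n_pairs),  # partner B's collaboration rating
    "success":  rng.integers(1, 8, n_pairs),  # perceived communicative success
})

df["misalignment"] = (df["collab_A"] - df["collab_B"]).abs()  # smaller = more aligned
r, p = pearsonr(df["misalignment"], df["success"])
print(f"r = {r:.2f}, shared variance r^2 = {r**2:.2f}")
```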
Article
Voice-based assistants (VBAs) are transforming how we interact with technology. We rely heavily on VBAs (such as Siri, Alexa and Google Assistant) to manage our daily routines and tasks. The current study extends past research on blurting to the human–machine communication (HMC) context. An online survey (n = 185) was conducted to investigate blurting with a VBA. We analysed open-ended accounts of instances in which participants had blurted during conversations with a VBA. In addition, findings show that blurting with a VBA is related to less sophisticated views of interpersonal arguments and higher argument seeking, verbal aggressiveness, psychological reactance, extraversion and neuroticism. Our findings suggest that although people blurt with VBAs, in line with the general Computers Are Social Actors paradigm, the reasons are unique to HMC.
Article
Adults are skilled at using language to construct/negotiate identity and to signal affiliation with others, but little is known about how these abilities develop in children. Clearly, children mirror statistical patterns in their local environment (e.g., Canadian children using zed instead of zee), but do they flexibly adapt their linguistic choices on the fly in response to the choices of different peers? To address this question, we examined the effect of group membership on 7- to 9-year-olds' labeling of objects in a trivia game, exploring whether they were more likely to use a particular label (e.g., sofa vs. couch) if members of their “team” also used that label. In a preregistered study, children (N = 72) were assigned to a team (red or green) and were asked during experimental trials to answer questions—which had multiple possible answers (e.g., blackboard or chalkboard)—after hearing two teammates and two opponents respond to the same question. Results showed that children were significantly more likely to produce labels less commonly used by the community (i.e., dispreferred labels) when their teammates had produced those labels. Crucially, this effect was tied to group membership, and could not be explained by children simply repeating the most recently used label. These findings demonstrate how social processes (i.e., group membership) can guide linguistic variation in children.
Article
Listeners have a remarkable ability to adapt to novel speech patterns, such as a new accent or an idiosyncratic pronunciation. In almost all of the previous studies examining this phenomenon, the participating listeners had reason to believe that the speech signal was produced by a human being. However, people are increasingly interacting with voice-activated artificially intelligent (voice-AI) devices that produce speech using text-to-speech (TTS) synthesis. Will listeners also adapt to novel speech input when they believe it is produced by a device? Across three experiments, we investigate this question by exposing American English listeners to shifted pronunciations accompanied by either a ‘human’ or a ‘device’ guise and testing how this exposure affects their subsequent categorization of vowels. Our results show that listeners exhibit perceptual learning even when they believe the speaker is a device. Furthermore, listeners generalize these adjustments to new talkers, and do so particularly strongly when they believe that both old and new talkers are devices. These results have implications for models of speech perception, theories of human-computer interaction, and the interface between social cognition and linguistic theory.
Article
One of the challenges in developing intelligent tutoring systems for collaborative learning is providing adaptive feedback with adequate facilitation. This study focuses on collaborative learning involving a knowledge integration activity, whereby learner dyads explain each other's expert knowledge supported by a Pedagogical Conversational Agent. The goal of this paper was to investigate how collaborative process and learning gain can be determined by the degree to which learners synchronize their gaze (gaze recurrence) and use overlapping language (lexical alignment) during their interaction. This study conducted a laboratory-based eye-tracking experiment, wherein thirty-four learners' gazes and oral dialogs were analyzed. Through this experiment, the author investigated how gaze recurrence and lexical alignment can predict collaborative learning process and learning gain. Multiple regression analysis was conducted, in which learning performance was regressed on the two independent variables, showing how the model predicts both collaborative process and gain. The results showed that both gaze recurrence and lexical overlap significantly predicted learning performance. These results indicate that the two variables might be useful for developing detection modules that enable a better understanding of learner-learner collaborative learning.
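The analysis described here is a standard multiple regression of learning performance on the two indices. A minimal sketch on simulated data with hypothetical variable names:

```python
# Hedged sketch: regress learning performance on gaze recurrence and lexical
# alignment, as in the multiple regression described above. Simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 34  # number of learners in the study
df = pd.DataFrame({
    "gaze_recurrence":   rng.uniform(0, 1, n),
    "lexical_alignment": rng.uniform(0, 1, n),
})
# Simulate performance as a weighted sum of both predictors plus noise.
df["performance"] = (0.5 * df["gaze_recurrence"]
                     + 0.4 * df["lexical_alignment"]
                     + rng.normal(scale=0.1, size=n))

fit = smf.ols("performance ~ gaze_recurrence + lexical_alignment", data=df).fit()
print(fit.summary())  # per-predictor coefficients and model R^2
```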
Article
Full-text available
Constructive interactions and knowledge integration activities are methods commonly used for learning; however, establishing successful coordination becomes a hurdle in computer-mediated collaborations. The development of systems to facilitate communication activities in such situations has been attempted, but models are still required for capturing learners’ interactions and detecting their quality. This study explored several types of verbal and nonverbal behaviors of learners that can be implemented while designing tutoring systems to effectively capture their interaction processes in scenarios where learners engage in collaborative learning mediated by a pedagogical conversational agent (PCA). This study focused on the degree of behavior recurrence of each speaker, which is considered suitable for observing levels of effectiveness. Specifically, this study focused on three indicators—gaze synchronization, language conformance, and emotional matching through facial expression—to establish a system-based index for measuring learners’ collaborative processes such as synchronization. This study experimentally examined the relationship between these indicators and the performance and process of collaborative learning among 44 learners while using PCA for facilitation. Subsequently, numerous dependent variables in the collaborative learning process were predicted using the three proposed indicators. However, no significant correlation was established between learning performance and the indicators used. These findings show that the recurrence of indicators is useful for estimating the collaborative learning process and that these indicators can be used in the development of learning support systems to trace learners’ achievements in successful interactions.
Chapter
Professor Albert Costa (1972-2018) was one of the most influential scholars in the fields of psycholinguistics and bilingualism. This book provides a faithful look at the most relevant lines of research in which he worked during his academic career. Written by some of his close collaborators and friends, the book presents a coherent summary of the most relevant psycholinguistic theories on language processing and bilingualism, including critical reviews to current models of lexical access, the representation of cognate words, neurolinguistic models of bilingualism, cross-linguistic effects in bimodal bilinguals (sign language), prediction processes and linguistic alignment in bilinguals, the influence of foreign-language effects in social cognition and the effects of bilingualism in emotion and decision making processing. This volume is a tribute to Prof. Costa and his work, and is born from a deep love and respect for his way of approaching the science of multilingualism from a psycholinguistic perspective.
Article
Full-text available
Voice user interfaces (VUIs) are widely used in intelligent products due to their low learning cost. However, most such products do not consider the cognitive and language abilities of elderly people, which leads to low interaction efficiency, poor user experience, and unfriendly interactions for this group. The paper first analyzes the factors that influence the voice interaction behavior of elderly people: their speech rate, the dialog task type, and the feedback word count. A voice interaction simulation experiment was then designed based on the Wizard of Oz testing method. Thirty subjects (M = 61.86 years old, SD = 7.16; 15 males and 15 females) were invited to interact with the prototype of a voice robot through three kinds of dialog tasks and six configurations of feedback speech rate. The speech rates at which elderly people speak to a person and to a voice robot, and the feedback speech rates they expected for the three dialog tasks, were collected. The correlation between subjects’ speech rate and the expected feedback speech rate, and the influence of dialog task type and feedback word count on the expected feedback speech rate, were analyzed. The results show that elderly people speak to a voice robot at a lower speech rate than to a person, and that they expect the robot’s feedback speech rate to be lower than the rate at which they speak to it. There is a positive correlation between subjects’ speech rate and the expected feedback speech rate, implying that elderly people who speak faster expect faster feedback. There is no significant difference between the expected speech rates for non-goal-oriented and goal-oriented dialog tasks. Meanwhile, a negative correlation between feedback word count and expected feedback speech rate was found. This study extends the knowledge boundaries of VUI design by investigating the factors that influence voice interaction between elderly people and VUIs. The results also provide practical implications for developing VUIs suited to elderly people, especially for regulating the feedback speech rate of a VUI.
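A toy check of the reported positive correlation, with invented values standing in for the study's measurements (Pearson's r):

from statistics import correlation  # available in Python 3.10+

own_rate = [2.8, 3.1, 3.4, 2.6, 3.0, 3.6]       # rate when speaking to the robot
expected_rate = [2.5, 2.9, 3.2, 2.4, 2.8, 3.3]  # feedback rate the user expects
# A coefficient near +1 would mirror the finding that faster speakers
# expect faster feedback.
print(correlation(own_rate, expected_rate))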
Article
The increased popularity of CUIs has motivated HCI work around specific approaches to research, design, and implementation, while also reflecting on these topics. However, current research is highly fragmented and lacks critical mass around topics such as theory, methods, and design. Building this critical mass is a fundamentally multidisciplinary endeavour. CUIs involve language-based interaction, either through speech or text, with one or more agents or devices. This type of interaction not only needs to engage with traditional HCI approaches, but also to embrace methods from the communicative and social sciences. This is crucial for making progress towards human-centred conversational interfaces. Along with the recent ACM SIGCHI Conversational User Interfaces conference (ACM CUI), this special issue showcases research to further solidify the foundations of the field in these areas. Below we outline some key challenges faced by the field, describe the papers in this special issue, and then outline areas for future research.
Article
To date, a growing body of second language (L2) research has investigated linguistic alignment as a pedagogical intervention, focusing on L2 learners’ alignment behaviors in task-based interactions (e.g., Jung, YeonJoo, YouJin Kim & John Murphy. 2017. The role of task repetition in learning word-stress patterns through auditory priming tasks. Studies in Second Language Acquisition 39(2). 319–346; Kim, YouJin, YeonJoo Jung & Stephen Skalicky. 2019. Linguistic alignment, learner characteristics, and the production of stranded prepositions in relative clauses: Comparing FTF and SCMC contexts. Studies in Second Language Acquisition 41(5). 937–969). Linguistic alignment refers to the tendency for one speaker’s utterances to align with particular language features of the other speaker’s utterances in dialogue. The current study investigated how L2 speakers’ alignment behaviors differ in natural dialogues between L2-L1 and L2-L2 dyads in terms of language style (i.e., stylistic alignment) and the role of non-linguistic factors in the occurrence of stylistic alignment. The study analyzed a corpus of 360 texts using a computational tool. Results showed that stylistic alignment occurred to a greater extent in the L2-L2 dyads than in the L2-L1 dyads with respect to word range, word frequency, word imageability, and the proportion of bigrams produced by the interlocutors. Furthermore, findings demonstrated that the degree of stylistic alignment on each of the four selected lexical features was affected by numerous factors, including age, group membership, nonnative speaker status, familiarity between interlocutors, and linguistic distance between L1 and L2. The effect of each factor on stylistic alignment in conversation is discussed in detail.
Chapter
Although conversational agents (CAs) are increasingly used to control smart devices, recent research has revealed that the frequency of interaction with agents decreases over time due to a gap between users’ expectations and the actual experience. To reduce this gap, previous studies explored the mental model underlying users’ expectations for designing a CA through verbal approaches such as interviews, but this was insufficient because the mental model can contain abstract images that are difficult to express in words. Therefore, in this paper, we aim to understand user perceptions through a drawing approach. We asked 34 smart speaker users to draw what the CA looks like. We found that participants drew not only the CA but also the environment surrounding the CA, and that the perception of the environment influences expectations of, and intimacy with, the CA. Based on these findings, we suggest that environmental factors be considered significant in designing CA personas. Keywords: Conversational agents; Persona of conversational agents; Persona design; Drawing study; Mental models
Article
First language (L1) interactants quickly develop a coordinated form of communication, reusing each other's linguistic choices and aligning to their partner (Pickering & Garrod, 2021). More recently, research has become interested in second language (L2) alignment (cf. Kim & Michel, this issue). Earlier work has shown that both lexical and syntactic alignment can be found in L2 dialogues, with task type and context as potential mediating factors (e.g., Dao, Trofimovich & Kennedy, 2018). This study adds to the existing work on alignment in second language production by exploring task effects in English-Spanish teletandem conversations. Twenty-nine English-Spanish tandem pairs completed video-based free conversation and Spot-the-Difference tasks, alternating the language of communication: both participants acted as L2 learner and as L1 expert in turns. The 174 task performances were scrutinized for alignment by identifying the number of overlapping lexical and syntactic n-grams (cf. Michel & Smith, 2018). We compared alignment between paired students (i.e., real pairs) to ‘coincidental overlap’ in created conversations of randomly combined speaker pairs. Results showed significantly more alignment by real than random pairs, and more syntactic than lexical alignment, while task effects were mixed. We discuss our findings in light of telecollaborative task-based interaction as support for L2 development.
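The overlap measure described above can be sketched as counting n-gram types produced by both members of a pair, with randomly re-combined pairs as the 'coincidental overlap' baseline; whitespace tokenization here is a simplifying assumption, not the authors' pipeline:

def ngram_types(turns, n=2):
    # All n-gram types across a speaker's turns (whitespace tokenization).
    grams = set()
    for turn in turns:
        tokens = turn.lower().split()
        grams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams

def shared_ngram_count(turns_a, turns_b, n=2):
    # Number of n-gram types produced by both speakers.
    return len(ngram_types(turns_a, n) & ngram_types(turns_b, n))

real_pair = shared_ngram_count(["the red house on the left"],
                               ["yes the red house near the tree"])
random_pair = shared_ngram_count(["the red house on the left"],
                                 ["my dog sleeps all day"])
print(real_pair, random_pair)  # real pairs should share more n-grams than random ones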
Article
The term ‘multimodality’ has come to take on several somewhat different meanings depending on the underlying theoretical paradigms and traditions, and the purpose and context of use. The term is closely related to embodiment, which in turn is also used in several different ways. In this paper, we elaborate on this connection and propose that a pragmatic and pluralistic stance is appropriate for multimodality. We further propose a distinction between first and second order effects of multimodality: what is achieved by multiple modalities in isolation and the opportunities that emerge when several modalities are entangled. This highlights questions regarding ways to cluster or interchange different modalities, for example through redundancy or degeneracy. Apart from discussing multimodality with respect to an individual agent, we further look to more distributed agents and situations where social aspects become relevant. In robotics, understanding the various uses and interpretations of these terms can prevent miscommunication when designing robots, as well as increase awareness of the underlying theoretical concepts. Given the complexity of the different ways in which multimodality is relevant in social robotics, this can provide the basis for negotiating appropriate meanings of the term on a case-by-case basis.
Article
Young children care what others think of them, but are these concerns specific to interactions with humans? Here we ask whether 4-year-old children engage in self-presentational behaviors even with a puppet. After failing to activate a toy in the presence of a puppet, children selectively demonstrated their success on the toy when the puppet was absent during their final success. This pattern was found when the puppet was treated as an agent capable of holding mental states (Exp.1), but not when it was treated as an object (Exp.2); we further explore the role of indirect, linguistic cues to the puppet’s agency (Exp.3). These results highlight the importance of social contexts, particularly how an entity is depicted by others, in eliciting self-presentational behaviors. We discuss how depiction of puppets may influence their effectiveness in developmental research, and the possibility of self-presentational concerns in children’s interactions with social robots and AI agents.
Article
In people with schizophrenia and related disorders, impairments in communication and social functioning can negatively impact social interactions and quality of life. In the present study, we investigated the cognitive basis of a specific aspect of linguistic communication—lexical alignment—in people with schizophrenia and bipolar disorder. We probed lexical alignment as participants played a collaborative picture-naming game with the experimenter, in which the two players alternated between naming a dual-name picture (e.g., rabbit/bunny) and listening to their partner name a picture. We found evidence of lexical alignment in all three groups, with no differences between the patient groups and the controls. We argue that these typical patterns of lexical alignment in patients were supported by preserved—and in some cases increased—bottom-up mechanisms, which balanced out impairments in top-down perspective-taking.
Article
Full-text available
When people in conversation refer repeatedly to the same object, they come to use the same terms. This phenomenon, called lexical entrainment, has several possible explanations. Ahistorical accounts appeal only to the informativeness and availability of terms and to the current salience of the object's features. Historical accounts appeal in addition to the recency and frequency of past references and to partner-specific conceptualizations of the object that people achieve interactively. Evidence from 3 experiments favors a historical account and suggests that when speakers refer to an object, they are proposing a conceptualization of it, a proposal their addressees may or may not agree to. Once they do establish a shared conceptualization, a conceptual pact, they appeal to it in later references even when they could use simpler references. Over time, speakers simplify conceptual pacts and, when necessary, abandon them for new conceptualizations.
Article
Full-text available
Two pairs of studies examined effects of perspective taking in communication, using a 2-stage methodology that first obtained people's estimates of the recognizability to others of specific stimuli (public figures and everyday objects) and then examined the effects of these estimates on message formulation in a referential communication task. Ss were good at estimating stimulus identifiability but were biased in the direction of their own knowledge. The amount of information in a referring expression varied inversely with the perceived likelihood that addressees could identify the target stimulus. However, effects were less strong than anticipated. Although communicators do take others' knowledge into account, the extent to which they do so involves a trade-off with other sorts of information in the communicative situation.
Article
Full-text available
Presents a standardized set of 260 pictures for use in experiments investigating differences and similarities in the processing of pictures and words. The pictures are black-and-white line drawings executed according to a set of rules that provide consistency of pictorial representation. They have been standardized on 4 variables of central relevance to memory and cognitive processing: name agreement, image agreement, familiarity, and visual complexity. The intercorrelations among the 4 measures were low, suggesting that they are indices of different attributes of the pictures. The concepts were selected to provide exemplars from several widely studied semantic categories. Sources of naming variance, and mean familiarity and complexity of the exemplars, differed significantly across the set of categories investigated. The potential significance of each of the normative variables to a number of semantic and episodic memory tasks is discussed. (34 ref) (PsycINFO Database Record (c) 2006 APA, all rights reserved).
Article
Full-text available
Traditional accounts of language processing suggest that monologue – presenting and listening to speeches – should be more straightforward than dialogue – holding a conversation. This is clearly not the case. We argue that conversation is easy because of an interactive processing mechanism that leads to the alignment of linguistic representations between partners. Interactive alignment occurs via automatic alignment channels that are functionally similar to the automatic links between perception and behaviour (the so-called perception–behaviour expressway) proposed in recent accounts of social interaction. We conclude that humans are 'designed' for dialogue rather than monologue. Whereas many people find it difficult to present a speech or even listen to one, we are all very good at talking to each other. This might seem a rather obvious and banal observation, but from a cognitive point of view the apparent ease of conversation is paradoxical. The range and complexity of the information that is required in monologue (preparing and listening to speeches) is much less than is required in dialogue (holding a conversation). In this article we suggest that dialogue processing is easy because it takes advantage of a processing mechanism that we call 'interactive alignment'. We argue that interactive alignment is automatic and reflects the fact that humans are designed for dialogue rather than monologue. We show how research in social cognition points to other similar automatic alignment mechanisms.
Article
Full-text available
Referring expressions are thought to be tailored to the needs of the listener, even when those needs might be costly to assess, but tests of this claim seldom manipulate listener's and speaker's knowledge independently. The design of the HCRC Map Task enables us to do so. We examine two ‘tailoring’ changes in repeated mentions of landmark names: faster articulation and simplified referring expressions. Articulation results replicate Bard et al. (2000), depending only on what the speaker has heard. Change between mentions was no greater when it could be inferred that the listener could see the named item (Expt 1), and no less when the listener explicitly denied ability to do so (Expt 2). Word duration fell for speaker-Given listener-New items (Expt 3). Reduction was unaffected by the repeater’s ability to see the mentioned landmark (Expt 4). In contrast, referential form was more sensitive to both listener- (Expt 3) and speaker-knowledge (Expt 4). The results conform most closely to a Dual Process model: fast, automatic processes let speaker-knowledge prime word articulation, while costly assessments of listener-knowledge influence only referential form.
Article
Full-text available
In conversation, two people inevitably know different amounts about the topic of discussion, yet to make their references understood, they need to draw on knowledge and beliefs that they share. An expert and a novice talking with each other, therefore, must assess each other's expertise and accommodate to their differences. They do this in part, it is proposed, by assessing, supplying, and acquiring expertise as they collaborate in completing their references. In a study of this accommodation, pairs of people who were or were not familiar with New York City were asked to work together to arrange pictures of New York City landmarks by talking about them. They were able to assess each other's level of expertise almost immediately and to adjust their choice of proper names, descriptions, and perspectives accordingly. In doing so, experts supplied, and novices acquired, specialized knowledge that made referring more efficient.
Article
Full-text available
Investigated whether a counselor who was mirror imaging a congruent arm and leg position of a client would significantly increase the client's perception of the counselor's level of empathy over the level of the client's perception when the counselor did not mirror image congruent arm and leg position. 80 high school juniors met individually with a counselor for 15 min to discuss career plans. Three variables were controlled for: counselor's direct body orientation, position of counselor's head, and empathy level of the counselor's verbal responses. The dependent variable was the Empathy subscale of the Barrett-Lennard Relationship Inventory. ANOVA results showed that clients rated the counselor as having a significantly greater level of empathy in the congruent than in the noncongruent condition. (23 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Examined how native and non-native speakers adjust their referring expressions to each other in conversation. 20 Asian language speakers learning English were tested before and after conversations with native English speakers in which they repeatedly matched pictures of common objects (Exp 1). Lexical entrainment was just as common in native/non-native pairs as in native/native pairs. People alternated director/matcher roles in the matching task; natives uttered more words than non-natives in the same roles. In Exp 2, 31 natives rated the pre- and post-test expressions for naturalness; non-natives' post-test expressions were more natural than their pre-test expressions. In Exp 3, 20 natives rated expressions from the transcribed conversations. Native expressions took longer to rate and were judged less natural-sounding when they were addressed to non-natives than to other natives. These results are consistent with H. H. Clark and D. Wilkes-Gibbs's (see record 1987-07185-001) principle of Least Collaborative Effort. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Four experiments in written and spoken dialogue tested the predictions of two distinct accounts of syntactic encoding in sentence production: a lexicalist, residual activation account and an implicit-learning account. Experiments 1 and 2 showed syntactic priming (i.e., the tendency to reuse the syntactic structure of a prime sentence in the production of a target sentence) and a lexical boost of syntactic priming (i.e., an enhanced priming effect when the verb in prime and target was the same). Experiments 3 and 4 varied the number of filler sentences between prime and target (lag) and showed that lexical enhancement of priming is short-lived, whereas the priming effect is much more long-lived. These results did not depend on whether the modality of prime and target was written or spoken. The persistence of priming supports the view that syntactic priming is a form of implicit learning. However, only a multi-factorial account, in which lexically-based, short-term mechanisms operate in tandem with abstract, longer-term learning mechanisms can explain the full pattern of results.
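As a hedged sketch of the lag analysis (simulated data; the original experiments used different materials and statistics), one can fit a logistic regression in which the same-verb boost applies only at lag 0 while abstract priming is lag-independent, mirroring the pattern the abstract reports:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 600
same_verb = rng.integers(0, 2, n)   # 1 = prime and target share the verb
lag = rng.choice([0, 2, 6], n)      # filler sentences between prime and target
boost = same_verb * (lag == 0)      # lexical boost only with no intervening lag

# Simulate reuse of the primed structure: stable baseline priming,
# short-lived same-verb boost (illustrative parameter values).
logit = -0.2 + 1.0 * boost
reuse = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([same_verb, lag, boost]))
print(sm.Logit(reuse, X).fit(disp=0).params)  # boost term large; lag term near 0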
Article
Full-text available
The design of robust interfaces that process conversational speech is a challenging research direction largely because users' spoken language is so variable. This research explored a new dimension of speaker stylistic variation by examining whether users' speech converges systematically with the text-to-speech (TTS) heard from a software partner. To pursue this question, a study was conducted in which twenty-four 7 to 10-year-old children conversed with animated partners that embodied different TTS voices. An analysis of children's amplitude, durational features, and dialogue response latencies confirmed that they spontaneously adapt several basic acoustic-prosodic features of their speech 10-50%, with the largest adaptations involving utterance pause structure and amplitude. Children's speech adaptations were relatively rapid, bidirectional, and dynamically readaptable when introduced to new partners, and generalized across different types of users and TTS voices. Adaptations also occurred consistently, with 70-95% of children converging with their partner's TTS, although individual differences in magnitude of adaptation were evident. In the design of future conversational systems, users' spontaneous convergence could be exploited to guide their speech within system processing bounds, thereby enhancing robustness. Adaptive system processing could yield further significant performance gains. The long-term goal of this research is the development of predictive models of human-computer communication to guide the design of new conversational interfaces.
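One simple way to express the 10-50% adaptation reported above is the share of the gap between a child's baseline and the TTS partner's value that the child closes; the formula and numbers are illustrative assumptions, not necessarily the paper's exact metric:

def percent_convergence(baseline, during, partner):
    # 100 = fully matches the partner; 0 = no movement from baseline.
    if partner == baseline:
        return 0.0  # no gap to close
    return 100.0 * (during - baseline) / (partner - baseline)

# E.g., baseline pause length 400 ms, TTS partner pauses 800 ms,
# and the child's pauses lengthen to 560 ms during the interaction:
print(percent_convergence(400.0, 560.0, 800.0))  # -> 40.0 (% of gap closed)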
Article
Full-text available
A laboratory experiment examines the claims that (1) humans are susceptible to flattery from computers and (2) the effects of flattery from computers are the same as the effects of flattery from humans. In a cooperative task with a computer, subjects (N=41) received one of three types of feedback from a computer: “sincere praise”, “flattery” (insincere praise) or “generic feedback”. Compared to generic-feedback subjects, flattery subjects reported more positive affect, better performance, more positive evaluations of the interaction and more positive regard for the computer, even though subjects knew that the flattery from the computer was simply noncontingent feedback. Subjects in the sincere praise condition responded similarly to those in the flattery condition. The study concludes that the effects of flattery from a computer can produce the same general effects as flattery from humans, as described in the psychology literature. These findings may suggest significant implications for the design of interactive technologies.
Article
Full-text available
This study investigated the claim that humans will readily form team relationships with computers. Drawing from the group dynamic literature in human-human interactions, a laboratory experiment (n=56) manipulated identity and interdependence to create team affiliation in a human-computer interaction. The data show that subjects who are told they are interdependent with the computer affiliate with the computer as a team. The data also show that the effects of being in a team with a computer are the same as the effects of being in a team with another human: subjects in the interdependence conditions perceived the computer to be more similar to themselves, saw themselves as more cooperative, were more open to influence from the computer, thought the information from the computer was of higher quality, found the information from the computer friendlier, and conformed more to the computer's information. Subjects in the identity conditions showed neither team affiliation nor the effects of team affiliation.
Article
Full-text available
Can human beings relate to computer or television programs in the same way they relate to other human beings? Based on numerous psychological studies, this book concludes that people not only can but do treat computers, televisions, and new media as real people and places. Studies demonstrate that people are "polite" to computers; that they treat computers with female voices differently than "male" ones; that large faces on a screen can invade our personal space; and that on-screen and real-life motion can provoke the same physical responses. Using everyday language to engage readers interested in psychology, communication, and computer technology, Reeves and Nass detail how this knowledge can help in designing a wide range of media.
Article
Full-text available
Naming of a pictured object is substantially facilitated when the name has recently been produced in response to a definition or read aloud. The first experiment shows this to be so when over one hundred trials have intervened, and when the subjects can name the pictures quickly and accurately in the absence of priming. The locus of the effect must be in lexicalization processes subsequent to picture identification and is unlikely to be mediated by recovery of an episodic trace. Two further experiments show that prior production of a homophone of the object's name is not an effective prime (although slower responses are somewhat facilitated when the homophones are spelled the same). Hence the facilitation observed for repeated production of the same word cannot be associated with the repetition of the phonological form per se. We conclude that the facilitation must be associated with retrieval of the semantic specification or the process of mapping that specification to its associated phonological representation.
Conference Paper
Social interaction can lead to various forms of accommodative behavior. This project examines convergence in the acoustic-phonetic attributes of conversational speech. Unacquainted same- and mixed-sex pairs of talkers participated in a conversational activity, the HCRC Map Task. The recordings were analyzed acoustically and excerpts from the recordings were submitted to perceptual similarity tests using independent listeners. In general, talkers converged in acoustic- phonetic attributes of speech, but convergence was subtle and asymmetrical. The results indicate that conversational interaction leads to both convergence and divergence of different acoustic- phonetic parameters within a single social exchange.
Article
The chameleon effect refers to nonconscious mimicry of the postures, mannerisms, facial expressions, and other behaviors of one's interaction partners, such that one's behavior passively and unintentionally changes to match that of others in one's current social environment. The authors suggest that the mechanism involved is the perception-behavior link, the recently documented finding (e.g., J. A. Bargh, M. Chen, & L. Burrows, 1996) that the mere perception of another's behavior automatically increases the likelihood of engaging in that behavior oneself. Experiment 1 showed that the motor behavior of participants unintentionally matched that of strangers with whom they worked on a task. Experiment 2 had confederates mimic the posture and movements of participants and showed that mimicry facilitates the smoothness of interactions and increases liking between interaction partners. Experiment 3 showed that dispositionally empathic individuals exhibit the chameleon effect to a greater extent than do other people.
Article
Three experiments are reported, dealing with a situation in which subjects carried on a dialogue, via a computer terminal, with what they believed to be either a computer system or another person. Analysis of the ensuing protocols concentrated on the use of anaphor and on lexical choice. Systematic stylistic variations result from placing subjects in a situation where they believe their interlocutor to be a computer system. Subjects in this condition sustain a dialogue with a restricted window of focussed content and, as a result, compared to person-to-person dialogues, utterances are short; lexical choice is kept to a negotiated minimum; and the use of pronominal anaphor is minimised, even when referring to the discourse topic. These dialogue characteristics persist over lengthy periods of interaction. Lack of evidence to support presuppositions concerning the capabilities of a defined computer source does not lead to a change in style. Similarly, attempts to manipulate features of computer output by producing more friendly surface forms did not influence subjects' behaviour. However, within the limits of the measures taken, subjects learned as much and performed as well, or better, in interactions defined as being with a computer.
Article
Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasilikelihood or PQL for the mean parameters and pseudo-likelihood for the variances. Implementation involves repeated calls to normal theory procedures for REML estimation in variance components problems. By means of informal mathematical arguments, simulations and a series of worked examples, we conclude that PQL is of practical value for approximate inference on parameters and realizations of random effects in the hierarchical model. The applications cover overdispersion in binomial proportions of seed germination; longitudinal analysis of attack rates in epilepsy patients; smoothing of birth cohort effects in an age-cohort model of breast cancer incidence; evaluation of curvature of birth cohort effects in a case-control study of childhood cancer and obstetric radiation; spatial aggregation of lip cancer rates in Scottish counties; and the success of salamander matings in a complicated experiment involving crossing of male and female effects. PQL tends to underestimate somewhat the variance components and (in absolute value) fixed effects when applied to clustered binary data, but the situation improves rapidly for binomial observations having denominators greater than one.
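In standard GLMM notation (a sketch of the model class this abstract describes, using generic symbols rather than anything specific to the paper's examples): conditional on the random effects b, each observation satisfies

g\left(\mathrm{E}[y_i \mid b]\right) = x_i^{\top}\beta + z_i^{\top}b, \qquad \mathrm{Var}(y_i \mid b) = \phi\, a_i^{-1}\, v(\mu_i), \qquad b \sim N\left(0,\, D(\theta)\right),

where g is the link function, v the variance function, a_i the known prior weights, \phi the scale factor, and D(\theta) the dispersion matrix depending on the variance components \theta. PQL then estimates \beta and b by maximizing a Laplace approximation to the marginal quasi-likelihood, as the abstract outlines.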
Article
People design what they say specifically for their conversational partners, and they adapt to their partners over the course of a conversation. A comparison of keyboard conversations involving a simulated computer partner (as in a natural language interface) with those involving a human partner (as in teleconferencing) yielded striking differences and some equally striking similarities. For instance, there were significantly fewer acknowledgments in human/computer dialogue than in human/human. However, regardless of the conversational partner, people expected connectedness across conversational turns. In addition, the style of a partner's response shaped what people subsequently typed. These results suggest some issues that need to be addressed before a natural language computer interface will be able to hold up its end of a conversation.
Article
All language is, to a varying extent, poetic. Investigating the relationship between conversational and literary discourse illuminates the workings of conversation. Past research suggests the pervasiveness of repetition, and its significance in questioning prior theoretical and methodological assumptions. Repetition functions in production, comprehension, connection, and interaction. The congruence of these levels provides a fourth, over-arching function in COHERENCE, which builds on and creates interpersonal involvement. Examples illustrate the pervasiveness, functions, and automatic nature of repetition in taped, transcribed conversation-supporting a view of discourse as relatively pre-patterned, rather than generated. Repetition is a resource by which speakers create a discourse, a relationship, and a world.
Article
A speech accommodation theory explanation for the interaction between a receiver's decoding ability and a speaker's voice tone on compliance with requests for help was tested. It was predicted that good decoders would speak faster than poor decoders. Speech accommodation theory predicts that given this speech style difference, good decoders would make more favorable interpretations of a fast request that converged toward their faster speech rate; whereas poor decoders would make more favorable interpretations of a slow request that converged toward their slower speech rate. Requests receiving more favorable evaluations should result in greater compliance, because compliance with requests for help was predicted to follow an identification process. An experiment involving 168 participants confirmed this explanation. Good decoders spoke faster than poor decoders. Moreover, good decoders rated the fast request as more intimate and immediate, while poor decoders rated the slow request as more intimate and immediate. Good decoders, in turn, complied more with the fast request, which they rated more intimate and immediate, whereas poor decoders complied more with the slow request, which they rated more intimate and immediate.
Article
This paper explores certain ways in which dialogue co-ordination skills develop in schoolchildren between 7 and 12 years of age. The analysis is based on a large corpus (from 80 pairs of speakers) of task-oriented dialogues elicited using Garrod and Anderson's (1987) cooperative maze game procedure. As in previous work with adult conversants, our analysis concentrates on how the speakers establish local description “languages” or schemes to describe their locations on the maze. The results indicate that there is strong pressure for speakers of all ages to co-ordinate on common description schemes. However, more detailed analysis of the dialogues suggests that there are two underlying principles of co-ordination operating. While the youngest conversants (7 to 8-year-olds) show evidence of superficial co-ordination processes establishing a limited common dialogue lexicon and particular common description scheme only the older speakers show evidence of deep co-ordination processes aimed at directly establishing the mutual intelligibility of the common scheme.
Article
Studied the extent to which 200 undergraduates' perceptions of diversity level (DL) and DL shifts were veridical. Ss completed 2 scales that measured their perceptions of the style of talk of 2 speakers interacting before an employment interview: Zander, whose lexical DL changed, and Johnson, whose lexical DL was constant. Results show that Ss were accurate as a group in perceiving changes in Zander's DL; Ss accurately perceived greater changes in Zander's style in large-magnitude as opposed to moderate-magnitude shifts. Ss appeared to more accurately estimate downward movement in DL. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
The present studies were designed to test whether people are "polite" to computers. Among people, an interviewer who directly asks about him- or herself will receive more positive and less varied responses than if the same question is posed by a 3rd party. Two studies were designed to determine if the same phenomenon occurs in human-computer interaction. In the 1st study, 30 Ss performed a task with a text-based computer and were then interviewed about the performance of that computer on 1 of 3 loci: (1) the same computer, (2) a pencil-and-paper questionnaire, or (3) a different (but identical) text-based computer. Consistent with the politeness prediction, same-computer participants evaluated the computer more positively and more homogeneously than did either pencil-and-paper or different-computer participants. Study 2, with 30 participants, replicated the results with voice-based computers. ((c) 1999 APA/PsycINFO, all rights reserved)
Article
Speakers in conversation routinely engage in audience design. That is, they construct their utterances to be understood by particular addressees. Standard accounts of audience design have frequently appealed to the notion of common ground. On this view, speakers produce well-designed utterances by expressly considering the knowledge they take as shared with addressees. This article suggests that conversational common ground, rather than being a category of specialized mental representations, is more usefully conceptualized as an emergent property of ordinary memory processes. This article examines 2 separate but equally important processes: commonality assessment and message formation. Commonality assessment involves the retrieval of memory traces concerning what information is shared with an addressee, whereas message formation involves deciding how to use that information in conversation. Evidence from the CallHome English corpus of telephone conversations shows how each of these processes is rooted in basic aspects of human memory. The overall goal of this article is to demonstrate the need for a more cognitive psychological account of conversational common ground. Consider this excerpt from a conversation between two friends who have not spoken with each other for some time: (1) A: Oh first of all I have Shana's shower coming up that I have to do. B: Ah, that's right. A: That's going to be like a huge like three day effort with all the cooking and cleaning and like actually party [sic] that I have to do. B: Is there anyone you can get to help you? A: Um Jessica's going to help and Beth might because you see, Diane is here now. B: Oh okay. [#4913, 440.30]
Article
We report five experiments that investigate syntactic priming (Bock, 1986b) using a written completion task. Experiments 1 and 2 showed that priming occurs if the prime and target contain different verbs, but that stronger priming occurs if the verb is repeated. Experiment 1 also showed that priming occurs even if the detailed structure of prime and target differ. Experiments 3, 4, and 5 found that priming was unaffected by whether tense, aspect, or number of the verb stayed the same or differed between prime and target. We argue that these results provide evidence about the representation of syntactic information within the lemma stratum. We use these results to extend the model proposed by Roelofs (1992, 1993). In particular, we argue that combinatorial information is phrasal in nature, is associated with the verb's lemma rather than a particular form of the verb, and is shared between different lemmas. © 1998 Academic Press. In this paper, we are concerned with the representation of the syntactic information which underlies the ability to combine lexical entries to form complex structures in language production. There is substantial evidence that semantic and syntactic properties of lexical entries are accessed separately from phonological and morphological properties during language production. This evidence includes tip-of-the-tongue phenomena (Brown & McNeill). In light of this evidence, Levelt, Roelofs, and Meyer (in press; cf. Levelt, 1989) argued that lexical entries include a lemma stratum, encoding syntactic information, and a word-form stratum, encoding morphological and phonological information. (Note that they assume that the lemma stratum does not include semantic information, in contrast to Kempen & Huijbers, 1983.) Below, we describe a model of the lemma stratum and present five experiments that use syntactic priming (Bock, 1986b) to test this model.
Article
Two experiments investigated the idea that mimicry leads to pro-social behavior. It was hypothesized that mimicking the verbal behavior of customers would increase the size of tips. In Experiment 1, a waitress either mimicked half her customers by literally repeating their order or did not mimic her customers. It was found that she received significantly larger tips when she mimicked her customers than when she did not. In Experiment 2, in addition to a mimicry- and non-mimicry condition, a baseline condition was included in which the average tip was assessed prior to the experiment. The results indicated that, compared to the baseline, mimicry leads to larger tips. These results demonstrate that mimicry can be advantageous for the imitator because it can make people more generous.
Article
Two people talking, as at a crowded party, may try to conceal all or part of what they mean from overhearers. To do this, it is proposed, they need to build what they wish to conceal on a private key, a piece of information, such as an event mentioned in an earlier conversation, that is common ground for the two of them and yet not inferable by the overhearers. What they say must be designed so that it cannot be understood without knowledge of that key. As evidence for the proposal, pairs of friends were required, as part of an arrangement task, to refer to familiar landmarks while concealing these references from overhearers. As predicted, the two of them used private keys, which they often concealed even further by using certain collaborative techniques. Still, the two partners weren't always successful.
Article
This study tested whether people can be shaped to use the vocabulary and phrase structure of a program's output in creating their own inputs. Occasional computer-users interacted with four versions of an inventory program ostensibly capable of understanding natural-language inputs. The four versions differed in the vocabulary and the phrase length presented on the subjects' computer screen. Within each version, the program's outputs were worded consistently and presented repetitively in the hope that subjects would use the outputs as a model for their inputs. Although not told so in advance, one-half of the subjects were restricted to input phrases identical to those used by their respective program (shaping condition), the other half were not (modeling condition). Additionally, one-half of the subjects communicated with the program by speaking, the other half by typing. The analysis of the verbal dependent variables revealed four noteworthy findings. First, users will model the length of a program's output. Second, it is easier for people to model and to be shaped to terse, as opposed to conversational, output phrases. Third, shaping users' inputs through error messages is more successful in limiting the variability in their language than is relying on them to model the program's outputs. Fourth, mode of communication and output vocabulary do not affect the degree to which modeling or shaping occur in person-computer interactions. Comparisons of pre- and post-experimental attitudes show that both restricted and unrestricted subjects felt significantly more positive toward computers after their interactions with the natural-language system. Other performance and attitude differences as well as implications for the development of natural-language processors are discussed.
Article
Speakers tend to repeat materials from previous talk. This tendency is experimentally established and manipulated in various question-answering situations. It is shown that a question's surface form can affect the format of the answer given, even if this form has little semantic or conversational consequence, as in the pair Q: "(At) what time do you close?" A: "(At) five o'clock." Answerers tend to match the utterance to the prepositional (nonprepositional) form of the question. This "correspondence effect" may diminish or disappear when, following the question, additional verbal material is presented to the answerer. The experiments show that neither the articulatory buffer nor long-term memory is normally involved in this retention of recent speech. Retaining recent speech in working memory may fulfill a variety of functions for speaker and listener, among them the correct production and interpretation of surface anaphora. Reusing recent materials may, moreover, be more economical than regenerating speech anew from a semantic base, and thus contribute to fluency. But the realization of this strategy requires a production system in which linguistic formulation can take place relatively independent of, and parallel to, conceptual planning.
Conference Paper
We used a Wizard-of-Oz paradigm to study effects of message style on dialog and on people's mental models of computer agents. People made airline reservations using a simulated reservation agent from which they received one of three message styles: Telegraphic, Fluent, or Anthropomorphic. The agent accepted any kind of language or command input people typed. When people took the initiative, they tended to model their inputs on the computer's messages. They expended more effort in the Anthropomorphic than in the Fluent or Telegraphic conditions. We found no evidence that natural language messages caused higher expectations of intelligence than telegraphic messages. Keywords: Natural language interfaces, error messages, agents, anthropomorphism, mental models
Conference Paper
In this work we examine user adaptation to a dialog system's choice of realization of task-related concepts. We analyze forms of the time concept in the Let's Go! spoken dialog system. We find that users adapt to the system's choice of time form. We also find that user adaptation is affected by perceived system adaptation. This means that dialog systems can guide users' word choice and can adapt their own recognition models to gain improved ASR accuracy.
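A toy sketch of measuring such adaptation: classify each user turn by the time form it uses and compute the proportion matching the system's form. The form labels, regular expressions, and utterances below are invented for illustration, not the paper's scheme:

import re

def time_form(utterance):
    # Crude illustrative classifier for how a time is expressed.
    if re.search(r"\b\d{1,2}:\d{2}\b", utterance):
        return "numeric"      # e.g., "5:45"
    if re.search(r"\b(quarter|half)\s+(past|to)\b", utterance):
        return "relative"     # e.g., "quarter to six"
    return "other"

system_form = "numeric"  # the realization the system uses for times
user_turns = ["leave at 5:45", "maybe 6:15", "around quarter to seven"]
matches = sum(time_form(u) == system_form for u in user_turns)
print(matches / len(user_turns))  # share of turns echoing the system's form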
Article
A six-step, iterative, empirical human factors design methodology was used to develop CAL, a natural language computer application to help computer-naive business professionals manage their personal calendars. Input language is processed by a simple, nonparsing algorithm with limited storage requirements and a quick response time. CAL allows unconstrained English inputs from users with no training (except for a five minute introduction to the keyboard and display) and no manual (except for a two-page overview of the system). In a controlled test of performance, CAL correctly responded to between 86 percent and 97 percent of the storage and retrieval requests it received, according to various criteria. This level of performance could never have been achieved with such a simple processing model were it not for the empirical approach used in the development of the program and its dictionaries. The tools of the engineering psychologist are clearly invaluable in the development of user-friendly software, if that software is to accommodate the unruly language of computer-naive, first-time users. The key is to elicit the cooperation of such users as partners in an iterative, empirical development process. 15 references.
Article
Participants took part in two speech tests. In both tests, a model speaker produced vowel-consonant-vowels (VCVs) in which the initial vowel varied unpredictably in duration. In the simple response task, participants shadowed the initial vowel; when the model shifted to production of any of three CVs (/pa/, /ta/ or /ka/), participants produced a CV that they were assigned to say (one of /pa/, /ta/ or /ka/). In the choice task, participants shadowed the initial vowel; when the model shifted to a CV, participants shadowed that too. We found that, measured from the model's onset of closure for the consonant to the participant's closure onset, response times in the choice task exceeded those in the simple task by just 26 ms. This is much shorter than the canonical difference between simple and choice latencies [100-150 ms according to Luce (1986)] and is near the fastest simple times that Luce reports. The findings imply rapid access to articulatory speech information in the choice task. A second experiment found much longer choice times when the perception-production link for speech could not be exploited. A third experiment and an acoustic analysis verified that our measurement from closure in Experiment 1 provided a valid marker of speakers' onsets of consonant production. A final experiment showed that shadowing responses are imitations of the model's speech. We interpret the findings as evidence that listeners rapidly extract information about speakers' articulatory gestures.
Article
There is strong evidence that we automatically simulate observed behavior in our motor system. Previous research suggests that this simulation process depends on whether we observe a human or a non-human agent. Measuring a motor priming effect, this study investigated the question of whether agent-sensitivity of motor simulation depends on the specific action observed. Participants saw pictures depicting end positions of different actions on a screen. All postures featured either a human or non-human agent. Participants had to produce the matching action with their left or right hand depending on the hand presented on the screen. Three different actions were displayed: a communicative action (emblem), a transitive (goal-directed) action and an intransitive action. We found motor priming effects of similar size for human and non-human agents for transitive and intransitive actions. However, the motor priming effect for communicative actions was present for the human agent, but absent for the non-human agent. These findings suggest that biological tuning of motor simulation is highly action-selective and depends on whether the observed behavior appears to be driven by a reasonable goal.
Article
For successful interpersonal communication, inferring the intentions, goals, or desires of others is highly advantageous. Increasingly, humans also interact with computers or robots. In this study, we sought to determine to what degree an interactive task, which involves receiving feedback from social partners that can be used to infer intent, engaged the medial prefrontal cortex, a region previously associated with Theory of Mind processes among others. Participants were scanned using fMRI as they played an adapted version of the Prisoner's Dilemma Game with alleged human and computer partners who were outside the scanner. The medial frontal cortex was activated when either the human or the computer partner was played, while the direct contrast revealed significantly stronger signal change during the human-human interaction. The results suggest a link between activity in the medial prefrontal cortex and the partner played in a mentalising task. This signal change was also present for the computer partner. Attributing agency or a will to non-human actors might be an innate human capacity that could lead to an evolutionary advantage.
Article
Linear mixed-effects models are an important class of statistical models that are used directly in many fields of applications and also are used as iterative steps in fitting other types of mixed-effects models, such as generalized linear mixed models. The parameters in these models are typically estimated by maximum likelihood or restricted maximum likelihood. In general, there is no closed-form solution for these estimates and they must be determined by iterative algorithms such as EM iterations or general nonlinear optimization. Many of the intermediate calculations for such iterations have been expressed as generalized least squares problems. We show that an alternative representation as a penalized least squares problem has many advantageous computational properties including the ability to evaluate explicitly a profiled log-likelihood or log-restricted likelihood, the gradient and Hessian of this profiled objective, and an ECME update to refine this objective.
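As a sketch of the penalized least squares representation mentioned above, in the parameterization later popularized by lme4 (an assumption about the precise form; the paper's own notation may differ): writing the random effects as b = \Lambda_{\theta} u with u \sim N(0, \sigma^2 I), the conditional modes of u and the fixed effects \beta jointly solve

\min_{\beta,\, u}\; \left\| y - X\beta - Z\Lambda_{\theta} u \right\|^{2} + \left\| u \right\|^{2},

from which a profiled log-likelihood (or log-restricted likelihood) in \theta, together with its gradient and Hessian, can be evaluated explicitly.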