Preprint

See what I'm saying? Comparing Intelligent Personal Assistant use for Native and Non-Native Language Speakers

Abstract

Limited linguistic coverage in Intelligent Personal Assistants (IPAs) means that many users interact with them in a non-native language. Yet we know little about how IPAs currently support or hinder these users. To understand this more deeply, we studied native (L1) and non-native (L2) English speakers interacting with Google Assistant on a smartphone and a smart speaker. Interviews revealed that L2 speakers prioritised planning utterances around their perceived linguistic limitations, whereas L1 speakers prioritised succinctness because of system limitations. L2 speakers saw IPAs as insensitive to their linguistic needs, resulting in failed interactions. L2 speakers clearly preferred using smartphones, as visual feedback supported diagnosis of communication breakdowns whilst allowing time to process query results. Conversely, L1 speakers preferred smart speakers, seeing audio feedback as sufficient. We discuss the need to tailor the IPA experience for L2 users, emphasising visual feedback whilst reducing the burden of language production.

References

Conference Paper
Intelligent personal assistants (IPAs) are supposed to help us multitask. Yet the impact of IPA use on multitasking is not clearly quantified, particularly in situations where primary tasks are also language-based. Using a dual-task paradigm, our study observes how IPA interactions impact two different types of writing primary tasks: copying and generating content. We found that writing tasks involving content generation, which are more cognitively demanding and share more of the resources needed for IPA use, are significantly more disrupted by IPA interaction than less demanding tasks such as copying content. We discuss how theories of cognitive resources, including multiple resource theory and working memory, explain these results. We also outline the need for future work on how interruption length and relevance may impact primary task performance, as well as the need to identify effects of interruption timing in user-led and IPA-led interruptions.
Preprint
Humanness is core to speech interface design. Yet little is known about how users conceptualise perceptions of humanness and how people define their interaction with speech interfaces through this. To map these perceptions, 21 participants held dialogues with a human and two speech-interface-based intelligent personal assistants, and then reflected on and compared their experiences using the repertory grid technique. Analysis of the constructs shows that perceptions of humanness are multidimensional, focusing on eight key themes: partner knowledge set, interpersonal connection, linguistic content, partner performance and capabilities, conversational interaction, partner identity and role, vocal qualities and behavioral affordances. Through these themes, it is clear that users define the capabilities of speech interfaces differently to humans, seeing them as more formal, fact based, impersonal and less authentic. Based on the findings, we discuss how the themes help to scaffold, categorise and target research and design efforts, considering the appropriateness of emulating humanness.
Article
Voice has become a widespread and commercially viable interaction mechanism with the introduction of voice assistants (VAs), such as Amazon’s Alexa, Apple’s Siri, Google Assistant, and Microsoft’s Cortana. Despite their prevalence, we do not have a detailed understanding of how these technologies are used in domestic spaces. To understand how people use VAs, we conducted interviews with 19 users, and analyzed the log files of 82 Amazon Alexa devices, totaling 193,665 commands, and 88 Google Home devices, totaling 65,499 commands. In our analysis, we identified music, search, and IoT usage as the command categories most used by VA users. We explored how VAs are used in the home, investigated the role of VAs as scaffolding for Internet of Things device control, and characterized emergent issues of privacy for VA users. We conclude with implications for the design of VAs and for future research studies of VAs.
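The command categorisation described above reduces, at its core, to mapping each logged utterance to a coarse category and tallying frequencies. Below is a minimal sketch of that idea in Python; the keyword rules and log format are invented for illustration and are not the coding scheme the authors applied.

```python
from collections import Counter

# Hypothetical keyword-to-category rules; the paper's actual coding
# scheme for Alexa and Google Home logs is richer than this.
CATEGORY_KEYWORDS = {
    "music": ("play", "song", "volume"),
    "search": ("what", "who", "weather"),
    "iot": ("light", "thermostat", "plug"),
}

def categorise(command: str) -> str:
    """Assign one logged voice command to a coarse category."""
    lowered = command.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return "other"

def tally(commands: list[str]) -> Counter:
    """Count how often each category occurs across a command log."""
    return Counter(categorise(c) for c in commands)

# Invented log fragment for illustration.
log = ["play some jazz", "what's the weather", "turn off the lights"]
print(tally(log))  # Counter({'music': 1, 'search': 1, 'iot': 1})
```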
Article
Second/foreign language (L2) classrooms do not always provide opportunities for input and output practice [Lightbown, P. M. (2000). Classroom SLA research and second language teaching. Applied Linguistics, 21(4), 431–462]. The use of smart speakers such as Amazon Echo and its associated voice-controlled intelligent personal assistant (IPA) Alexa can help address this limitation because of its ability to extend the reach of the classroom, motivate practice, and encourage self-learning. Our previous study on the pedagogical use of Echo revealed that its use gave L2 learners ample opportunities for stress-free input exposure and output practice [Moussalli, S., & Cardoso, W. (2016). Are commercial ‘personal robots’ ready for language learning? Focus on second language speech. In S. Papadima-Sophocleous, L. Bradley, & S. Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016 (pp. 325–329)]. However, the results also suggested that beginner learners, depending on their levels of accentedness, experienced difficulties interacting with and being understood by Echo. Interestingly, this observation differs from findings involving human-to-human interactions, which suggest that a speaker’s foreign accent does not impede intelligibility. In this article, we report the results of a study that investigated Echo’s ability to recognize and process non-native accented speech at different levels of accentedness, based on the accuracy of its replies for a set of pre-established questions. Using a variety of analytical methods (i.e. judges’ ratings of learners’ pronunciation, learners’ ratings of Echo’s pronunciation, transcriptions of Echo’s interactions, surveys and interviews) and via a multidimensional analysis of the data collected, our results indicate that L2 learners have no problems understanding Echo and that it adapts well to their accented speech (Echo is comparable to humans in terms of comprehensibility and intelligibility). Our results also show that L2 learners use a variety of strategies to mitigate the communication breakdown they experienced with Echo.
Conference Paper
Social or interactive talk differs from task-based or instrumental interactions in many ways. Quantitative knowledge of these differences will aid the design of convincing human-machine interfaces for applications requiring machines to take on roles including social companions, healthcare providers, or tutors. We briefly review accounts of social talk from the literature. We outline a three-part data collection of human-human, human-Wizard-of-Oz and human-machine dialogs incorporating light social talk and a guessing game. Finally, we describe our ongoing experiments on the corpus collected.
Article
This study aimed to examine the effect of robot-assisted language learning (RALL) on the anxiety level and attitude in English vocabulary acquisition amongst Iranian EFL junior high school students. Forty-six female students, who were beginners at the age of 12, participated in this study and were randomly assigned to two groups: RALL (30 students) and non-RALL (16 students). The textbook, the materials, as well as the teacher were the same in the two groups. However, in the RALL group, the treatment was given by a teacher accompanied by a humanoid robot assistant. Two questionnaires of anxiety and attitude were utilized to measure the students’ anxiety and attitude (Horwitz et al. 1986; Alemi and Alipour 2012). The results of descriptive statistics and t-tests indicated that there was lower anxiety and a more positive attitude towards English vocabulary acquisition in the RALL group compared with the non-RALL group. The study showed that the students in the RALL group had great fun in the learning process; they also believed they were learning more effectively, which helped them boost their motivation in the long run. The present study provides new insights into the use of technology in the language classroom, suggesting that teachers and material developers should integrate technology into the language learning process.
Article
This study sought to determine the intelligibility of DECTalk™ synthesized speech to both native and non-native speakers of English. The ability of the groups to understand DECTalk sentences delivered both in quiet and at a signal-to-noise (S/N) ratio of +10 dB was investigated. Results suggested that non-native speakers made significantly more errors transcribing DECTalk sentences than did native speakers. DECTalk was found to be significantly less intelligible in noise than in quiet for both groups, but non-native speakers experienced significantly more difficulty understanding DECTalk in noise than did native speakers. Implications of these findings and the need for further research are discussed.
Article
Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology. In this paper, we argue that it offers an accessible and theoretically flexible approach to analysing qualitative data. We outline what thematic analysis is, locating it in relation to other qualitative analytic methods that search for themes or patterns, and in relation to different epistemological and ontological positions. We then provide clear guidelines to those wanting to start thematic analysis, or conduct it in a more deliberate and rigorous way, and consider potential pitfalls in conducting thematic analysis. Finally, we outline the disadvantages and advantages of thematic analysis. We conclude by advocating thematic analysis as a useful and flexible method for qualitative research in and beyond psychology.
Conference Paper
Many vehicles today are equipped with navigation systems, and all of these systems use speech or a combination of speech and graphics to provide drivers with directions to their destinations. This study investigates the effect of gender of voice when providing driving instructions in English to drivers who are non-native speakers of English. In a 2 (native/non-native) × 2 (gender of voice) between-participants study, 40 participants aged 18-25 drove in a driving simulator for 25 minutes with a navigation information system that gave them directions to a set destination. Results show that gender of voice did not affect native English-speaking drivers. For non-native speakers, however, a female voice worked better for both female and male drivers; non-native speakers consistently failed to act on navigational information given by the male voice. Design implications for voice systems are discussed.
Article
The authors induced tip-of-the-tongue states (TOTs) for English words in monolinguals and bilinguals using picture stimuli with cognate (e.g., vampire, which is vampiro in Spanish) and noncognate (e.g., funnel, which is embudo in Spanish) names. Bilinguals had more TOTs than did monolinguals unless the target pictures had translatable cognate names, and bilinguals had fewer TOTs for noncognates they were later able to translate. TOT rates for the same targets in monolinguals indicated that these effects could not be attributed to target difficulty. Two popular TOT accounts must be modified to explain cognate and translatability facilitation effects, and cross-language interference cannot explain bilinguals' increased TOT rates. Instead the authors propose that, relative to monolinguals, bilinguals are less able to activate representations specific to each language.
Conference Paper
This study examines the effects of interacting with voice interfaces in an ingroup or an outgroup accent, for both native and non-native but competent English speakers. In a balanced, between-subjects experiment (N = 96), ingroup and outgroup participants were randomly paired with one of two types of computer speech output: 1) an output accent which matched the participant's accent, or 2) an output accent which mismatched the participant's accent. The content of the output was identical in all conditions. Participants matched with accents similar to their own showed strong similarity-attraction effects. The matched users 1) disclosed socially undesirable behaviors they engage in to a much larger extent, 2) found the interviewer to be endowed with more socially rich attributes, and 3) perceived the interviewer to be more sociable. In short, similarity of accent is more important than 'correctness' of the accent when interacting with a computer. We discuss implications of these results for HCI design. Keywords: cross-cultural communication, speech interfaces, native and foreign accents, similarity-attraction effect.
Article
Speech interfaces are growing in popularity. Through a review of 99 research papers this work maps the trends, themes, findings and methods of empirical research on speech interfaces in the field of human-computer interaction (HCI). We find that studies are usability/theory-focused or explore wider system experiences, evaluating Wizard of Oz, prototypes or developed systems. Measuring task and interaction was common, as was using self-report questionnaires to measure concepts like usability and user attitudes. A thematic analysis of the research found that speech HCI work focuses on nine key topics: system speech production, design insight, modality comparison, experiences with interactive voice response systems, assistive technology and accessibility, user speech production, using speech technology for development, people's experiences with intelligent personal assistants and how user memory affects speech interface interaction. From these insights we identify gaps and challenges in speech research, notably taking into account technological advancements, the need to develop theories of speech interface interaction, grow critical mass in this domain, increase design work and expand research from single to multiple user interaction contexts so as to reflect current use contexts. We also highlight the need to improve measure reliability, validity and consistency, in-the-wild deployment, and reduce barriers to building fully functional speech interfaces for research.
Research highlights:
  • Most papers focused on usability/theory-based or wider system experience research, with a focus on Wizard of Oz and developed systems
  • Questionnaires on usability and user attitudes were often used, but few were reliable or validated
  • Thematic analysis showed nine primary research topics
  • Challenges identified in theoretical approaches and design guidelines, engaging with technological advances, multiple user and in-the-wild contexts, critical research mass and barriers to building speech interfaces
Chapter
Interactions with speech interfaces are growing, helped by the advent of intelligent personal assistants like Amazon Alexa and Google Assistant. This software is utilised in hardware such as smart home devices (e.g. Amazon Echo and Google Home), smartphones and vehicles. Given the unprecedented level of spoken interactions with machines, it is important we understand what is considered appropriate, desirable and attractive computer speech. Previous research has suggested that the overuse of humanlike voices in limited-communication devices can induce uncanny valley effects—a perceptual tension arising from mismatched stimuli causing incongruence between users’ expectations of a system and its actual capabilities. This chapter explores the possibility of verbal uncanny valley effects in computer speech by utilising the interpersonal linguistic strategies of politeness, relational work and vague language. This work highlights that using these strategies can create perceptual tension and negative experiences due to the conflicting stimuli of computer speech and ‘humanlike’ language. This tension can be somewhat moderated with more humanlike than robotic voices, though not alleviated completely. Considerations for the design of computer speech and subsequent future research directions are discussed.
Conference Paper
The assumptions we make about a dialogue partner's knowledge and communicative ability (i.e. our partner models) can influence our language choices. Although similar processes may operate in human-machine dialogue, the role of design in shaping these models, and their subsequent effects on interaction, are not clearly understood. Focusing on synthesis design, we conduct a referential communication experiment to identify the impact of accented speech on lexical choice. In particular, we focus on whether accented speech may encourage the use of lexical alternatives that are relevant to a partner's accent, and how this may vary when in dialogue with a human or machine. We find that people are more likely to use American English terms when speaking with a US-accented partner than an Irish-accented partner in both human and machine conditions. This lends support to the proposal that synthesis design can influence partner perceptions of lexical knowledge, which in turn guide users' lexical choices. We discuss the findings in relation to the nature and dynamics of partner models in human-machine dialogue.
Conference Paper
Amazon's Echo and Apple's Siri have drawn attention from different user groups; however, these existing commercial VUIs support limited language options for users, including native and non-native English speakers. The existing literature about usability differences between these two distinct groups is also limited. Thus, in this study, we conducted a usability study of the Google Home Smart Speaker with 20 participants, including native and non-native English speakers, to understand their differences in using the Google Home Smart Speaker. The findings show that, compared with their counterparts, the native English speakers had better and more positive user experiences in interacting with the device. They also show that users' English language proficiency plays an important role in interacting with VUIs. The findings from this study can create insights for VUI designers and developers for implementing multiple language options and better voice recognition algorithms in VUIs for different user groups across the world.
Conference Paper
This paper reports on findings from a pilot usability study of the Google Home Smart Speaker undertaken with native English speakers and non-native English speakers to understand the differences in usability and user experiences of the two distinct groups. The study shows that, while both user groups felt satisfied in their use of the device, native English speakers had a better user experience overall than their counterparts. Importantly, preliminary findings from the study demonstrate that cultural distinctions in English expression and system engagement may be as significant to usability as English language proficiency. The findings provide a baseline for the next stages of this research and insights for developers and researchers in the design and use of Voice User Interfaces.
Conference Paper
Voice User Interfaces (VUIs) are becoming ubiquitously available, being embedded both into everyday mobility via smartphones, and into the life of the home via ‘assistant’ devices. Yet, exactly how users of such devices practically thread that use into their everyday social interactions remains underexplored. By collecting and studying audio data from month-long deployments of the Amazon Echo in participants’ homes—informed by ethnomethodology and conversation analysis—our study documents the methodical practices of VUI users, and how that use is accomplished in the complex social life of the home. Data we present shows how the device is made accountable to and embedded into conversational settings like family dinners where various simultaneous activities are being achieved. We discuss how the VUI is finely coordinated with the sequential organisation of talk. Finally, we locate implications for the accountability of VUI interaction, request and response design, and raise conceptual challenges to the notion of designing ‘conversational’ interfaces.
Article
The proliferation of smartphones has given rise to intelligent personal assistants (IPAs), software that helps users accomplish day-to-day tasks. However, little is known about IPAs in the context of second language (L2) learning. Therefore, the primary objectives of this case study were twofold: to assess the ability of Amazon's IPA, Alexa, to understand L2 English utterances and to investigate student opinions of the IPA. Four university students of English as a foreign language (EFL) in Japan participated in the study, which involved each participant interacting with Alexa in a 20-min session. Three sets of data were collected and analyzed to achieve the study's aims: learner-generated command performance, interactive storytelling performance, and interviews. The quantitative results showed that Alexa accurately understood only 50% of learner commands, whereas comprehensibility during the interactive storytelling skill, Earplay, was much higher (90%). Three themes were identified from the interviews based on criteria developed by Hubbard (2009): hindered learner efficiency due to the lack of first language (L1) support, improved learner effectiveness through indirect pronunciation feedback, and better access to conversational opportunities. These findings demonstrate that EFL learners perceive Alexa to be a potentially useful tool to enhance language learning and underscore the need for additional (L2) research of IPAs.
Article
Non-native English speakers (NNESs) often search in English due to the limited availability of information in their native language on the Web. Information seeking in a non-native language can present special challenges for users. Current research literature on non-native language search behavior is insufficient and even less is known about how online systems and tools may accommodate NNESs’ needs and assist their behaviors. To gain a better understanding of user behavior and the search process of NNESs, this paper presents a study of online searching in English as a foreign language (EFL) or second-language (L2). Particular attention is paid to language selection, search challenges, query formulation and reformulation, as well as user interaction with online systems and tools. Results from eight focus groups (36 participants) and 36 questionnaires indicate NNESs face a unique set of challenges that may not be present for native speakers when searching for information in English. A user interaction model is abstracted to address the iterative and spiral search process of NNESs. Implications for design of systems and tools to assist this particular user group are discussed.
Conference Paper
The past four years have seen the rise of conversational agents (CAs) in everyday life. Apple, Microsoft, Amazon, Google and Facebook have all embedded proprietary CAs within their software and, increasingly, conversation is becoming a key mode of human-computer interaction. Whilst we have long been familiar with the notion of computers that speak, the investigative concern within HCI has been upon multimodality rather than dialogue alone, and there is no sense of how such interfaces are used in everyday life. This paper reports the findings of interviews with 14 users of CAs in an effort to understand the current interactional factors affecting everyday use. We find user expectations dramatically out of step with the operation of the systems, particularly in terms of known machine intelligence, system capability and goals. Using Norman's 'gulfs of execution and evaluation' [30] we consider the implications of these findings for the design of future systems.
Article
This paper investigates the various ways speakers manage problems and overcome difficulties in L2 communication. Following Dörnyei and Scott (1997), we distinguish four main sources of L2 communication problems: (a) resource deficits, (b) processing time pressure, (c) perceived deficiencies in one's own language output, and (d) perceived deficiencies in the interlocutor's performance. In order to provide a systematic description of the wide range of coping mechanisms associated with these problem areas (e.g., communication strategies, meaning negotiation mechanisms, hesitation devices, repair mechanisms), we adopt a psycholinguistic approach based on Levelt's (1989, 1993, 1995) model of speech production. Problem-solving devices, then, are analyzed and classified according to how they are related to the different pre- and post-articulatory phases of speech processing, and we illustrate the various mechanisms by examples and retrospective comments taken from L2 learners' data.
Article
The growth of speech interfaces and speech interaction with computer partners has made it increasingly important to understand the factors that determine users’ language choices in human-computer dialogue. We report two controlled experiments that used a picture-naming-matching task to investigate whether users in human-computer speech-based interactions tend to use the same grammatical structures as their conversational partners, and whether such syntactic alignment can impact strong default grammatical preferences. We additionally investigate whether beliefs about system capabilities that are based on partner identity (i.e. human or computer) and speech interface design cues (here, voice anthropomorphism) affect the magnitude of syntactic alignment in such interactions. We demonstrate syntactic alignment for both dative structures (e.g., give the waitress the apple vs. give the apple to the waitress), where there is no strong default preference for one or other structure (Experiment 1), and noun phrase structures (e.g., a purple circle vs. a circle that is purple), where there is a strong default preference for one structure (Experiment 2). The tendency to align syntactically was unaffected by partner identity (human vs. computer) or voice anthropomorphism. These findings have both practical and theoretical implications for HCI by demonstrating the potential for spoken dialogue system behaviour to influence users’ syntactic choices in interaction. As well as verifying natural corpora findings, this work also highlights that priming and cognitive mechanisms that are unmediated by beliefs about partner identity could be important in understanding why people align syntactically in human-computer dialogue.
Conference Paper
Readers face many obstacles on today's Web, including distracting content competing for the user's attention and other factors interfering with comfortable reading. On today's primarily English-language Web, non-native readers encounter even more problems, even if they have some fluency in English. In this paper, we focus on the presentation of content and propose a new transformation method, Jenga Format, to enhance web page readability. To evaluate the Jenga Format, we conducted a user study on 30 Asian users with moderate English fluency and the results indicated that the proposed transformation method improved reading comprehension without negatively affecting reading speed. We also describe Froggy, a Firefox extension which implements the Jenga format.
Conference Paper
Blogs are an important platform for people to access and share information, particularly in corporate settings where users rely on these systems for their work. However, because a global enterprise is multilingual, not all employees can understand the shared information in these systems easily if the content is written in a user's non-native language. As a result, this research focuses on enhancing the readability of blogs in enterprise social software for this group of users. The pilot user study of Japanese and Chinese bloggers suggests there are two main challenges: finding an interesting blog post to read and encountering difficulties in reading blog posts as currently rendered. Based on these findings, we designed and implemented a Firefox extension, Clearly, which uses web customization techniques to improve these two levels of readability issues.
Article
This paper reports a series of investigations which aim to test the appropriateness of voice recognition as an interaction method for mobile phone use. First, a Keystroke-Level Model (KLM) was used to compare the speed of using voice recognition against using multi-tap and predictive text (the two most common methods of text entry) to interact with the phone menus and compose a text message. The results showed that speech is faster than the other two methods and that a combination of input methods provides the quickest task completion times. The first experiment used a controlled message creation task to validate the KLM predictions. This experiment also confirmed that the result was not due to a speed/accuracy trade-off and that participants preferred to use the combination of input methods rather than a single method for menu interaction and text composition. The second experiment investigated the effect of limited visual feedback (when walking down the road or driving a car, for example) on interaction, providing further evidence in support of speech as a useful input method. These experiments not only indicate the usefulness of voice in SMS input but also that users could be satisfied with voice input in hands-busy, eyes-busy situations.
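To make the KLM comparison above concrete, here is a minimal sketch of how such a prediction is computed. The operator durations are the commonly cited Card, Moran and Newell estimates rather than the parameters used in the paper, and the multi-tap example is an invented illustration.

```python
# Commonly cited KLM operator estimates (seconds), not the paper's values:
# K = keystroke/button press, M = mental preparation.
OPERATOR_TIME = {"K": 0.28, "M": 1.35}

def klm_prediction(sequence: str) -> float:
    """Sum operator durations for an operator sequence such as 'MKK'."""
    return sum(OPERATOR_TIME[op] for op in sequence)

# Multi-tap entry of the word "hi" on a numeric keypad: 'h' is two taps
# and 'i' three taps on the same key, so the pause between letters on
# the same key is modelled here as an extra mental operator (a
# simplification for illustration).
print(f"multi-tap 'hi': {klm_prediction('MKKMKKK'):.2f} s")  # 4.10 s
```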
Article
We report an experiment designed to study whether models of human-human voice dialogues can be applied successfully to human-computer communication using natural spoken language. Two groups of six subjects were asked to obtain information about air travel via dialogue with a remote "travel agent". Subjects in the computer group were led to believe they were talking to a computer whereas subjects in the operator group were told they were talking to a human. Both groups of subjects actually talked to the same human experimenter. The study focuses on subjects' representations of interlocutor skill and knowledge, and differs from previous analogous studies in several respects: the task is more complex, giving rise to structured exchanges in natural language rather than to question/answer pairs in simplified language; specific attention has been paid to the design, which attempts to avoid biases that have flawed other studies (in particular, conditions are identical for both groups); the time factor has been taken into account (subjects take part in three sessions, at 1-week intervals). Some results confirm those of the literature, namely that subjects of the computer group tend to control and simplify their use of language more than those in the operator group. However, most observations are either new or in contradiction with previous results: subjects in the computer group produce more utterances but no significant differences were observed with respect to most structural and pragmatic features of language; the time factor plays a dual role. Subjects in both groups tend to become more concise. Operator group strategies differ significantly across sessions as regards scenario processing (problem solving) whereas computer group strategies remain stable. These differences in behavior between groups are ascribed to differences in representations of interlocutor ability.
Article
Five experiments examined the extent to which speakers' alignment (i.e., convergence) on words in dialog is mediated by beliefs about their interlocutor. To do this, we told participants that they were interacting with another person or a computer in a task in which they alternated between selecting pictures that matched their 'partner's' descriptions and naming pictures themselves (though in reality all responses were scripted). In both text- and speech-based dialog, participants tended to repeat their partner's choice of referring expression. However, they showed a stronger tendency to align with 'computer' than with 'human' partners, and with computers that were presented as less capable than with computers that were presented as more capable. The tendency to align therefore appears to be mediated by beliefs, with the relevant beliefs relating to an interlocutor's perceived communicative capacity.
Article
Three experiments are reported, dealing with a situation in which subjects carried on a dialogue, via a computer terminal, with what they believed to be either a computer system or another person. Analysis of the ensuing protocols concentrated on the use of anaphor and on lexical choice. Systematic stylistic variations result from placing subjects in a situation where they believe their interlocutor to be a computer system. Subjects in this condition sustain a dialogue with a restricted window of focussed content and, as a result, compared to person-to-person dialogues, utterances are short; lexical choice is kept to a negotiated minimum; and the use of pronominal anaphor is minimised, even when referring to the discourse topic. These dialogue characteristics persist over lengthy periods of interaction. Lack of evidence to support presuppositions concerning the capabilities of a defined computer source does not lead to a change in style. Similarly, attempts to manipulate features of computer output by producing more friendly surface forms did not influence subjects' behaviour. However, within the limits of the measures taken, subjects learned as much and performed as well, or better, in interactions defined as being with a computer.
Article
Fragile error handling in recognition-based systems is a major problem that degrades their performance, frustrates users, and limits commercial potential. The aim of the present research was to analyze the types and magnitude of linguistic adaptation that occur during spoken and multimodal human-computer error resolution. A semiautomatic simulation method with a novel error-generation capability was used to collect samples of users' spoken and pen-based input immediately before and after recognition errors, and at different spiral depths in terms of the number of repetitions needed to resolve an error. When correcting persistent recognition errors, results revealed that users adapt their speech and language in three qualitatively different ways. First, they increase linguistic contrast through alternation of input modes and lexical content over repeated correction attempts. Second, when correcting with verbatim speech, they increase hyperarticulation by lengthening speech segments and pauses, and increasing the use of final falling contours. Third, when they hyperarticulate, users simultaneously suppress linguistic variability in their speech signal's amplitude and fundamental frequency. These findings are discussed from the perspective of enhancement of linguistic intelligibility. Implications are also discussed for corroboration and generalization of the Computer-elicited Hyperarticulate Adaptation Model (CHAM), and for improved error handling capabilities in next-generation spoken language and multimodal systems.
Article
Using synthesized and digitized speech in electronic communication devices may greatly benefit individuals who cannot produce intelligible speech. However, multiple investigations have demonstrated that synthesized speech is not always sufficiently intelligible for its listeners. Listening to synthesized speech may be particularly problematic for listeners for whom English is a second language. We compared native and non-native English-speaking adults' listening accuracy for English sentences in natural voice and synthesized voice conditions. Results indicated a disproportionate disadvantage for the non-native English-speaking group when listening to synthesized speech compared to their native English-speaking age peers. There was, however, significant variability in performance within the non-native English group, and this was strongly related to independent measures of English language skill. Specifically, a large portion of the variance in performance on the synthesized speech task was predicted by participants' receptive vocabulary scores.
Leigh Clark, Nadia Pantidi, Orla Cooney, Philip Doyle, Diego Garaialde, Justin Edwards, Brendan Spillane, Christine Murad, Cosmin Munteanu, Vincent Wade, and Benjamin R. Cowan. 2019. What Makes a Good Conversation? Challenges in Designing Truly Conversational Agents. arXiv:1901.06525 [cs] (Jan. 2019). https://doi.org/10.1145/3290605.3300705

Benjamin R. Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. 2017. What can i help you with?: infrequent users' experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services. ACM, 43.

Antonella Devescovi and Simonetta D'Amico. 2004. The competition model: Crosslinguistic studies of online processing. In Beyond Nature-Nurture. Psychology Press, 215-242.

Mateusz Dubiel, Martin Halvey, and Leif Azzopardi. 2018. A Survey Investigating Usage of Virtual Personal Assistants. CoRR abs/1807.04606 (2018). http://arxiv.org/abs/1807.04606

Barbara Hoekje. 1984. Processes of Repair in Non-Native-Speaker Conversation. (March 1984). https://eric.ed.gov/?id=ED250922

Souheila Moussalli and Walcir Cardoso. 2016. Are commercial 'personal robots' ready for language learning? Focus on second language speech. In CALL communities and culture – short papers from EUROCALL 2016, 325-329.

Christie Olson and Kelli Kemery. 2019. 2019 Voice report: Consumer adoption of voice technology and digital assistants. Technical Report. Microsoft.

Catherine Watson, Wei Liu, and Bruce MacDonald. 2013. The effect of age and native speaker status on synthetic speech intelligibility. In Eighth ISCA Workshop on Speech Synthesis.