Michelle Dana Cohn
University of California, Davis · Department of Linguistics

Ph.D. Linguistics

About

29 Publications · 4,304 Reads · 144 Citations

Publications (29)
Conference Paper
Full-text available
The current study tests subjects' vocal alignment toward female and male text-to-speech (TTS) voices presented via three systems: Amazon Echo, Nao, and Furhat. These systems vary in their physical form, ranging from a cylindrical speaker (Echo), to a small robot (Nao), to a human-like robot bust (Furhat). We test whether this cline of personificati...
Conference Paper
Full-text available
The current study explores the extent to which humans vocally align to digital device voices (i.e., Apple's Siri) and human voices. First, participants shadowed word productions by 4 model talkers: a female and a male digital device voice, and a female and a male real human voice. Second, an independent group of raters completed an AXB task assessi...
Conference Paper
Full-text available
This study tests the effect of cognitive-emotional expression in an Alexa text-to-speech (TTS) voice on users' experience with a social dialog system. We systematically introduced emotionally expressive interjections (e.g., "Wow!") and filler words (e.g., "um", "mhmm") in an Amazon Alexa Prize socialbot, Gunrock. We tested whether these TTS manipul...
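For context on how such expressive interjections can be rendered in practice: Alexa's SSML exposes "speechcons" through the say-as tag, which makes the voice produce listed interjections with expressive prosody. The Python helper below is a minimal sketch under that assumption; the function name and the sample response line are illustrative placeholders, not dialogue from the Gunrock system.

# Minimal sketch: wrapping an interjection in Alexa "speechcon" SSML so the
# TTS voice renders it expressively rather than with flat prosody. The
# helper name and sample response are illustrative placeholders.
def expressive_response(interjection: str, rest: str) -> str:
    """Return SSML that prefixes a response with an expressive interjection."""
    return (
        "<speak>"
        f'<say-as interpret-as="interjection">{interjection}</say-as> '
        f"{rest}"
        "</speak>"
    )

print(expressive_response("wow", "that movie sounds really interesting."))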
Conference Paper
Full-text available
Humans are now regularly speaking to voice-activated artificially intelligent (voice-AI) assistants. Yet the cognitive mechanisms at play during speech interactions with a voice-AI interlocutor, relative to a real human one, remain understudied. The present study tests whether top-down guise of "apparent humanness"...
Article
Full-text available
The current study investigates the intelligibility of face-masked speech while manipulating speaking style, presence of visual information about the speaker, and level of background noise. Speakers produced sentences while in both face-masked and non-face-masked conditions in clear and casual speaking styles. Two online experiments presented the se...
Article
This study examined how speaking style and guise influence the intelligibility of text-to-speech (TTS) and naturally produced human voices. Results showed that TTS voices were less intelligible overall. Although using a clear speech style improved intelligibility for both human and TTS voices (using “newscaster” neural TTS), the clear speech effect...
Article
Full-text available
Millions of people engage in spoken interactions with voice-activated artificially intelligent (voice-AI) systems in their everyday lives. This study explores whether speakers have a voice-AI-specific register, relative to their speech toward an adult human. Furthermore, this study tests if speakers have targeted error correction strategies for voi...
Article
Full-text available
This study tests whether individuals vocally align toward emotionally expressive prosody produced by two types of interlocutors: a human and a voice-activated artificially intelligent (voice-AI) assistant. Participants completed a word shadowing task with interjections (e.g., “Awesome”) produced in emotionally neutral and expressive prosodies b...
Conference Paper
Full-text available
The current study explores whether perception of coarticulatory vowel nasalization differs by speaker age (adult vs. child) and type of voice (naturally produced vs. synthetic speech). Listeners completed a 4IAX discrimination task between pairs containing acoustically identical (both nasal or oral) vowels and acoustically distinct (one oral, one n...
Article
Full-text available
The current study tests whether individuals (n = 53) produce distinct speech adaptations during pre-scripted spoken interactions with a voice-AI assistant (Amazon’s Alexa) relative to those with a human interlocutor. Interactions crossed intelligibility pressures (staged word misrecognitions) and emotionality (hyper-expressive interjections) as con...
Article
Two studies investigated the influence of conversational role on phonetic imitation toward human and voice-AI interlocutors. In a Word List Task, the giver instructed the receiver on which of two lists to place a word; this dialogue task is similar to simple spoken interactions users have with voice-AI systems. In a Map Task, participants completed...
Article
Full-text available
This study investigates the impact of wearing a fabric face mask on speech comprehension, an underexplored topic that can inform theories of speech production. Speakers produced sentences in three speech styles (casual, clear, positive-emotional) while in both face-masked and non-face-masked conditions. Listeners were most accurate at word identifi...
Article
Full-text available
This paper investigates users’ speech rate adjustments during conversations with an Amazon Alexa socialbot in response to situational (in-lab vs. at-home) and communicative (ASR comprehension errors) factors. We collected user interaction studies and measured speech rate at each turn in the conversation and in baseline productions (collected prior...
Article
This study investigates the perception of coarticulatory vowel nasality generated using different text-to-speech (TTS) methods in American English. Experiment 1 compared concatenative and neural TTS using a 4IAX task, where listeners discriminated between a word pair containing either both oral or nasalized vowels and a word pair containing one ora...
Article
Full-text available
Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages are speaking with voice-activated artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines participants’ age (older adults, 53–81 years old vs. younger adult...
Conference Paper
Full-text available
This study tests speech-in-noise perception and social ratings of speech produced by different text-to-speech (TTS) synthesis methods. We used identical speaker training datasets for a set of 4 voices (using AWS Polly TTS), generated using neural and concatenative TTS. In Experiment 1, listeners identified target words in semantically predictable a...
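As a rough sketch of how matched voices can be generated with the two synthesis methods named above (AWS Polly exposes both through an Engine parameter), the Python snippet below synthesizes the same sentence twice; the voice, text, and region are placeholders, not the study's materials.

# Sketch: synthesizing the same sentence with Polly's neural engine and its
# standard (concatenative) engine for the same voice. Assumes configured AWS
# credentials; VoiceId, Text, and region are illustrative placeholders.
import boto3

polly = boto3.client("polly", region_name="us-west-2")

for engine in ("neural", "standard"):
    response = polly.synthesize_speech(
        Text="The boat sailed across the bay.",
        VoiceId="Joanna",        # same underlying voice for both engines
        Engine=engine,           # "neural" vs. concatenative "standard"
        OutputFormat="pcm",      # raw audio, convenient for adding noise later
        SampleRate="16000",
    )
    with open(f"joanna_{engine}.pcm", "wb") as out:
        out.write(response["AudioStream"].read())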
Conference Paper
Full-text available
The present study compares how individuals perceive gradient acoustic realizations of emotion produced by a human voice versus an Amazon Alexa text-to-speech (TTS) voice. We manipulated semantically neutral sentences spoken by both talkers with identical emotional synthesis methods, using three levels of increasing 'happiness' (0%, 33%, 66% 'hap...
Conference Paper
Full-text available
Increasingly, people are having conversational interactions with voice-AI systems, such as Amazon's Alexa. Do the same social and functional pressures that mediate alignment toward human interlocutors also predict alignment patterns toward voice-AI? We designed an interactive dialogue task to investigate this question. Each trial consisted of scripted,...
Conference Paper
Full-text available
More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing 'social' presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these di...
Conference Paper
Full-text available
The current study explores whether the top-down influence of speaker age guise influences patterns of compensation for coarticulation. /u/-fronting variation in California is linked to both phonetic and social factors: /u/ in alveolar contexts is fronter than in bilabial contexts and /u/-fronting is more advanced in younger speakers. We investigate...
Preprint
Full-text available
In this study, we test two questions about how users perceive neural vs. concatenative text-to-speech (TTS): 1) does the TTS method influence speech intelligibility in adverse listening conditions? and 2) do a user’s ratings of the voice’s social attributes shape intelligibility? We used identical speaker training datasets for a set of 4 speakers (u...
Preprint
Full-text available
Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis....
Conference Paper
Full-text available
We examined the phonetic realization of oral and pre-nasal /ae/ in speakers from three distinct sub-regions of California: Southern California, the Bay Area, and the Central Valley. Four acoustic variables were measured: two midpoint formant values (F1, F2), diphthongization, and acoustic nasality. Results show that speakers from the sub-regions ex...
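A minimal sketch of the kind of midpoint formant measurement described above, using the Praat bindings in parselmouth; the file name is a placeholder and the token is assumed to be pre-segmented (the diphthongization and nasality measures are not shown).

# Sketch: F1/F2 at the vowel midpoint for one pre-segmented /ae/ token,
# via Praat's Burg formant tracker (parselmouth). File name is a placeholder.
import parselmouth

snd = parselmouth.Sound("ae_token.wav")
formants = snd.to_formant_burg(maximum_formant=5500)  # typical setting for a female talker

midpoint = 0.5 * (snd.xmin + snd.xmax)
f1 = formants.get_value_at_time(1, midpoint)  # Hz
f2 = formants.get_value_at_time(2, midpoint)  # Hz
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")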
Conference Paper
Full-text available
Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis....
Article
Full-text available
Does listeners’ musical experience improve their ability to perceive speech-in-speech? In the present experiment, musicians and nonmusicians heard two sentences played simultaneously: a target and a masker sentence that varied in terms of fundamental frequency (f0) separation. Results reveal that accuracy in identifying the target sentence was high...
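To make the f0-separation manipulation concrete, here is a rough sketch of one way to build such a two-talker stimulus by pitch-shifting the masker. The study's actual resynthesis pipeline is not specified here; the file names and the 2-semitone shift are illustrative assumptions.

# Sketch: mixing a target sentence with a masker whose f0 has been shifted
# away from the target. Pitch-shifting is just one way to impose an f0
# separation; file names are placeholders.
import numpy as np
import librosa
import soundfile as sf

target, sr = librosa.load("target_sentence.wav", sr=16000)
masker, _ = librosa.load("masker_sentence.wav", sr=16000)

# Shift the masker up 2 semitones to create an f0 separation from the target.
masker_shifted = librosa.effects.pitch_shift(masker, sr=sr, n_steps=2)

# Truncate to a common length, equate RMS (0 dB target-to-masker ratio), mix.
n = min(len(target), len(masker_shifted))
target, masker_shifted = target[:n], masker_shifted[:n]
masker_shifted *= np.sqrt(np.mean(target**2)) / np.sqrt(np.mean(masker_shifted**2))
mix = target + masker_shifted
sf.write("speech_in_speech_trial.wav", mix / np.max(np.abs(mix)), sr)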
