Technical Report

Comma gets a cure: A diagnostic passage for accent study

Authors:
© 2000 Douglas N. Honorof, Jill McCullough & Barbara Somerville. All rights reserved.
Comma Gets A Cure
COPYRIGHT & CITATION: Comma Gets a Cure and derivative works may be used freely for any purpose
without special permission provided the present sentence and the following copyright notification accompany the
passage in print, if reproduced in print, and in audio format in the case of a sound recording: Copyright (©) 2000
Douglas N. Honorof, Jill McCullough & Barbara Somerville. All rights reserved.
Well, here's a story for you: Sarah Perry was a veterinary nurse who had been working daily at
an old zoo in a deserted district of the territory, so she was very happy to start a new job at a
superb private practice in North Square near the Duke Street Tower. That area was much
nearer for her and more to her liking. Even so, on her first morning, she felt stressed. She ate a
bowl of porridge, checked herself in the mirror and washed her face in a hurry. Then she put on
a plain yellow dress and a fleece jacket, picked up her kit and headed for work.
When she got there, there was a woman with a goose waiting for her. The woman gave Sarah
an official letter from the vet. The letter implied that the animal could be suffering from a rare
form of foot and mouth disease, which was surprising, because normally you would only expect
to see it in a dog or a goat. Sarah was sentimental, so this made her feel sorry for the beautiful
bird.
Before long, that itchy goose began to strut around the office like a lunatic, which made an
unsanitary mess. The goose's owner, Mary Harrison, kept calling, "Comma, Comma," which
Sarah thought was an odd choice for a name. Comma was strong and huge, so it would take
some force to trap her, but Sarah had a different idea. First she tried gently stroking the
goose's lower back with her palm, then singing a tune to her. Finally, she administered ether.
Her efforts were not futile. In no time, the goose began to tire, so Sarah was able to hold onto
Comma and give her a relaxing bath.
Once Sarah had managed to bathe the goose, she wiped her off with a cloth and laid her on her
right side. Then Sarah confirmed the vet's diagnosis. Almost immediately, she remembered an
effective treatment that required her to measure out a lot of medicine. Sarah warned that this
course of treatment might be expensive - either five or six times the cost of penicillin. I can't
imagine paying so much, but Mrs. Harrison - a millionaire lawyer - thought it was a fair price for
a cure.
ACKNOWLEDGMENTS: We thank the following for helpful comments on an early draft: Alice Faber of
Haskins Laboratories, Paul Meier of the University of Kansas, Rudy Troike of the University of Arizona, Ginny
Kopf of the University of Central Florida and Enid Parsons, Pronunciation Editor for the Random House
Dictionaries. All shortcomings remain the sole responsibility of the editor and authors. More information about a
number of the keywords used here (comma, cure, nurse, happy, start, north, square, near, face, dress, fleece, kit,
goose, letter, foot, mouth, goat, strut, choice, force, trap, palm, bath, cloth, lot, price) may be found in J. C. Wells
(1982) Accents of English, Vol. 1., Cambridge University Press. The editor acknowledges support from NIH Grant
DC-03782 to Haskins Laboratories during the preparation of the present technical report. Comments to: Douglas
N. Honorof (dialectdoug@gmail.com).
... While this is not ideal for a fine-grained acoustic analysis, it is reasonable for this study since it is a better representation of the type of speech data a commercial ASR system encounters. All speech data is taken from talkers reading the passage "Comma Gets a Cure" [4], which was designed to include Wells Standard Lexical Set [5]. The discussion of acoustic dialectal differences observed in the data below shows that these are four distinct varieties of American English, and that they include phonetic features which have been established as being associated with these varieties in the sociophonetic literature. ...
... As can be seen in Figure 7, for both systems, error rates were lowest for white talkers as a group, and higher for African American and mixed race talkers. As with dialect, differences in WER between races were not significant for Bing (F[4,31] = 1.21, p = 0.36), but were significant for YouTube's automatic captions (F[4,34] = 2.86, p < 0.05). All talkers were native English speakers. ...
... For the reading passage task, speakers read a modified version of "Comma gets a cure" (Honorof et al. 2000). Speakers were asked to read the passage to themselves once before reading it aloud at their own pace. ...
... Reading passage: "Comma gets a cure" (revised from the original Honorof et al. 2000 passage) 1 Well, here's a story for you: Sarah Perry was a veterinary nurse who had been working daily at an old zoo in a deserted and dull part of town, so she was very happy to start a new job at a superb private practice around the bend from the light rail station. That area was much nearer to downtown and more to her liking. ...
Thesis
Research on social meaning, which links language variation to the wider social world, often bases claims about the social meanings of linguistic forms on production (i.e., speakers’ situational use of meaningful forms). In the case of the California Vowel Shift (CVS), an ongoing restructuring of the vowel system of California English that takes place below the level of conscious awareness, previous production research has suggested that the CVS carries social meanings of carefreeness, femininity, and privilege. Left unclear in these production-based claims is whether listeners actually pick up on and recognize the social meanings that speakers apparently utilize the CVS to transmit. In this research, a dialect recognition task with matched guises (California-shifted vs. conservative) forms the basis for exploring Californian listeners’ reactions to the CVS, and how these reactions are mediated by perceptions of dialect geography. In short, this research focuses on listeners’ reactions to the CVS in order to address a more fundamental question: How do listeners and speakers together participate in the construction of social meaning? Stimuli for the main study task were drawn from excerpts of sociolinguistic interviews with 12 lifelong California English speakers from three regions of the state: the San Francisco Bay Area, Lower Central Valley, and Southern California. Guises were created from interview excerpts by modifying the F2 of each TRAP and GOOSE token via source-filter resynthesis methods. Californian guises featured backed TRAP and fronted GOOSE; conservative guises featured fronted TRAP and backed GOOSE. Ninety-seven Californians participated in a perceptual task in which they attempted to identify speakers’ regional origin and rated speakers on affective scales. 
The results indicated that Californians recognize the CVS as Californian, as California-shifted guises were less likely to be identified as from outside California (but more likely to be identified as from Southern California). Listeners rated California-shifted guises higher on the scales Californian, sounds like a Valley girl, and confident, indicating a core of social meanings indexed by the CVS. Among listeners from the San Francisco Bay Area, the CVS indexes masculinity, but among Southern California listeners, the CVS indexes femininity. Listeners from across California also rated speakers who they believed to be from the same region as them higher on Californian, familiar, and sounds like me. This research demonstrates that the social meanings of linguistic forms do not reside only in speakers’ situational use of these forms, as listeners did not associate the CVS with carefreeness, femininity, or privilege, the social meanings of the CVS suggested by previous studies of California English production; instead, I propose an account of the indexical field that links perception and production by placing the core social meanings of the CVS uncovered by this research (Californian identity, sounding like a Valley girl, and confidence) at the center of the CVS’s indexical field. This research also contributes to theory in perceptual dialectology and language change. In order to explain this study’s finding that the CVS is associated with Southern California, this research introduces the perceptual-dialectological process of centrality: the identification of speakers who are believed to most exemplify the speech of a given region. Finally, this research suggests an attitudinal stance that allows changes from below such as the CVS to flourish: speakers are aware of the change in the community (at a tacit level, if not consciously) but do not believe that they are participating in the change.
... The speech sample stimuli consisted of ten speakers reading the story Comma Gets a Cure (Honorof et al. 2000), focusing the listeners specifically on accent and avoiding possible problems in comprehension. Although McKenzie (2010) highlighted the benefits of using spontaneous speech recordings as auditory stimuli, for this research, a scripted passage was selected to eliminate the influence of other lexical and grammatical variations (Martens 2020). ...
Article
This study explores the identification and evaluation of English accents by non-native English speakers, specifically Czech and Slovak undergraduate students majoring in English as a Foreign Language (EFL). The research aims to determine how these students perceive and rate ten English accents, including native and non-native varieties. Using questionnaires, the study examines the correlation between the ability to identify the speakers’ native language and the evaluation of their English pronunciation quality. The findings reveal that Czech and Slovak students generally share similar evaluations of English accents, with significant differences primarily in identifying and evaluating accents related to their native languages. This research contributes to understanding how related linguistic backgrounds influence the perception and judgment of English accents, providing insights for language teaching and accent training in EFL contexts.
... The first is the article, the following four news and the last five are the reading materials.
Title | Category
Happiness is a Journey [7] | article
Microsoft says Teams and Xbox fixed in UK and Europe 3 | news
UK economy flatlines as higher interest rates bite 4 | news
Elon Musk tells Rishi Sunak AI will put an end to work 5 | news
US and China reach 'some agreements' on climate -John Kerry 6 | news
Voice and articulation drillbook [22] | reading materials
Comma gets a cure [36] | reading materials
The North Wind and the Sun [92] | reading materials
The Story of Arthur the Rat [84] | reading materials
Motor Speech Disorders [12] | reading materials
...
Preprint
Speech enhancement is crucial in human-computer interaction, especially for ubiquitous devices. Ultrasound-based speech enhancement has emerged as an attractive choice because of its superior ubiquity and performance. However, inevitable interference from unexpected and unintended sources during audio-ultrasound data acquisition makes existing solutions rely heavily on human effort for data collection and processing. This leads to significant data scarcity that limits the full potential of ultrasound-based speech enhancement. To address this, we propose USpeech, a cross-modal ultrasound synthesis framework for speech enhancement with minimal human effort. At its core is a two-stage framework that establishes correspondence between visual and ultrasonic modalities by leveraging audible audio as a bridge. This approach overcomes challenges from the lack of paired video-ultrasound datasets and the inherent heterogeneity between video and ultrasound data. Our framework incorporates contrastive video-audio pre-training to project modalities into a shared semantic space and employs an audio-ultrasound encoder-decoder for ultrasound synthesis. We then present a speech enhancement network that enhances speech in the time-frequency domain and recovers the clean speech waveform via a neural vocoder. Comprehensive experiments show USpeech achieves remarkable performance using synthetic ultrasound data comparable to physical data, significantly outperforming state-of-the-art ultrasound-based speech enhancement baselines. USpeech is open-sourced at https://github.com/aiot-lab/USpeech/.
... This was followed, in an order which varied between sessions and participants, by a longer reading, The Rainbow Passage (long version) (Fairbanks, 1960), a timed picture description and two repetitions of three held vowels, /a/, /o/ and /i/. Participants completed the two readings from the first session and the held vowels in their second and third sessions, with an additional long reading in each, one of Your Rate of Reading (Fairbanks, 1940) and Comma gets a Cure (Honorof et al., 2000). Elicitation task order was similarly varied between participants and sessions in the second and third sessions. ...
Preprint
Background: Speech-based biomarkers have potential as a means for regular, objective assessment of symptom severity, remotely and in-clinic in combination with advanced analytical models. However, the complex nature of speech and the often subtle changes associated with health mean that findings are highly dependent on methodological and cohort choices. These are often not reported adequately in studies investigating speech-based health assessment.
Objective: To develop and apply an exemplar protocol to generate a pilot dataset of healthy speech with detailed metadata for the assessment of factors in the speech recording-analysis pipeline, including device choice, speech elicitation task and non-pathological variability.
Methods: We developed our collection protocol and choice of exemplar speech features based on a thematic literature review. Our protocol includes the elicitation of three different speech types. With a focus towards remote applications, we also chose to collect speech with three different microphone types. We developed a pipeline to extract a set of 14 exemplar speech features.
Results: We collected speech from 28 individuals three times in one day, repeated at the same times 8-11 weeks later, and from 25 healthy individuals three times in one week. Participant characteristics collected included sex, age, native language status and voice use habits of the participant. A preliminary set of 14 speech features covering timing, prosody, voice quality, articulation and spectral moment characteristics were extracted that provide a resource of normative values.
Conclusions: There are multiple methodological factors involved in the collection, processing and analysis of speech recordings. Consistent reporting and greater harmonisation of study protocols are urgently required to aid the translation of speech processing into clinical research and practice.
... One such metric related to consonant distribution and was defined so as to quantify the number of consonants present in a passage relative to the number of phonotactically relevant consonant positions possible in English. It was found that The Rainbow Passage had the second most diverse consonant distribution among the set, approximately 1 SD above the mean, following only the passage Comma Gets a Cure (Honorof et al., 2000). In contrast, My Grandfather had a consonant distribution that was approximately average, and The Northwind and the Sun had among the least diverse consonant distributions, approximately 1 SD below the mean, tied with The Farm Passage (Crystal & House, 1982). ...
Article
Purpose: A common way of eliciting speech from individuals is by using passages of written language that are intended to be read aloud. Read passages afford the opportunity for increased control over the phonetic properties of elicited speech, of which phonetic balance is an often-noted example. No comprehensive analysis of the phonetic balance of read passages has been reported in the literature. The present article provides a quantitative comparison of the phonetic balance of widely used passages in English.
Method: Assessment of phonetic balance is carried out by comparing the distribution of phonemes in several passages to distributions consistent with typical spoken English. Data regarding the distribution of phonemes in spoken American English are aggregated from the published literature and large speech corpora. Phoneme distributions are compared using Spearman rank order correlation coefficient to quantify similarities of phoneme counts in those sources.
Results: Correlations between phoneme distributions in read passages and aggregated material representative of spoken American English ranged from .70 to .89. Correlations between phoneme counts from all passages, literature sources, and corpus sources ranged from .55 to .99. All correlations were statistically significant at the Bonferroni-adjusted level.
Conclusions: Passages considered in the present work provide high, but not ideal, phonetic balance. Space exists for the creation of new passages that more closely match the phoneme distributions observed in spoken American English. The Caterpillar provided the best phonetic balance, but phoneme distributions in all considered materials were highly similar to each other.
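The comparison named in the abstract above (Spearman rank-order correlation between phoneme counts) can be illustrated with a small sketch. This is not the article's implementation: the function names (average_ranks, spearman, phoneme_spearman), the choice to score phonemes missing from one source as zero, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def phoneme_spearman(passage_phonemes, reference_counts):
    """Correlate phoneme counts in a passage with reference counts.

    Assumption: a phoneme absent from one source is counted as 0 there.
    """
    counts = Counter(passage_phonemes)
    inventory = sorted(set(counts) | set(reference_counts))
    xs = [counts.get(p, 0) for p in inventory]
    ys = [reference_counts.get(p, 0) for p in inventory]
    return spearman(xs, ys)


# Toy data (hypothetical): a six-phoneme "passage" and made-up reference counts.
toy_passage = ["t", "t", "t", "n", "n", "s"]
toy_reference = {"t": 50, "n": 40, "s": 30}
rho = phoneme_spearman(toy_passage, toy_reference)
```

Because the measure depends only on rank order, two sources that order their phonemes by frequency in the same way correlate perfectly even when their raw counts differ greatly, which is worth keeping in mind when reading the reported range of correlations.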
... The speech material consisted of the publicly-available text Comma Gets a Cure (Honorof et al., 2000) (Appendix 1). The text is composed of four paragraphs and a total of 375 words, as follows: Paragraph (1), 5 sentences, 110 words; Paragraph (2), 4 sentences, 73 words; Paragraph (3), 7 sentences, 107 words; Paragraph (4), 5 sentences, 85 words. ...
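The paragraph figures in the excerpt above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names REPORTED, totals and count_words are hypothetical, and count_words uses plain whitespace splitting, which is an assumption and may not match the tokenization behind the reported counts.

```python
# Paragraph statistics as reported in the excerpt: paragraph -> (sentences, words).
REPORTED = {1: (5, 110), 2: (4, 73), 3: (7, 107), 4: (5, 85)}


def totals(stats):
    """Sum sentence and word counts over all paragraphs."""
    sentences = sum(s for s, _ in stats.values())
    words = sum(w for _, w in stats.values())
    return sentences, words


def count_words(text):
    """Count whitespace-separated tokens (assumed rule; punctuation stays attached)."""
    return len(text.split())


total_sentences, total_words = totals(REPORTED)
mean_sentence_length = total_words / total_sentences
```

The per-paragraph word counts sum to the stated total of 375 words across 21 sentences.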
Article
Prosodic variation between African American English and General American English has been attested in numerous works, yet few studies have collected measures of F0 in African American English and fewer have examined F0 beyond the word level. Additionally, the analysis of prosodic variation in regional dialects of American English is not well studied. F0 movement at the level of the Intonational Phrase (IP) is known to convey both local and global information. Research on F0 movement in General American English has analyzed combinations of H(igh) and L(ow) pitch accents as categorical markers of prosodic alignment to the segmental string. Understanding the alignment of F0 contours provides key information on phonetic realization and phonological alignment in the creation of intonational categories. This pilot study explores the interaction of F0, vowel duration and word duration of prenuclear and nuclear pitch accents in the read speech of Black and White southern women. This study seeks to determine if group differences exist in the expression of pitch accents between the regionally defined socio-ethnic dialects used by the two groups. Results will be discussed in terms of dialect variation.
... All participants read five standardized passages that contained words and sentences designed to provide a representative sample of sounds as they occur in the English language. The passages included "The North Wind and the Sun" (NWS; Aesop, 1999); "The Grandfather Passage" (GP; Van Riper, 1963); "The Rainbow Passage" (RP; Fairbanks, 1960); "Arthur the Rat" (AR; Cassidy, 1985); and "Comma Gets a Cure" (CGC; Honorof et al., 2000). The passages ranged in length from 119 to 593 words each. ...
Article
A corpus of recordings of deaf speech is introduced. Adults who were pre- or post-lingually deafened as well as those with normal hearing read standardized speech passages totaling 11 h of .wav recordings. Preliminary acoustic analyses are included to provide a glimpse of the kinds of analyses that can be conducted with this corpus of recordings. Long term average speech spectra as well as spectral moment analyses provide considerable insight into differences observed in the speech of talkers judged to have low, medium, or high speech intelligibility.
Thesis
The thesis addresses the issue of automatic speech recognition and claims that today it is almost on par with the accuracy of human speech recognition. The first part of the work discusses human speech recognition. The main theory that explains its efficiency and the effect of a number of sociophonetic factors is the exemplar theory. The second part of the thesis discusses the most advanced automatic speech recognition currently available and the technology it is based on. Due to a lack of academic research on the performance of ASR systems and on how modern systems cope with language variation, the third part of the thesis is an experiment that aims to fill the gap in research and test the ASR systems developed by Google and Apple. The results of the experiment demonstrate that overall these systems are sensitive to language variation and some of the sociophonetic factors that this variation is associated with. There is a bias towards a specific dialect region, namely, a positive bias towards the Western states of the USA, and also recording quality and gender biases, with a positive bias towards female speakers in the Google ASR (the latter is statistically insignificant). Age did not appear to have any effect when it was treated as a continuous numeric variable, but there was a slight bias against the older (60+ y.o.) participants. The binary factor of knowledge of other languages did not demonstrate any effect. These findings support the initial hypothesis that ASR systems are currently sensitive to language variation with its sociophonetic factors and have room for further improvement.
Article
This study evaluates F0 declination and reset in read speech produced by African American and White women speakers of American English as an active, linguistically-controlled process. The results demonstrate that African American women have less change in F0 over the duration of the breath group unit than White peers. There was no evidence of the use of final level or rising pitch as documented in informal African American English. The results indicate that in formal interactions, such as those expected in the educational and therapy settings, adult female African American speakers will use F0 declination and reset in sentence level breath groups in a manner consistent with White peers. Implications and directions for future research are discussed.