Chapter

Disfluencies and the Perspective of Prosodic Fluency

Authors:
  • Inesc-ID/CLUL
  • Inesc-ID / IST, Lisbon, Portugal

Abstract

This work explores prosodic cues of disfluent phenomena. We conducted a perceptual experiment to test whether listeners would rate all disfluencies as disfluent events or whether some of them would be rated as fluent devices in specific prosodic contexts. Results pointed out significant differences (p […] vs. disfluency. Distinct prosodic properties of these events were also significant (p < 0.05) in their characterization as fluent devices. In an attempt to determine which linguistic features are most salient in the classification of disfluencies, we also applied CART techniques to a corpus of 3.5 hours of spontaneous and prepared non-scripted speech. The CART results pointed out two splits: break indices and contour shape. The first split indicates that disfluent events uttered at breaks 3 and 4 are considered felicitous. The second indicates that these events must have plateau or ascending contours to be considered as such; otherwise they are strongly penalized. The results show that there are regular trends in the production of disfluencies, namely in prosodic phrasing and contour shape.
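The two CART splits described in the abstract can be read as a simple decision rule. The sketch below is a hypothetical Python encoding of that rule, not the authors' fitted CART model: the contour labels (`"plateau"`, `"ascending"`) are illustrative strings, and treating events at break indices other than 3 and 4 as not felicitous is our assumption, since the abstract does not describe that branch.

```python
# Hypothetical encoding of the two CART splits summarised in the abstract.
# Assumption: events at breaks other than 3 and 4 are treated as not felicitous,
# because the abstract only describes the branch for breaks 3 and 4.
FELICITOUS_BREAKS = frozenset({3, 4})                      # split 1: ToBI break index
FELICITOUS_CONTOURS = frozenset({"plateau", "ascending"})  # split 2: contour shape (illustrative labels)


def rated_felicitous(break_index: int, contour: str) -> bool:
    """Return True if a disfluent event lands in the 'felicitous' leaf."""
    if break_index not in FELICITOUS_BREAKS:
        return False  # assumed: outside breaks 3 and 4
    # Within breaks 3 and 4, only plateau or ascending contours avoid the penalty.
    return contour in FELICITOUS_CONTOURS


if __name__ == "__main__":
    for b, c in [(4, "plateau"), (3, "descending"), (1, "ascending")]:
        print(b, c, rated_felicitous(b, c))
```

Nesting the contour test inside the break-index test mirrors the order of the reported splits: contour shape only matters once the event already sits at a break 3 or 4.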

... For European Portuguese, much has been said about silent pauses, filled pauses and prolongations (e.g., [13,14]), whereas the other categories are poorly described (with the exception of [15,16]). We know that different filled pauses tend to occur in different prosodic contexts (e.g., aam at major intonational phrase boundaries and mm in coda position). ...
... Segmental prolongations are more likely found at internal clause boundaries and at intonational phrase boundaries. In [15,16], prosodic properties, mainly prosodic phrasing and contour shape, of all types of disfluencies are studied in their relation with an evaluation task regarding fluency/disfluency distinctions. The main results reported are that disfluencies may behave and even be rated as fluent communicative devices, when different segmental and suprasegmental aspects are monitored. ...
Conference Paper
This work explores prosodic cues of disfluencies in a corpus of university lectures. Results show three significant (p < 0.001) trends: pitch and energy slopes are significantly different between the disfluency and the onset of fluency; those features are also relevant to disfluency type differentiation; and they do not seem to be a speaker effect. The linguistic features that best predict the onset of fluency are pitch and energy resets, as well as the presence of a silent pause immediately before a repair. Our results thus point to a strategy of prosodic contrast rather than of parallelism. With this work we hope to contribute to the analysis of prosodic behavior in the production of so-called disfluencies and in fluency repair in European Portuguese.
... Previous studies on different languages and speech styles have shown that hesitation markers, like lengthenings and fillers, are characterized by specific pitch features. Typically, a flat pitch contour distinguishes hesitant prolongations from non-disfluent ones, the latter being realized with a higher pitch range and rising contour [1,2]. Similarly, filled pauses were reported as occurring with low and flat or falling contour relative to the adjacent prosodic context [3,4,5]. ...
... Previous studies for EP (Mata, 1999, 2012; Moniz, Trancoso & Mata, 2010; Moniz, 2013) have shown cross-corpora (university lectures, high school presentations, map-task dialogues, broadcast news), intra-corpora (spontaneous vs. prepared (un)scripted) and interspeaker (age, gender, status) variation. Building on these studies, we expect the specificity of typical school presentations to affect the distribution of final intonation patterns. ...
Chapter
Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields is a volume of empirical research papers incorporating recent theoretical, methodological, and interdisciplinary advances in the field of intonation, as they relate to the Ibero-Romance languages. The volume brings together leading experts in Catalan, Portuguese, and Spanish, as well as in the intonation of Spanish in contact situations. The common thread is that each paper examines a specific topic related to the intonation of at least one Ibero-Romance language, framing the analysis in an experimental setting. The novel findings of each chapter hinge on critical connections that are made between the study of intonation and its related fields of linguistic inquiry, including syntax, pragmatics, sociophonetics, language acquisition and special populations. In this sense, the volume expands the traditional scope of Ibero-Romance intonation, including in it work on signed languages (LSC), individuals with autism spectrum disorder and individuals with Williams Syndrome. This volume establishes the precedent for researchers and advanced students who wish to explore the complexities of Ibero-Romance intonation. It also serves as a showcase of the most up-to-date methodologies in intonational research.
Chapter
The present study aims to investigate intonation contours in phrase-final position, in a corpus of spontaneous and prepared unscripted presentations from teenagers (14-15 years old) and adults, collected in a school context. Taking into account the differences between phrasing levels (ToBI breaks 3 and 4), we show that the frequency of low/falling vs. high/rising contours, mainly (H+)L* L and (L+)H* H, varies across oral presentation types. Adults and teenagers follow distinct strategies, though cross-gender differences are also a source of variation. We interpret these changes as an adaptation effect to the speaking styles specifically required at school, which call for the speaker's effort to speak clearly and to keep the listeners' attention, and ultimately as "intelligibility-oriented" speaking-style changes.
Article
This study presents the results of two perception experiments aimed at evaluating the effect that specific patterns of disfluencies have on people listening to synthetic speech. We consider the particular case of Cultural Heritage presentations and propose a linguistic model to support the positioning of disfluencies throughout the utterances in the Italian language. A state-of-the-art speech synthesizer, based on Deep Neural Networks, is used to prepare a set of experimental stimuli and two different experiments are presented to provide both subjective evaluations and behavioural assessments from human subjects. Results show that synthetic utterances including disfluencies, predicted by a linguistic model, are identified as more natural and that the presence of disfluencies benefits the listeners’ recall of the provided information.
Article
The subject matter of the dissertation is the prosodic word. It bears on the organization of grammar and phonology, its interface with morphology and syntax, and the nature of phonological representations. Despite the reference to various other languages, it primarily focuses on European Portuguese (EP). https://doi.org/10.5334/jpl.53
Chapter
This second volume contains detailed surveys of the intonational phonology of fourteen typologically diverse languages, described in the Autosegmental-Metrical framework. Unlike the first volume, half of the languages, which vary in their word prosody as well as their geographic distribution, are understudied languages or researched through fieldwork. All chapters provide the prosodic structure and intonational categories of the language as well as a description of focus prosody. The book concludes with a chapter on the methodology of studying intonation from data collection to analysis and a chapter which proposes a new way of characterizing the intonation of the world's languages.
Article
This paper introduces the concept of speech management (SM), which refers to processes whereby a speaker manages his or her linguistic contributions to a communicative interaction, and which involves phenomena which have previously been studied under such rubrics as “planning”, “editing”, “(self-)repair”, etc. It is argued that SM phenomena exhibit considerable systematicity and regularity and must be considered part of the linguistic system. Furthermore, it is argued that SM phenomena must be related not only to such intraindividual factors as planning and memory, but also to interactional factors such as turntaking and feedback, and to informational content. Structural and functional taxonomies are presented together with a formal description of complex types of SM. The structural types are exemplified with data from a corpus of SM phenomena.
Article
We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well as features describing contextual prosodic information in the vicinity of filled pauses may facilitate the detection of deceit in speech.
Conference Paper
This paper explores the results of a previous experiment concerning listeners' ratings of different types of (dis)fluencies and extends the analysis of such phenomena to a corpus of university lectures. Results suggest that, although not all disfluency types are equally tolerated by listeners, such differences may be overridden by an adequate control of tonal scaling and pause length, at least. Index Terms: disfluencies, prosody, fluency ratings.
Conference Paper
Disfluent speech synthesis is necessary in some applications, such as automatic film dubbing or spoken translation. This paper presents a model for the generation of synthetic disfluent speech based on inserting each element of a disfluency in a context where it can be considered fluent. Prosody obtained by applying standard techniques to these new sentences is used for the synthesis of the disfluent sentence. In addition, local modifications are applied to segmental units adjacent to disfluency elements. Experiments show that duration follows this behavior, which supports the feasibility of the model. Index Terms: speech synthesis, disfluent speech, prosody, disfluencies. […] the potential fluent sentences associated with the disfluent utterance in conjunction with the local modifications produced by the insertion of the editing term. These local modifications can affect speech prosody and the quality of the original delivery. We show the relevance of these local modifications by studying the impact of disfluencies on the duration of the syllables that surround the editing terms. First, we introduce the disfluent speech generation model. Second, the experimental procedure for applying this model is presented, showing its impact on the duration of the syllables that surround the editing terms. Third, we discuss the future work to be done in this ongoing research, and the paper ends with conclusions.
Conference Paper
This paper reports preliminary results from a study of disfluencies in European Portuguese, based on a corpus of prepared (non-scripted) and spontaneous oral presentations in a high school context. We will focus on the contextual distribution and temporal patterns of filled pauses and segmental prolongations, as well as on the way those are rated by listeners. Results suggest that filled pauses and segmental prolongations behave alike, have similar functions and may be considered in complementary distribution, obeying general syntactic and prosodic constraints. Index Terms: spontaneous speech, disfluencies, prosody.
Conference Paper
We study the problem of detecting linguistic events at interword boundaries, such as sentence boundaries and disfluency locations, in speech transcribed by an automatic recognizer. Recovering such events is crucial to facilitate speech understanding and other natural language processing tasks. Our approach is based on a combination of prosodic cues modeled by decision trees, and word-based event N-gram language models. Several model combination approaches are investigated. The techniques are evaluated on conversational speech from the Switchboard corpus. Model combination is shown to give a significant win over individual knowledge sources.
Conference Paper
One use of text-to-speech synthesis (TTS) is as a component of speech-to-speech translation systems. The output of automatic machine translation (MT) can vary widely in quality, however. A synthetic voice that is extremely intelligible on naturally-occurring text may be far less intelligible when asked to render text that is automatically generated. In this paper, we compare the quality of synthesis of naturally-occurring text and its MT counterpart. We find that intelligibility of TTS on MT output is significantly lower than on either naturally-occurring text or semantically unpredictable sentences, and explore the reasons why. Index Terms: Speech synthesis, Speech-to-speech translation, TTS evaluation, TTS intelligibility
Conference Paper
This paper describes the corpus of university lectures that has been recorded in European Portuguese, and some of the recognition experiments we have done with it. The highly specific topic domain and the spontaneous speech nature of the lectures are two of the most challenging problems. Lexical and language model adaptation proved difficult given the scarcity of domain material in Portuguese, but improvements can be achieved with unsupervised acoustic model adaptation. From the point of view of the study of spontaneous speech characteristics, namely disfluencies, the LECTRA corpus has also proved a very valuable resource.
Article
The occurrence of disfluencies in fully natural speech poses difficult challenges for spoken language understanding systems. For example, although self-repairs occur in about 10% of spontaneous utterances, they are often unmodeled in speech recognition systems. This is partly due to the fact that little is known about the extent to which cues in the speech signal may facilitate automatic repair processing. In this paper, acoustic and prosodic cues to self-repairs are identified, based on an analysis of a corpus taken from the ARPA Air Travel Information System database, and methods are proposed for exploiting these cues for repair detection, especially the task of modeling word fragments, and repair correction. The relative contributions of these speech-based cues, as well as other text-based repair cues, are examined in a statistical model of repair site detection that achieves a precision rate of 91% and recall of 86% on a prosodically labeled corpus of repair utterances.
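For readers comparing this detector with others, the reported precision P = 0.91 and recall R = 0.86 can be combined into a single F-score. This is our own derived arithmetic, assuming the balanced F1 (harmonic mean with equal weight on precision and recall); it is not a figure reported in the abstract.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.91 \times 0.86}{0.91 + 0.86}
    = \frac{1.5652}{1.77}
    \approx 0.884
```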
Article
Speakers often repeat the first word of major constituents, as in, "I uh I wouldn't be surprised at that." Repeats like this divide into four stages: an initial commitment to the constituent (with "I"); the suspension of speech; a hiatus in speaking (filled with "uh"); and a restart of the constituent ("I wouldn't."). An analysis of all repeated articles and pronouns in two large corpora of spontaneous speech shows that the four stages reflect different principles. Speakers are more likely to make a premature commitment, immediately suspending their speech, as both the local constituent and the constituent containing it become more complex. They plan some of these suspensions from the start as preliminary commitments to what they are about to say. And they are more likely to restart a constituent the more their stopping has disrupted its delivery. We argue that the principles governing these stages are general and not specific to repeats.
Article
Speakers are often disfluent, for example, saying "theee uh candle" instead of "the candle." Production data show that disfluencies occur more often during references to things that are discourse-new, rather than given. An eyetracking experiment shows that this correlation between disfluency and discourse status affects speech comprehension. Subjects viewed scenes containing four objects, including two cohort competitors (e.g., camel, candle), and followed spoken instructions to move the objects. The first instruction established one cohort as discourse-given; the other was discourse-new. The second instruction was either fluent or disfluent, and referred to either the given or new cohort. Fluent instructions led to more initial fixations on the given cohort object (replicating Dahan et al., 2002). By contrast, disfluent instructions resulted in more fixations on the new cohort. This shows that discourse-new information can be accessible under some circumstances. More generally, it suggests that disfluency affects core language comprehension processes.
Conference Paper
The study aims to test quantitatively whether filled pauses (FPs) may highlight discourse structure. More specifically, it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, the FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses.
Article
Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.
Article
Portuguese emerged from Vulgar Latin during the course of the third century. Influential in its development were successive invasions by Germanic peoples, Visigoths, and Moors, the latter of whom were finally evicted in the thirteenth century. As a consequence of the newly independent kingdom's imperial achievements, Portuguese is the national language of Brazil and the official language of several African countries. Maria Helena Mateus and Ernesto d'Andrade present a broad description and comparative analysis of the phonetics and phonology of European and Brazilian Portuguese. They begin by introducing the history of Portuguese and its principal varieties. Chapter 2 describes the phonetic characteristics of consonants, vowels, and glides, and Chapter 3 looks at prosodic structure. Chapters 4 and 5 present the general characteristics of the Portuguese nominal and verbal systems, the former considering inflectional and the latter derivational processes. Chapter 6 examines stress (main, secondary, and echo), and Chapter 7 describes phonological processes that are not related to the morphological structure of the word, including the peculiar process of nasalization. The authors deploy current theoretical models to explain the rich variety of Portuguese phonology and interrelated aspects of morphology. This is by far the most comprehensive account of the subject to have appeared in English, and the most up-to-date in any language.
Book
In Speaking, Willem "Pim" Levelt, Director of the Max-Planck-Institut für Psycholinguistik, accomplishes the formidable task of covering the entire process of speech production, from constraints on conversational appropriateness to articulation and self-monitoring of speech. Speaking is unique in its balanced coverage of all major aspects of the production of speech, in the completeness of its treatment of the entire speech process, and in its strategy of exemplifying rather than formalizing theoretical issues. Bradford Books imprint
Book
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
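CART, the technique applied to the corpus in the abstract above, grows a tree by choosing at each node the split that most reduces an impurity measure; Gini impurity, 1 − Σ p_k², is the usual default. The sketch below illustrates that split criterion for a single categorical feature. It is a simplification we introduce for illustration, not the book's full algorithm: it only tries "one category vs. the rest" splits, whereas CART can also group categories into arbitrary subsets, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of the class labels at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_one_vs_rest_split(
    values: Sequence[Hashable], labels: Sequence[Hashable]
) -> Optional[Tuple[Hashable, float]]:
    """Return (category, weighted child impurity) for the best
    'value == category' vs. 'value != category' split, or None if empty."""
    n = len(labels)
    best: Optional[Tuple[Hashable, float]] = None
    # Sorting by repr gives a deterministic order even for mixed value types.
    for category in sorted(set(values), key=repr):
        left = [y for x, y in zip(values, labels) if x == category]
        right = [y for x, y in zip(values, labels) if x != category]
        # Children are weighted by their share of the node's samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (category, score)
    return best
```

On a toy input such as `best_one_vs_rest_split([3, 4, 1, 1], ["f", "f", "d", "d"])`, the function returns `(1, 0.0)`, because separating the value 1 from the rest leaves two pure children. A real tree would apply this search recursively to each child until a stopping or pruning rule is met.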
Book
This volume brings together the invited papers and selected participants’ contributions presented at the COST 2102 International Workshop on “Verbal and Nonverbal Communication Behaviours”, held in Vietri sul Mare, Italy, March 29–31, 2007. The workshop was jointly organized by the Faculty of Science and the Faculty of Psychology of the Second University of Naples, Caserta, Italy, and the International Institute for Advanced Scientific Studies “Eduardo R. Caianiello”(IIASS), Vietri sul Mare, Italy. The workshop was a COST 2102 event, and it was mainly sponsored by the COST (European Cooperation in the Field of Scientific and Technical Research) Action 2102 in the domain of Information and Communication Technologies (ICT), as well as by the above-mentioned organizing Institutions. The main theme of the workshop was to discuss the fundamentals of verbal and nonverbal communication features and their relationships with the identification of a person, his/her socio-cultural background and personal traits. In the past decade, a number of different research communities within the psychological and computational sciences have tried to characterize human behaviour in face-to-face communication by several features that describe relationships between facial, prosodic/voice quality, formal and informal communication modes, cultural differences, individual and socio-cultural variations, stable personality traits and degrees of expressiveness and emphasis, as well as the individuation of the interlocutor’s emotional and psychological states.
Article
"A unique view of language studies throughout the 20th and into the 21st centuries: where the mainstream emphasis has been, what has been missing, and what remedies are needed. In other words, this book is a call for a paradigm shift in the study of oral communication. It is a must read for people interested in language use, as well as for specialists in language studies." Camelia Suleiman, Ph.D., Florida International University, Miami, FL, USA "The authors have identified crucial theoretical and methodological assumptions that have hampered scholarship on language use. Their critical assessment is grounded in nuanced theoretical analysis and rigorous empirical studies. As a result, they reveal the complexity, elegance, and moral aspects of day to day dialogical communication." Kevin P. Weinfurt, Ph.D., Duke University, Durham, NC, USA In contrast to traditional approaches of mainstream psycholinguists, the authors of Communicating with One Another approach spontaneous spoken discourse as a dynamic process, rich with structures, patterns, and rules other than conventional grammar and syntax. Daniel C. O’Connell and Sabine Kowal thoroughly critique mainstream psycholinguistics, proposing instead a shift in theoretical focus from experimentation to field observation, from monologue to dialogue, and from the written to the spoken. They invoke four theoretical principles: intersubjectivity, perspectivity, open-endedness, and verbal integrity. Their analyses of historical and original research raise significant questions about the relationship between spoken and written discourse, particularly with regard to transcription and punctuation. With emphasis on political discourse, media interviews, and dramatic performance, the authors review both familiar and unexplored characteristics of spontaneous spoken communication, including: • The speaker’s use of prosody. • The functions of interjections. • What fillers do for a living. • Turn-taking: Smooth and otherwise. 
• Laughter, applause, and booing: from individual listener to collective audience. • Pauses, silence, and the art of listening. The paradigm shift proposed in Communicating with One Another will interest and provoke readers concerned about communicative language use – including psycholinguists, sociolinguists, and anthropological linguists.
Article
This chapter presents an overview of a number of perspectives on individual differences in language ability and language behavior from a linguistic point of view, together with a preliminary formulation of a set of research questions that these perspectives bring to mind. The two senses of competence known to people who are familiar with recent theorizing on language parallel two approaches to the study of individual differences in language behavior. The chapter discusses many different ways of being fluent in a language. It also discusses the problem of devising tests for measuring degrees of fluency. The word fluency seems to cover a wide range of language abilities; these are best described with terms like articulateness, volubility, eloquence, wit, garrulousness, etc.
Article
This contribution provides a cross-language study on the acoustic and prosodic characteristics of vocalic hesitations. One aim of the presented work is to use large corpora to investigate whether some language universals can be found. A complementary point of view is to determine whether vocalic hesitations can be considered as bearing language-specific information. An additional point of interest concerns the link between vocalic hesitations and the vowels in the phonemic inventory of each language. Finally, the gained insights are of interest to research in acoustic modeling for automatic speech, speaker and language recognition. Hesitations have been automatically extracted from large corpora of journalistic broadcast speech and parliamentary debates in three languages (French, American English and European Spanish). Duration, fundamental frequency and formant values were measured and compared. Results confirm that vocalic hesitations share (potentially universal) properties across languages, characterized by longer durations and lower fundamental frequency than are observed for intra-lexical vowels in the three languages investigated here. The results on vocalic timbre show that while the measures on hesitations are close to existing vowels of the language, they do not necessarily coincide with them. The measured average timbre of vocalic hesitations in French is slightly more open than its closest neighbor (/oe/). For American English, the average F1 and F2 formant values position the vocalic hesitation as a mid-open vowel somewhere between /2/ and /ae/. The Spanish vocalic hesitation almost completely overlaps with the mid-closed front vowel /e/.
Article
This study aims to test whether filled pauses (FPs) may highlight discourse structure. This question is tackled from the perspectives of both the speaker and the listener. More specifically, it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses. Finally, a general linear model reveals that discourse structure can to some extent be predicted from characteristics of the FPs.
Article
In spontaneous speaking, the is normally pronounced as thuh, with the reduced vowel schwa (rhyming with the first syllable of about). But it is sometimes pronounced as thiy, with a nonreduced vowel (rhyming with see). In a large corpus of spontaneous English conversation, speakers were found to use thiy to signal an immediate suspension of speech to deal with a problem in production. Fully 81% of the instances of thiy in the corpus were followed by a suspension of speech, whereas only 7% of a matched sample of thuhs were followed by such suspensions. The problems people dealt with after thiy were at many levels of production, including articulation, word retrieval, and choice of message, but most were in the following nominal. © 1997 Elsevier Science B.V. All rights reserved.
Article
Hesitation phenomena are intricately connected with prospective and retrospective speech-production tasks and mark critical points in processing. They are also causally related to types of quality control which can be expressed as conversational postulates governing wellformedness criteria. Corresponding to the concepts of forestalled versus committed errors (error-free or error-full output), two major hesitation categories suffice: stalls and repair. Supported by a corpus of English and German, the new taxonomy captures previously uncategorized information: the grammatical locus of repair operations and the structural changes they cause.
Article
The Communicative Value of Filled Pauses in Spontaneous Speech, by Ralph Leon Rose. Filled pauses (FPs, e.g. er, erm) and other hesitation phenomena are ever-present elements…
Article
Unlike read or laboratory speech, spontaneous speech contains high rates of disfluencies (e.g. repetitions, repairs, filled pauses, false starts). This paper aims to promote ‘disfluency awareness’, especially in the field of phonetics, which has much to offer in the way of increasing our understanding of these phenomena. Two broad claims are made, based on analyses of disfluencies in different corpora of spontaneous American English speech. First, an Ecology Claim suggests that disfluencies are related to aspects of the speaking environments in which they arise. The claim is supported by evidence from task effects, location analyses, speaker effects and sociolinguistic effects. Second, an Acoustics Claim argues that disfluency has consequences for phonetic and prosodic aspects of speech that are not represented in laboratory speech. Such effects include modifications in segment durations, intonation, voice quality, vowel quality and coarticulation patterns. The ecological and acoustic evidence provides insights into human language production in real-world contexts. Such evidence can also guide methods for processing spontaneous speech in automatic speech recognition applications.
Article
Because there have been few attempts to specify precisely what fluency is, Heidi Riggenbach has brought together scholars and researchers from disciplines such as psycholinguistics, sociolinguistics, and speech communication to write original papers for Perspectives on Fluency. This volume offers a historical overview of fluency and, in seeking to better define the term, addresses both native and nonnative speaker fluency. Section 1, What Is Fluency?, presents papers that all describe fluency, but in different ways. The articles in Section 2, Essential Components of Fluency, consider features or components that contribute to impressions of fluency. Section 3, Cognitive Processes Underlying Fluency, explores the psycholinguistic factors underlying fluency. The three studies in Section 4, Empirical Studies on Nonnative Fluency, exemplify the range of approaches to characterizing learners as fluent or nonfluent in their target language. One objective of Perspectives on Fluency is to provide a starting point for language researchers interested in exploring the concept of fluency, a foundation that, until the arrival of this volume, did not exist. The book can be useful to those approaching fluency from a language assessment perspective and to those interested in the relationship of fluency to oral proficiency.
Article
The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in "and-uh"), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word.
Article
Clark and Fox Tree (2002) have presented empirical evidence, based primarily on the London-Lund corpus (LL; Svartvik & Quirk, 1980), that the fillers uh and um are conventional English words that signal a speaker's intention to initiate a minor and a major delay, respectively. We present here empirical analyses of uh and um and of the silent pauses (delays) immediately following them in six media interviews of Hillary Clinton. Our evidence indicates that uh and um cannot serve as signals of upcoming delay, let alone signal it differentially: in most cases, neither uh nor um was followed by a silent pause, that is, there was no delay at all; the silent pauses that did occur after um were too short to be counted as major delays; finally, the distributions of silent-pause durations after uh and um overlapped almost entirely and therefore could not have served as reliable predictors for a listener. The discrepancies between Clark and Fox Tree's findings and ours are largely a consequence of the fact that their LL analyses reflect the perceptions of professional coders, whereas our data were analyzed by means of acoustic measurements with the PRAAT software (www.praat.org). A comparison of our findings with those of O'Connell, Kowal, and Ageneau (2005) did not corroborate Clark and Fox Tree's hypothesis that uh and um are interjections: fillers typically occurred in initial position, interjections in medial position; fillers did not constitute an integral turn by themselves, whereas interjections did; fillers never initiated cited speech, whereas interjections did; and fillers did not signal emotion, whereas interjections did. Clark and Fox Tree's analyses were embedded within a theory of ideal delivery that we find inappropriate for the explication of these phenomena.
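The core measurement here is whether a silent pause immediately follows each filler and, if so, how long it lasts. A minimal sketch of that measurement follows, assuming the words have already been time-aligned into a single tier of `(label, start_s, end_s)` intervals (for example, exported from a Praat TextGrid) in which silent intervals carry an empty label. The label conventions, the `min_pause` threshold, and the function names are assumptions for illustration, not the authors' procedure.

```python
from statistics import median

FILLERS = {"uh", "um"}  # assumption: fillers are transcribed in lowercase
SILENCE = ""  # assumption: silent intervals have an empty label


def pauses_after_fillers(intervals, min_pause=0.0):
    """intervals: time-ordered list of (label, start_s, end_s) from one word tier.
    Returns {filler: [duration of the immediately following silent pause, or 0.0 if none]}."""
    result = {f: [] for f in FILLERS}
    for i, (label, _start, _end) in enumerate(intervals):
        if label not in FILLERS:
            continue
        dur = 0.0
        if i + 1 < len(intervals):
            nxt_label, nxt_start, nxt_end = intervals[i + 1]
            # Count the next interval only if it is silent and longer than the threshold.
            if nxt_label == SILENCE and nxt_end - nxt_start > min_pause:
                dur = nxt_end - nxt_start
        result[label].append(dur)
    return result


def summarize(durations):
    """Share of filler tokens followed by a pause, and the median pause length (s)
    among the tokens that are followed by one."""
    paused = [d for d in durations if d > 0]
    share = len(paused) / len(durations) if durations else 0.0
    return share, (median(paused) if paused else 0.0)
```

Comparing the two lists returned for "uh" and "um" (their pause shares and duration ranges) is the kind of check that bears on whether the fillers could signal minor versus major delays.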
Conference Paper
Both filled and unfilled (silent) hesitation pauses in a widely used speech database were examined, covering both unintended and intended pauses. A distinction is made between grammatical pauses (at major syntactic boundaries) and ungrammatical ones (within minor syntactic phrases). Although unfilled pauses cannot be reliably separated into these two types on the basis of silence duration alone, grammatical pauses tended to be longer. In the prepausal word, there were few continuation rises in fundamental frequency (F0) before ungrammatical pauses, whereas 70% of the grammatical pauses were preceded by an F0 rise. Identifying the syntactic function of such pauses could improve the performance of an automatic speech recognizer by eliminating from consideration some hypotheses based on spectral analysis. Results are given which could allow simple identification of most filled and unfilled pauses and their syntactic function.
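The two cues named here, a continuation rise in the prepausal word and a longer pause, can be combined into a simple rule-based classifier. The sketch below is one such heuristic, not the method of the paper. It fits a least-squares F0 slope (in semitones per second) over the last part of the prepausal word and labels the pause "grammatical" if that slope exceeds a rise threshold or if the pause is long. The window length, both thresholds, the 100 Hz semitone reference, and the convention of marking unvoiced frames with `f0 <= 0` are all hypothetical choices.

```python
import math


def f0_slope_st_per_s(samples, window_s=0.2):
    """Least-squares F0 slope (semitones/s) over the final `window_s` seconds of the
    prepausal word. samples: list of (time_s, f0_hz); unvoiced frames have f0 <= 0."""
    voiced = [(t, f) for t, f in samples if f > 0]
    if len(voiced) < 2:
        return 0.0
    t_end = voiced[-1][0]
    # Semitones re 100 Hz; the reference frequency does not affect the slope.
    pts = [(t, 12 * math.log2(f / 100.0)) for t, f in voiced if t >= t_end - window_s]
    if len(pts) < 2:
        return 0.0
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_s = sum(s for _, s in pts) / n
    num = sum((t - mean_t) * (s - mean_s) for t, s in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den if den else 0.0


def guess_pause_type(samples, pause_s, rise_thresh=5.0, long_pause_s=0.5):
    """Hypothetical rule: a prepausal F0 rise or a long pause suggests a grammatical pause."""
    if f0_slope_st_per_s(samples) > rise_thresh or pause_s >= long_pause_s:
        return "grammatical"
    return "ungrammatical"
```

Because duration alone does not reliably separate the two pause types, the rise cue carries most of the weight in this rule; any real thresholds would have to be estimated from labeled data.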