Article

Abstract

Previous intonational research on Mandarin has mainly focused on the prosody modeling of statements or the prosody analysis of interrogative sentences. To support related speech technologies, e.g., Text-to-Speech, the quantitative modeling of the intonation of interrogative sentences with a large-scale corpus still deserves attention. This paper summarizes our work on the quantitative prosody modeling of interrogative sentences in Mandarin. A large-scale natural speech corpus was used in this study. By extracting the pitch contours and fitting the intonation curves, we found that F0 declination and final lowering both existed in interrogative sentences, although some previous studies claimed that they were absent in Mandarin. In addition, the declination function could be modeled linearly, and the bearing unit of final lowering in Mandarin was found to be the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range. It was argued in this study that the difference between this finding and the commonly believed rising intonation of interrogative sentences resulted from the nonlinear relationship between prosody production and perception. The underlying mechanism for the existence of F0 declination and final lowering in interrogative sentences is also discussed.
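
The two measurements named in the abstract, a linear declination function and final lowering on the last prosodic word, can be illustrated with a minimal sketch. This is not the authors' procedure: the function names, the use of ordinary least squares, the semitone scale, and the toy contour are all assumptions made for illustration. The line is fitted on the samples before the final prosodic word, and final lowering is read off as the mean residual of the final-word samples below that line.

```python
from statistics import mean


def fit_line(times, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_bar - slope * t_bar


def final_lowering(times, values, final_word_onset):
    """Fit the declination line on the pre-final part of the utterance and
    return (slope, mean residual of the final-word samples).
    A negative residual means the final word sits below the extrapolated line."""
    body = [(t, v) for t, v in zip(times, values) if t < final_word_onset]
    slope, intercept = fit_line([t for t, _ in body], [v for _, v in body])
    residuals = [v - (intercept + slope * t)
                 for t, v in zip(times, values) if t >= final_word_onset]
    return slope, mean(residuals)


# Toy contour (hypothetical data): 2 semitones/s declination plus an extra
# 1.5-semitone drop on the final prosodic word, which starts at 0.8 s.
times = [i / 10 for i in range(11)]
contour = [12.0 - 2.0 * t - (1.5 if t >= 0.8 else 0.0) for t in times]
slope, lowering = final_lowering(times, contour, final_word_onset=0.8)
print(round(slope, 3), round(lowering, 3))
```

Fitting the line only on the pre-final samples keeps the final drop from steepening the estimated declination slope, so the two effects are measured separately.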

... When multiple communicative functions are simultaneously conveyed by f0, confusion can arise (Liu et al., 2016); e.g. the common question marker final rise could be confused with a lexical rising tone in tone languages. Two acoustic cues are particularly relevant to the distinction between questions vs. statements, namely f0 declination and final lowering, both of which mark statements across languages (though note Li et al., 2017, who argued that they also marked questions in Mandarin). f0 declination is the tendency of f0 to decline over the course of an utterance, and it is argued to be the result of the gradual decrease in subglottal air pressure (e.g. ...
Chapter
This chapter surveys issues related to the production of tone in the world’s languages. Here the term ‘tone’ refers to the localised (within-syllable) use of fundamental frequency that contrasts lexical meanings (thus excluding pitch accent and stress languages). A comprehensive review of tonal phonetics is presented covering the acoustic correlates of tone, contextual tonal variation, methods used in tone production research, as well as recent research topics in tonal phonetics. We offer suggestions for teaching and learning of tone as a phonetics topic and the chapter concludes with suggestions for future directions for tone production research.
Article
The current study aims to examine prosodic sensitivity in Chinese children with dyslexia and its relation to Chinese reading in children with and without dyslexia. A total of 172 Chinese children from third grade to sixth grade in Taiwanese primary schools were recruited. Thirty (14 male) children were identified as having dyslexia, and the remaining children (N = 142; 67 male) were typically developing children matched with those with dyslexia as carefully as possible with respect to school, grade, and gender. Our results indicated that group differences were found for all three types of prosodic sensitivity. Moderation analyses showed that group had no significant interaction with prosodic sensitivity in predicting Chinese reading, so the participants in the two groups were combined in the following analyses. The results of the stepwise regression analyses showed that only lexical tone awareness could significantly predict Chinese character reading after controlling for phonological awareness, while only intonation awareness could significantly predict reading comprehension after controlling for Chinese character reading. The results provide preliminary evidence on the issue of prosodic sensitivity in Chinese children with dyslexia and its role in Chinese reading, which might provide a novel approach to the teaching of Chinese languages.
Article
To improve on the low accuracy of the traditional algorithm, a quantitative land state network situation awareness algorithm based on neutral statistics is proposed. Following the definition of the neutral statistical awareness algorithm, a quantitative awareness model of the land state cyberspace situation is constructed from the perspectives of attack event and security vulnerability awareness, security situation modeling, and associated threat assessment. Malicious nodes are determined, the state function is analyzed, and a quantitative perception process is designed. The simulation results show that the proposed method adapts to various information processing tasks and offers low computational complexity, short evaluation time, and high evaluation accuracy. The quantified situation values are basically consistent with the actual values, with an error below 0.1, which indicates good practical applicability.
Article
Synthesizing tones plays an important role in text-to-speech systems for tonal languages. The two important steps are to determine the pitch markers of voice utterances and to synthesize F0 trajectories for lexical tones. In this paper, we propose two efficient algorithms: one locates the pitch markers at the peaks of the cumulative signal of each voiced part of the input utterance, and the other generates F0 trajectories of tones with the quantitative target approximation (qTA) parameters of Xu's model. Experiments show that the proposed algorithms locate pitch markers with high accuracy, which has enabled us to generate tones with complex shapes.
Conference Paper
Although the final lowering effect has been found in many languages, its origin and its realization in different phonological environments still need exploration. In this article, three experiments on a large dialogue corpus examine how phonological factors (such as prosodic units, sentence stresses and boundary pitch movement) influence the realization of final lowering in Chinese Mandarin. The results show that: I) The bearing unit of final lowering in Chinese is the last prosodic word in the utterance, regardless of its length, rather than a physiologically determined fixed duration range. II) The position of the sentence stress influences the presence or absence of final lowering. Specifically, final lowering tends to be triggered by sentence stresses on the penultimate and third-to-last prosodic words, and suppressed by sentence stresses prior to the third-to-last prosodic word. III) The final lowering effect is pushed leftward by sentence stresses and high boundary tones in final positions. This article lends support to the phonological origin of final lowering, and introduces a cross-linguistic framework of prosodic structure to analyze its specific realization under different conditions of stress position and boundary pitch movement.
Conference Paper
To support text-to-speech with detailed prosody rules and to generate natural prosody, the paper studied the pitch variation near the end of sentences based on a Chinese Mandarin natural dialogue corpus. An additional lowering effect on the last prosodic word was found in both questions and statements, and proved to be independent of tone influence. Nevertheless, this effect, which is referred to as final lowering in other languages, was claimed to be absent in Chinese by some previous experimental studies. Such a contradiction is very likely to be caused by the difference between experimental speech versus natural speech. Based on this observation, the paper proposed a combination of the two methods in intonation studies, in which experimental speech served as an entry point to develop new topics, while natural speech served as a necessary extension to revise and apply prosody rules.
Article
Declination has been studied intensively from production data and from perceptual experiments. There is a clear trend in many languages that F0 values tend to drift down in an utterance, particularly in declarative sentences (’t Hart and Cohen, 1973; Maeda, 1976; Thorsen, 1980a; Cohen, Collier & ’t Hart, 1982). Perceptually, listeners compensate for such a downtrend: given two peaks of equal F0 value, the later one is interpreted as having higher prominence, or for two peaks to be perceived as equal, the second peak should have lower F0 value than the first (Pierrehumbert, 1979; Gussenhoven and Rietveld, 1988; Terken, 1991; Ladd, 1993; Terken, 1993). There is also an extensive literature offering physiological explanations of declination (Lieberman, 1967; Titze, 1989; Strik and Boves, 1995). Nonetheless, there are some obvious difficulties in interpreting declination data. Most of the languages being investigated have constraints on accent combination so that the declination slope has to be estimated from sparsely located data points such as F0 peaks or valleys, while the phonological status of the observed peaks and valleys may be unclear. In addition, an observable F0 contour of a sentence is the result of the combination of many factors. There is no unique solution in decomposing an observed complex F0 pattern into individual effects. To study one effect one often needs to make strong assumptions about others, which turns out to be a major cause of disagreements in the intonation literature.
Book
There is an online scan available to borrow at: https://archive.org/details/japanesetonestru00pier The description there is: Japanese Tone Structure provides a thorough, phonetically grounded description of accent and intonation in Tokyo Japanese and uses it to develop an explicit account of surface phonological representation. The unusual amount of quantitative phonetic data analyzed and its testing in a detailed model make this an important new study for theoretical phonologists, phoneticians, and specialists in Japanese. The authors' broader purpose, however, is to develop a general theory of surface representation that can capture salient facts about prosodic structure in all languages and provide a suitable input to phonetic rules. The theory integrates autosegmental principles into a metrical account of prosodic structures in an explicit formalism. The work establishes phonology and phonetics as a productive area in cognitive science.
Article
Two experiments were designed to examine final lowering in English and Greek. In both, the number of unstressed syllables between the last two accents varied between two and three syllables (English), and two and four syllables (Greek). In English the final accent was one or three syllables from the end of the utterance; in Greek the distance was zero, one or two syllables. Final lowering was evident in both languages, while the extra syllables between the last two accents did not produce additional lowering, clearly showing that final lowering is independent of declination. In both languages the final accent was scaled lower when closer to the end of the utterance, though the effect was not consistent across speakers. This result suggests that final lowering is not phonologically controlled, but more research is necessary to draw a firm conclusion on this point.
Article
Current advances in tone research are rather uneven. The major obstacles to faster progress in the area are not in the lack of technological means, but in the mindset of our discipline. This paper discusses ways to improve tone research by considering a set of basic principles both in research methodology and in theoretical thinking.
Article
This paper addresses higher level organization in discourse prosody. Fluent speech prosody of text reading illustrated higher level speech planning above phrases and prosody segments above intonation units. Adopting a top-down perspective allowed clearer reflection of the scope and units involved. We examined a large amount of speech data via a corpus approach, studied read discourse through perceived boundaries, analyzed prosodic characteristics of between-boundary units, and found evidence of higher prosodic specifications above phrase intonation. Through tailored quantitative analyses corresponding to a multi-layer prosodic hierarchy, we found how different prosodic levels contribute separately to prosody output, and how cumulative contributions add up to output prosody. The prosody hierarchy specifies that speech paragraphs are immediate constituents of discourse, and phrases are immediate constituents of speech paragraphs. Lower level nodes are subjacent units subject to higher level constraints; sister constituents bear association to one another. Hence, central to discourse prosody is higher level specification as well as cross-phrase association, in addition to discrete intonation patterns. Cross-phrase cadence templates could be derived to account for the melody, rhythm, loudness and boundary breaks of fluent speech. Further, evidence of cross-paragraph discourse association is also found. We believe that, in addition to advancing the understanding of discourse prosody, this knowledge is directly applicable to speech technology development, especially speech synthesis.
Article
The present paper demonstrates that much can be learned about intonation through the study of the contribution of lexical tones to the f0 contour of speech utterances. A fundamental principle and several basic mechanisms of tone production and perception are proposed based on studies of both tone and intonation. Their implications for intonation in general are discussed.
Article
Lists of bean names with two to five items were elicited from speakers of Mainstream American English (MAE) and Standard British English (SBE), and three methods for detecting final lowering in these data were used: comparisons of the scaling of final and penultimate peaks which have the same order in the lists (e.g., third peak), modelling of the data as exponential decay followed by comparisons of attested and predicted final peak scaling, and comparison of interaccent drops in F0. The first two measures showed that final lowering was present in both MAE and SBE, while the comparison of F0 drops across peaks showed only very weak evidence for final lowering. Further, the results showed that final peak scaling was not affected either by durational differences in the interval between penultimate and final peaks, or by the distance of the final peak from the end of the utterance. Together, these results suggest that final lowering is independent of declination and targets the final accent of utterances, indicating that final lowering is grammaticalized in the linguistic varieties examined here. Finally, the differences between the methods used to detect final lowering show that its effect on peak scaling is very small and thus caution is needed when choosing a method for its detection and when interpreting the results.
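
The second detection method above (exponential decay followed by a comparison of attested and predicted final peaks) can be sketched roughly as follows. The abstract does not give the model's exact form, so this sketch assumes that non-final peaks decay geometrically toward a fixed reference level, `peak_n = reference + excursion * decay ** n`, and that the reference level is known in advance. The function names, that reference value, and the toy data are all hypothetical.

```python
import math


def fit_decay(non_final_peaks, reference):
    """Fit peak_n = reference + excursion * decay ** n by least squares on
    log(peak_n - reference). Every peak must lie above the reference level."""
    xs = list(range(len(non_final_peaks)))
    ys = [math.log(p - reference) for p in non_final_peaks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    log_decay = sxy / sxx
    log_excursion = y_bar - log_decay * x_bar
    return math.exp(log_excursion), math.exp(log_decay)


def predict_peak(index, reference, excursion, decay):
    """Peak height the decay model expects at the given accent position."""
    return reference + excursion * decay ** index


# Toy list of five accent peaks in Hz (hypothetical data). The first four
# follow the decay exactly; the attested final peak is lower than predicted.
reference = 100.0
non_final = [100.0 + 100.0 * 0.7 ** n for n in range(4)]
attested_final = 120.0

excursion, decay = fit_decay(non_final, reference)
predicted_final = predict_peak(4, reference, excursion, decay)
extra_lowering = attested_final - predicted_final  # negative => final lowering
print(round(decay, 3), round(predicted_final, 2), round(extra_lowering, 2))
```

Because the final peak is excluded from the fit, any final lowering shows up as a gap between the attested and predicted values instead of being absorbed into the decay constant.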
Conference Paper
Modeling prosodic rhythm is of great importance for both speech synthesis and speech understanding, and it requires a sufficiently large corpus with precise prosodic boundary labels. This paper proposes a maximum entropy (ME) based hierarchical model, which utilizes both text and acoustic features, to automatically label Mandarin prosodic boundaries. Results of comparative experiments show that, for the task of prosodic boundary detection, the ME model clearly outperforms the classification and regression tree (CART), and the bottom-up hierarchical framework is also significantly superior to the flat single-level framework.
Article
The current state-of-the-art hidden Markov model (HMM)-based text-to-speech (TTS) can produce highly intelligible, synthesized speech with decent segmental quality. However, its prosody, especially at phrase or sentence level, still tends to be bland. This blandness is partially due to the fact that the state-based HMM is inadequate in capturing global, hierarchical suprasegmental information in speech signals. In this paper, to improve the TTS prosody, longer units are first explicitly modeled with appropriate parametric distributions. The resultant models are then integrated with the state-based baseline models in generating better prosody by maximizing the joint probability. Experimental results in both Mandarin and English show consistent improvements over our baseline system with only state-based prosody model. The improvements are both objectively measurable and subjectively perceivable.
Conference Paper
The generation of natural-sounding F0 contours in TTS enhances the intelligibility and perceived naturalness of synthetic speech. In earlier works the first author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0, and an automatic procedure for extracting the parameters from the F0 contour which, however, was specific to German. As has been shown by Fujisaki and his co-workers, parametrization of F0 contours of Mandarin requires negative tone commands, as well as a more precise control of F0 associated with the syllabic tones. This paper presents an approach to the automatic parameter estimation for Mandarin, as well as first results concerning the accuracy of estimation. The paper also introduces a recently developed tool for editing Fujisaki parameters featuring resynthesis which will soon be publicly available.
Conference Paper
The paper deals with a prosodic comparison of spontaneous and read-aloud speech. More specifically, the study reports data on F0 declination in these two speaking modes using Swedish materials. For both speaking styles the analysis revealed negative slopes, a steepness-duration dependency with declination being less steep in longer utterances than in shorter ones and resetting at utterance boundaries. However, there was a difference in degree of declination between the two speaking styles, read-aloud speech in general having steeper slopes, a more apparent time dependency and stronger resetting than spontaneous speech.
Article
Previous work examining prosodic cues in online spoken-word recognition has focused primarily on local cues to word identity. However, recent studies have suggested that utterance-level prosodic patterns can also influence the interpretation of subsequent sequences of lexically ambiguous syllables (Dilley, Mattys, & Vinke, Journal of Memory and Language, 63:274-294, 2010; Dilley & McAuley, Journal of Memory and Language, 59:294-311, 2008). To test the hypothesis that these distal prosody effects are based on expectations about the organization of upcoming material, we conducted a visual-world experiment. We examined fixations to competing alternatives such as pan and panda upon hearing the target word panda in utterances in which the acoustic properties of the preceding sentence material had been manipulated. The proportions of fixations to the monosyllabic competitor were higher beginning 200 ms after target word onset when the preceding prosody supported a prosodic constituent boundary following pan-, rather than following panda. These findings support the hypothesis that expectations based on perceived prosodic patterns in the distal context influence lexical segmentation and word recognition.
Article
The history of research on speech perception and speech production is replete with examples of nonlinearities between articulation and acoustics, and between acoustics and perception. These nonlinearities are useful for communication. They allow 1) adequate production of speech sounds and words despite people having different vocal tracts with different resonance capabilities, and 2) adequate word recognition despite variation in the acoustic signal across speakers, emphasis, background noise, etc. Yet context and the listener’s expectancies often strongly influence what is perceived; perception is dynamic, influenced by multiple factors that change slowly or quickly as speech goes on. In this chapter we present a selected history of demonstrations of nonlinearities in speech and attempt to exploit the nonlinearities in order to uncover the dynamics of both perception and production of speech.
Conference Paper
In this study, the scaling of utterance-initial f0 values and H initial peaks are examined in several Romance languages as a function of phrasal length, measured in number of pitch accents (1 to 3 pitch accents) and in number of syllables (3 to 15). The motivation for this study stems from contradictory claims in the literature regarding whether the height of the initial f0 values and peaks is governed by a look-ahead or preplanning mechanism. A total of ten speakers of five Romance language varieties (Catalan, Italian, Standard and Northern European Portuguese, and Spanish) read a total of 3720 declarative utterances (744 utterances per language) of varying length in number of pitch accents and syllables. The data reveal that the majority of speakers tend to begin higher in longer utterances. Results thus confirm recent findings about the need for a certain amount of global preplanning in tonal production ([1], [2]). The failure to find a correlation between phrase length and initial scaling for all speakers within languages shows that we are dealing with soft preplanning (in Liberman & Pierrehumbert's terms [3]), that is, an optional production mechanism that may be overridden by other tonal features.
Article
This paper reports the development of a quantitative target approximation (qTA) model for generating F0 contours of speech. The qTA model simulates the production of tone and intonation as a process of syllable-synchronized sequential target approximation [Xu, Y. (2005). "Speech melody as articulatorily implemented communicative functions," Speech Commun. 46, 220-251]. It adopts a set of biomechanical and linguistic assumptions about the mechanisms of speech production. The communicative functions directly modeled are lexical tone in Mandarin, lexical stress in English, and focus in both languages. The qTA model is evaluated by extracting function-specific model parameters from natural speech via supervised learning (automatic analysis by synthesis) and comparing the F0 contours generated with the extracted parameters to those of natural utterances through numerical evaluation and perceptual testing. The F0 contours generated by the qTA model with the learned parameters were very close to the natural contours in terms of root mean square error, rate of human identification of tone and focus, and judgment of naturalness by human listeners. The results demonstrate that the qTA model is both an effective tool for research on tone and intonation and a potentially effective system for automatic synthesis of tone and intonation.
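
A rough sketch of syllable-synchronized sequential target approximation is given below. The abstract states no equations, so the form used here is an assumption based on how qTA is usually described: each syllable has a linear pitch target `slope * t + height`, approached through a critically damped third-order response with rate `rate`, and F0, velocity and acceleration carry over across syllable boundaries. The function name, the parameter values, and the semitone units are hypothetical.

```python
import math


def qta_syllable(slope, height, rate, duration, state, step=0.005):
    """Approach the linear target slope * t + height within one syllable.
    state is (f0, velocity, acceleration) at syllable onset; returns the
    sampled F0 values and the state at syllable offset."""
    f0_start, vel_start, acc_start = state
    # Coefficients chosen so that F0 and its first two derivatives match the
    # onset state at t = 0.
    c1 = f0_start - height
    c2 = vel_start + c1 * rate - slope
    c3 = (acc_start + 2 * c2 * rate - c1 * rate ** 2) / 2

    def evaluate(t):
        poly = c1 + c2 * t + c3 * t * t
        d_poly = c2 + 2 * c3 * t
        dd_poly = 2 * c3
        decay = math.exp(-rate * t)
        f0 = slope * t + height + poly * decay
        vel = slope + (d_poly - rate * poly) * decay
        acc = (dd_poly - 2 * rate * d_poly + rate ** 2 * poly) * decay
        return f0, vel, acc

    n = int(round(duration / step))
    samples = [evaluate(i * step)[0] for i in range(n)]
    return samples, evaluate(duration)


# Two syllables with hypothetical targets (semitones relative to a speaker
# reference): a static high target followed by a falling target.
targets = [(0.0, 5.0, 40.0, 0.2), (-30.0, 2.0, 40.0, 0.2)]
contour, state = [], (0.0, 0.0, 0.0)
for slope, height, rate, duration in targets:
    samples, state = qta_syllable(slope, height, rate, duration, state)
    contour.extend(samples)
print(len(contour), round(contour[0], 3), round(contour[-1], 3))
```

Passing the offset state into the next syllable is what keeps the contour continuous at syllable boundaries even when adjacent targets differ sharply.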
Article
Synthesis by rule of a limited set of prosodic features of Southern English has been attempted as an extension of a previously reported system for synthesis of segmental phonemes. Methods used for synthesis of intonation features, pausal features and prominence are described.
Chapter
A three-way [condition × lobe × hemisphere] difference ANOVA revealed a main effect of condition (F (1, 12) = 23.924, p = 0.000), indicating that the lexical tone evoked a larger negative deflection than the intonation (effect magnitude: -1.879 µV), a main effect of lobe (F (2, 24) = 7.677, p = 0.013), due to the fact that a larger negative deflection existed at frontal and central than at parietal sites (effect magnitude: -0.929 µV, -0.893 µV, respectively), and a main effect of hemisphere (F (1, 12) = 5.691, p = 0.034), due to the fact that a larger negative deflection existed for the right than for the left hemisphere (effect magnitude: -0.296 µV). In addition, there was a significant interaction between condition and lobe (F (2, 24) = 8.459, p = 0.002). Subsequent simple effect analyses showed that a larger negative deflection existed over frontal and central than over parietal sites only in the tone condition (F (2, 24) = 11.71, p = 0.000). However, the difference ANOVA with peak latency as the dependent variable found no significant effect.
Article
The purpose of this contribution is to investigate the similarities in form and function of prosody among diverse languages. All speakers, regardless of their specific language, are equipped with the same production and perception apparatus, and consequently have the same capabilities and must face the same physiological constraints. Such similarities should be reflected in the acoustic production of any speaker. The first specific aim of this contribution is to review a number of striking acoustic similarities in the suprasegmental aspects of neutral sentences in different languages, together with possible physiological explanations for them.
Article
Acoustic analysis of recordings by four Standard Danish speakers reveals that each declarative sentence in a read text is associated with its own declining intonation contour, but together two or three such contours describe an overall falling slope. Individual sentence intonation contours are steeper, and demonstrate greater amounts of resetting between them, in a succession of declarative terminal sentences than in a corresponding string of coordinate main clauses. In other words, the closer relation between coordinate structures is reflected in a more coherent or less segregated intonational structure. The results are compared with other languages, and the implications for the abstract representation of Danish intonation are discussed.
Article
In this study, the scaling of peak fundamental frequency (f0) values in Mexican Spanish downstepping contours is examined as a function of the following linguistic factors: (1) phrasal length; (2) temporal distance between pitch accents; (3) phrasal position; and (4) f0 value of preceding peak. The motivation for this study stems from contradictory claims in the literature regarding whether downtrend is governed by local or global factors. Three speakers of Mexican Spanish read a total of 540 declarative utterances (2304 target pitch accents) of varying length (from two to five pitch accents) and varying distance between H* pitch accents (from two to three intervening unstressed syllables). The data reveal that the f0 value of the previous peak (as opposed to phrasal position) is the most important predictor of peak height. In our data, between 65 and 80% of the variance of the data is predicted by exclusively using a local downstep ratio or constant reduction in the previous peak's pitch value. Neither phrasal length nor distance between adjacent pitch accents has a significant effect on the height of a given f0 peak. Utterance-final peaks are best predicted by using a particular ratio of decay (higher than the downstep ratio) and a phrasal length factor: the use of the latter factor reflects a tendency for final peaks in longer utterances to remain at a relatively high f0 level.
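
The local predictor described above, where each peak is a constant proportion of the previous peak, can be illustrated with a small sketch. The regression through the origin, the function names, and the toy peak values are assumptions for illustration and are not taken from the study. The second function reports the share of variance in the following peaks that the single ratio accounts for.

```python
def fit_downstep_ratio(peaks):
    """Least-squares ratio r in peak[n] ~ r * peak[n - 1] (no intercept)."""
    previous, following = peaks[:-1], peaks[1:]
    return (sum(p * q for p, q in zip(previous, following))
            / sum(p * p for p in previous))


def variance_explained(peaks, ratio):
    """Share of the variance in the following peaks accounted for by the ratio."""
    previous, following = peaks[:-1], peaks[1:]
    mean_following = sum(following) / len(following)
    ss_total = sum((q - mean_following) ** 2 for q in following)
    ss_residual = sum((q - ratio * p) ** 2 for p, q in zip(previous, following))
    return 1.0 - ss_residual / ss_total


# Toy sequence of four H* peaks in Hz (hypothetical data), each 80% of the last.
peaks = [200.0, 160.0, 128.0, 102.4]
ratio = fit_downstep_ratio(peaks)
print(round(ratio, 3), round(variance_explained(peaks, ratio), 3))
```

On real data the ratio would be fitted across many utterances and speakers, and the variance figure would fall below 1; the abstract reports 65 to 80%.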
Article
Local pitch contours belonging to the same perceptual or phonological class vary significantly as a result of the structure (i.e., the segments and their durations) of the syllables they are associated with. For example, in nuclear rise-fall pitch accents in declaratives, peak location (measured from stressed syllable start) can vary systematically between 150 and 300 ms as a function of the durations of the associated segments (van Santen & Hirschberg, 1994). Yet, there are temporal changes in local pitch contours that are phonologically significant even though their magnitudes do not appear to be larger than changes due to segmental effects (e.g., Köhler, 1990; D’Imperio & House, 1997).
Article
In an attempt to determine the response characteristics of the larynx in voluntary pitch change, five adult male subjects were instructed to execute a variety of continuous pitch changes, as rapidly as possible, within the range 90–220 Hz. For a given pitch interval, there was a marked tendency for an upward pitch change to take longer than a downward pitch change. Also, unexpectedly, there was no marked tendency for a change involving a wide pitch interval to take longer than a change involving a smaller interval. Speculations on the physiological reasons for these relations, as well as their possible relevance to the phonology of tone and intonation, will be offered. [Supported by the National Science Foundation and a University of California Faculty Research Grant.]
Article
In many of the Indo-European languages (Öhman, 1967; Isačenko & Schädlich, 1966; ’t Hart, 1966; Maeda, 1974; Vaissière-Maeda, 1980) as well as in the Japanese language (Fujisaki & Nagashima, 1969), the contour of the voice fundamental frequency (henceforth F0 contour) plays an important role in transmitting not only linguistic information but also nonlinguistic information such as naturalness, emotion, and speaker idiosyncrasy. Because of difficulties in accurate analysis and in quantitative description, the relationships between the linguistic-nonlinguistic information and the F0-contour characteristics have not been fully clarified. The elucidation of these relationships requires, first, the selection of characteristic parameters that are capable of describing the essential features of an F0 contour, and second, a method for extracting these parameters from an observed F0 contour. In other words, an analytical formulation (i.e., a model) of the control process of voice fundamental frequency is indispensable for the quantitative analysis and linguistic interpretation of F0-contour characteristics.
Article
One explanation for the current controversy over pitch declination is that different observers have studied different kinds of material. In order to explore the conjecture that voice fundamental frequency (F0) falls consistently in certain types of sentences and not others, we measured F0 contours in three distinctively different modes of speaking: a list of 43 short unrelated sentences, an essay, and a free conversation between two people. We found evidence of declination in only half of the short isolated sentences, and in very few sentences from essay readings and conversation. We conclude that declination observed in isolated sentences comes from two sources: (1) a signal for a new idea marked by high F0 at the beginning of the sentence; and (2) lack of any particular emphasis in a random sentence, and hence a dry and mechanical reading.
Article
In many languages, fundamental frequency shows a marked decrease utterance-finally or phrase-finally. Ladefoged (1982) generalises that ‘in nearly all languages the completion of a grammatical unit such as a normal sentence is signaled by a falling pitch’. Bolinger (1978) also writes that ‘the most widely diffused intonational phenomenon seems to be the tendency to “go down at the end”’. These sorts of abrupt decreases which affect only the end of the utterance (known as FINAL LOWERING) are distinct from gradual decreases in fundamental frequency over the course of the entire utterance (known as DECLINATION). Bolinger (1978) notes the same distinction, characterising it as the difference between ‘a rapid downward motion at the very end, usually if not always associated with a terminal accent’ and ‘downward drift from a high beginning’. Final lowering as distinct from declination is documented in Japanese by Poser (1984) and by Pierrehumbert & Beckman (1988); in English by Liberman & Pierrehumbert (1984); in Dutch by Gussenhoven & Rietveld (1988); in Danish by Thorsen (1985); in Yoruba by Connell & Ladd (1990) and by Laniran (1992); and in Kikuyu by Clements & Ford (1981) (although not all authors use the exact terminology presented here). Analyses of final lowering range from attributing final lowering to changes in tonal categories (discussed below in §4) to attributing final lowering to compression of the pitch range in the last section of the sentence (discussed below in §5.1). Tone languages provide an interesting testing ground for analyses of final lowering. Careful experimental study, controlling for the position of a tone from the beginning and from the end of a sentence, is one way to begin to sort out the effects of various factors such as declination and final lowering on fundamental frequency.
Article
One of the most widespread – and widely studied – properties of speech fundamental frequency (F0) is a tendency to decline gradually during the course of utterances. This tendency has been given a variety of names, of which the best known is probably DECLINATION. This is the term I shall use in this paper. The purpose of the paper is to review past work on declination as it affects the phonological and phonetic modelling of F0 contours, and to outline some ideas for the empirical resolution of the issues that emerge from the discussion.
Article
A great many languages of the world exhibit phenomena of F0 DOWNTREND – phenomena whereby, other things being equal, the fundamental frequency (F0) of the speaking voice declines over the course of an utterance. That much is uncontroversial; further details are either simply unknown or the subject of considerable debate. The purpose of the study reported here was to shed light on some of these unknown or uncertain matters by the controlled investigation of pitch realisation in Yoruba.
Article
Two variables in Chinese intonation, pitch accent and boundary tone, were identified through acoustic analysis and listening tests of echo questions in read speech and yes-no questions in spontaneous speech. An identification test was used to verify the acoustic manifestations of boundary tone. It showed that the register of the ending point (or the slope) of the F0 curve in the boundary tone plays a more important role than the register of its starting point in differentiating questions from statements, and that the identification function for question versus statement is continuous rather than categorical. It is proposed that the features of boundary tone are "high" and "low". F0 patterns of boundary tone in Standard Chinese are given. Whether the syllable carries tone 1, tone 2, tone 3, or tone 4, the pitch pattern of a question boundary tone keeps the citation form. Intonation acts upon tones unidirectionally and hierarchically. In the five-point pitch space, intonation is represented mainly by the register and range of the F0 curve, whereas tone is represented by its F0 contour.
Article
This study investigates F0 declination in broadcast news speech in English and Mandarin Chinese. The results demonstrate a strong relationship between utterance length and declination slope. Shorter utterances have steeper declination even after excluding the initial rising and final lowering effects. Both topline and baseline show declination, but they are independent. The topline and baseline have different patterns in Mandarin Chinese, whereas in English their patterns are similar. Mandarin Chinese has more and steeper declination than English, as well as wider pitch range and more F0 fluctuations.
Article
A hypothesis about the relationship between prosodic structure and discourse structure in English was tested by taking into consideration explicit models of both intonational structure and discourse structure in constructing the experimental corpus. In the experiment, a corpus of discourses with controlled structure was read by speakers. Matched pairs of utterances were extracted from different positions in the discourses. These pairs were played to listeners to test whether the discourse position was distinguishable. The utterances were then ToBI transcribed in order to test whether the same dominance relationships in the discourses are reflected by the same edge tones. Then, pairs which were perceptually distinct were examined for tonal similarity. Finally, pairs of utterances which were perceptually distinct and which had similar tones were analyzed acoustically for phonetic differences. Several trends emerged in the acoustic analysis, including differences in fundamental frequency, root mean square amplitude, and duration. These phonetic factors were thus bearing the functional load of indicating more global aspects of discourse structure.
Article
This article argues that a factor contributing to final lowering in intonation is that H tones preceding downstep are scaled higher than H tones not preceding downstep, all else being equal. Final lowering is argued to be (in part) the absence of this effect with the last element of a sequence of downstepped tones. The evidence comes from intonation data recorded with speakers of German from Southern Germany and Austria. In two structures, a sequence of downstepped prenuclear pitch accents PA1!PA2…!PAn−1 is continued with a nuclear pitch accent PAn that is not downstepped. In both structures, final lowering is found on the last downstepped pitch accent PAn−1, crucially in penultimate position of the intonation phrase. Here potential triggers for final lowering in the environment are not present (phrase-final position) or present only in one structure (a following L% boundary tone). The application of final lowering in these cases shows that there is a factor contributing to final lowering that affects accentual tones not followed by downstep, regardless of the environment.
Article
This study examines interacting factors in tone production in Yoruba, a tone language with three tone levels, high (H), mid (M), and low (L). Its primary goals are to confirm the existence of downstep, a principle which causes successive H tones separated by L tones to step down in pitch, and to examine the interaction between downstep and H tone raising, a principle which raises H tones to extra-high values before L tones. Controlled comparisons of data from four speakers reveal that both of these principles apply to H tones satisfying their conditions. As a result of H raising, the first H tone in downstepping sequences of the form HLHLH… is raised well above its expected value, while the following downstepped H tones are kept from descending into the frequency band reserved for M tones. This study also examines the strategies used for economizing pitch space in longer downstepping sequences. The main strategy used by all speakers is H tone resetting; however, some speakers are also found to raise initial H tones to extra-high values in anticipation of downsteps occurring four syllables away. Other interacting factors in Yoruba tone production include tone-specific declination (“downdrift”) operating in the background and local carry-over assimilation from a H tone to the following L tone. These various observations support a compositional model of tone production in which competing factors culminate on individual tones to produce functionally motivated “compromise” f0 patterns.
Article
A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters. The proposed method uses pitch-adaptive spectral analysis combined with a surface reconstruction method in the time-frequency region, and an excitation source design based on group delay manipulation. It also consists of a fundamental frequency (F0) extraction method using instantaneous frequency calculation based on a new concept called 'fundamentalness'. The proposed procedures preserve the details of time-frequency surfaces while almost perfectly removing fine structures due to signal periodicity. This close-to-perfect elimination of interferences and smooth F0 trajectory allow for over 600% manipulation of such speech parameters as pitch, vocal tract length, and speaking rate, while maintaining high reproduction quality.
Article
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 1980. MICROFICHE COPY AVAILABLE IN ARCHIVES AND HUMANITIES. Bibliography: leaves 246-253. by Janet Breckenridge Pierrehumbert. Ph.D.
Article
Thesis (Ph. D.)—Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1976. Includes bibliographical references (p. 322-332). This electronic version was scanned from a copy of the thesis on file at the Speech Communication Group. The certified thesis is available in the Institute Archives and Special Collections. National Institutes of Health (No. NS04332). Ph. D.
Article
Thesis. 1976. Ph.D.--Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Microfiche copy available in Archives and Engineering. Vita. Bibliography: leaves 403-416. Ph.D.
Article
Doctoral dissertation, Social Sciences, Nijmegen, 1998.
Article
Supervised by Morris Halle. Thesis (Ph. D.)--Massachusetts Institute of Technology, 1984. Vita. Includes bibliographical references (leaves 352-373). Photocopy.