Article

Speed of Pitch Change

Abstract

In an attempt to determine the response characteristics of the larynx in voluntary pitch change, five adult male subjects were instructed to execute a variety of continuous pitch changes, as rapidly as possible, within the range 90–220 Hz. For a given pitch interval, there was a marked tendency for an upward pitch change to take longer than a downward pitch change. Also, unexpectedly, there was no marked tendency for a change involving a wide pitch interval to take longer than a change involving a smaller interval. Speculations on the physiological reasons for these relations, as well as their possible relevance to the phonology of tone and intonation, will be offered. [Supported by the National Science Foundation and a University of California Faculty Research Grant.]

... Few people would actually assume that we can change pitch at whatever speed we want. However, the actual limit on speed of pitch change was seriously investigated earlier only twice, as far as I am aware, first by Ohala and Ewan [19] and then by Sundberg [24]. Both studies used similar methods: asking the subject to change from one pitch level to another as fast as possible. ...
... Because the model pitch undulation patterns are faster than what human speakers could achieve, there were virtually no steady-state plateaus in the F0 contours produced by the subjects, as shown in Figure 2. This permitted the measurement of the entire duration of each pitch shift as opposed to the time corresponding to only 75% of the pitch change as measured in previous studies [19], [24]. As it turned out, it took our subjects nearly twice as long to complete an entire pitch shift as to execute the middle 75% of the shift. ...
... Mandarin third tone sandhi, where T3 is converted to a rising tone when followed by another T3, may also lead to the confusion between T2 and T3 (Chao, 1951; Li and Thompson, 1977; Clumeck, 1980; Chen, 2000; Zhu and Dodd, 2000). In addition, rising tones may reflect greater physiological efforts than level and falling tones in production (Ohala and Ewan, 1973; Li and Thompson, 1977). These corroborate the result of previous studies that T1 and T4 were found to be successfully acquired earlier than T2 and T3 (Li and Thompson, 1977; Clumeck, 1980; Zhu, 2002). ...
... Even as a neutral tone, T3N can be realized as both mid-level and mid-rising. Additionally, the rising shape of T3N may take greater physiological efforts than the falling shape of T1N, T2N, and T4N (Ohala and Ewan, 1973). These reasons may partially explain why a large proportion of incorrect T3N tokens were mispronunciations. ...
Article
Full-text available
Speakers with autism spectrum disorder (ASD) are found to exhibit atypical pitch patterns in speech production. However, little is known about the production of lexical tones (T1, T2, T3, T4) as well as neutral tones (T1N, T2N, T3N, T4N) by tone-language speakers with ASD. Thus, this study investigated the height and shape of tones produced by Mandarin-speaking children with ASD and their age-matched typically developing (TD) peers. A pronunciation experiment was conducted in which the participants were asked to produce reduplicated nouns. The findings from the acoustic analyses showed that although ASD children generally produced both lexical tones and neutral tones with distinct tonal contours, there were significant differences between the ASD and TD groups for tone height and shape for T1/T1N, T3/T3N, and T4/T4N. However, we did not find any difference in T2/T2N. These data implied that the atypical acoustic pattern in the ASD group could be partially due to the suppression of the F0 range. Moreover, we found that ASD children tended to produce more errors for T2/T2N, T3/T3N than for T1/T1N, T4/T4N. The pattern of tone errors could be explained by the acquisition principle of pitch, similarities among different tones, and tone sandhi. We thus concluded that deficits in pitch processing could be responsible for the atypical tone pattern of ASD children, and speculated that the atypical tonal contours might also be due to imitation deficits. The present findings may eventually help enhance the comprehensive understanding of the representation of atypical pitch patterns in ASD across languages.
... The first issue is based on physiological considerations. The durational effects of tone reported by [3], {F,H} < {M,L}, are the opposite of what f0 control mechanisms would predict; as it has been reported that f0 falls are faster/shorter than f0 rises [6], [7]. If the effects of tone on duration are related to f0 control, the M and L tones, which only have a falling component, should be associated with shorter vowel durations. ...
... A more promising way to interpret our results is in terms of a transparent relationship between the durational effects of tone on vowels and f0 control mechanisms. It has been reported that, at least for untrained speakers, i.e., non-singers, f0 falls are produced faster than f0 rises [6], [7]. If the production of vowels and tones is temporally coordinated, so that their executions are time-locked or closely timed, our durational findings can be understood as follows. ...
Conference Paper
Full-text available
We investigated tonal effects on vowel duration in two experiments with 37 speakers of Bangkok Thai. For long vowels in open syllables, the pattern of tonal effects on vowel duration is {Mid,Low} < {Falling,High,Rising}; for closed syllables with short vowels and sonorant codas, the pattern is {Rising} < {Mid,Low} < {High} < {Falling}. Our results do not align with hypothesized universal patterns or with previous reports on Bangkok Thai. Our findings are better understood by referring to f0 control mechanisms. Finally, we found that the tonal effects are mediated by syllable structure in line with diachronic changes in vowel length. Index Terms: tone, vowel duration, vowel length, word duration, fundamental frequency, Thai
... Notably, while there were some individual variations in choosing different boundary tones, some ... [Footnote 3:] Note that we attempted to perform statistical modelling with tonal type as an additional predictor, especially considering that, among the complex tones, rising tones are generally expected to have a longer temporal extent than falling tones (e.g., Ohala & Ewan, 1973; Myers, 2003; Kentner, Franz, Knoop, & Menninghaus, 2023; Li, Kim, & Cho, 2023). However, we decided not to report the results for the following reasons. ...
Article
Full-text available
This study examined preboundary lengthening and other kinematic characteristics of articulatory gestures in CV.CV and CV.CVC before prosodic boundaries in Korean. Preboundary lengthening was found to be extended to initial syllables in both CV.CV and CV.CVC, while its magnitude was largest on the final syllable. The preboundary lengthening effect was also reflected in the time-to-peak velocity (acceleration duration), but only on gestures of the final syllable. Preboundary lengthening was accompanied by substantial increase in both displacement and peak velocity, showing domain-final articulatory strengthening. This articulatory strengthening effect on preboundary gestures (at the right edge of prosodic constituent) was largely dovetailed with the notion of an edge-prominence language where boundary marking is assumed to be closely related with prominence lending. These results were compared in two different conditions driven by information structure (‘new’ vs. ‘given’) and were discussed to understand the observed kinematic pattern in dynamical terms in the theoretical framework of the π-gesture model.
... It is important to note that in CD the tones undergoing tone sandhi (T1 and T4) are rising, whereas those without any tone sandhi (T2 and T3) are falling. The reason for the different tone sandhi behavior between T1 and T4 on the one hand and T2 and T3 on the other in CD, we suggest, is that it takes more time to realize the former two than the latter two, since pitch lowering is faster than pitch elevation [5] [6] [7]. ...
... Note that in CD the tones undergoing tone sandhi (T1, T4) are rising, whereas those without any tone sandhi (T2, T3) are falling. The reason for the different tone sandhi behavior between T1 and T4 on the one hand and T2 and T3 on the other in CD, we suggest, is that it takes more time to realize the former two than the latter two, since pitch lowering is faster than pitch elevation [9,10,11]. Then why is T3 changed to a high level tone on the initial syllable of a PW, which should be long enough for it to fully realize its underlying high falling tone? We assume that this is caused by a constraint in CD forbidding a high falling tone PW-initially. ...
... On the other hand, little f0 movement is expected for a T22 target tone in the same context. Given that dynamic tones take longer to realize than non-dynamic ones (Ohala and Ewan, 1973; Xu and Sun, 2002), the greater degree of f0 movements in a target tone in the fixed-context condition, like in the case of a T55 flanked by two T22 tones, might require the speaker to take more time to realize the target tone than in a tonal context where little context-induced f0 movement is expected (e.g., a T55 flanked by two T55 tones). Such durational adjustments induced by the tonal context might obfuscate the underlying relationship between tone and duration. ...
Article
Full-text available
Phonetic typological studies suggest that syllable duration is inversely correlated with the accompanying tone's approximate average f0, and tones with dynamic f0 movement tend to be in longer syllables rather than shorter ones. Systematic instrumental investigations on tone-duration interaction remain scant, however; existing studies might be confounded as tonal context may impact duration realization due to phonetic constraints on tonal movement. This study investigates the effect of tonal environment on the durational realization of tones in Cantonese, showing that tone-dependent duration variation is governed by the tonal context. Implications of these findings for existing phonetic typology concerning tone-duration interaction are discussed.
... Since producing pitch excursions takes time, syllables bearing boundary tones are expected to be longer than syllables not bearing such tones. In addition, rising contours have been found to be significantly longer in duration than falling ones (Myers, 2003; Ohala & Ewan, 1973; Sundberg, 1979; Xu & Sun, 2002). Moreover, the pitch range that speakers can exploit for producing tonal events decreases as the sentence or utterance progresses (the so-called declination effect; see Cohen, Collier, & 't Hart, 1982; Collier & Gelfer, 1983; Ladd, 1984). ...
Article
Full-text available
Phrase-final syllable duration and pauses are generally considered to be positively correlated: The stronger the boundary, the longer the duration of phrase-final syllables, and the more likely or longer a pause. Exploring a large sample of complex literary prose texts read aloud, we examined pause likelihood and duration, pre-boundary syllable duration, and the pitch excursion at prosodic boundaries. Comparing these features across six predicted levels of boundary strength (level 0: no break; 1: simple phrase break; 2: short comma phrase break; 3: long comma phrase break; 4: sentence boundary; 5: direct speech boundary), we find that they are not correlated in a simple monotonic fashion. Whereas pause duration monotonically increases with boundary strength, both pre-boundary syllable duration and the pitch excursion on the pre-boundary syllable are largest for level-2 breaks and decrease significantly through levels 3 to 5. Our analysis suggests that pre-boundary syllable duration is partly contingent on the tonal realization, which is subject to f0 declination as the utterance progresses. We also surmise that pre-boundary syllable duration reflects differences in planning complexity for the different prosodic and syntactic boundaries. Overall, this study shows that a simple monotonic correlation between pause duration and pre-boundary syllable duration is not valid.
... How quickly is too quickly for F0 to change across successive samples? To answer this question, we draw on speech production research that quantifies the maximum rate of change in F0 during speech production (Ohala and Ewan, 1973; Sundberg, 1973; Xu and Sun, 2002). The method of eliciting rapid F0 changes in these studies is to ask speakers to produce an oscillating glissando between high and low F0. ...
Article
Full-text available
An algorithm for detecting sudden jumps in measured F0, which are likely to be inaccurate measures, is introduced. The method computes sample-to-sample differences in F0 and, based on a user-defined threshold, determines whether a difference is larger than naturally produced F0 velocities, thus, flagging it as an error. Various parameter settings are evaluated on a corpus of 30 American English speakers producing different intonational patterns, for which F0 tracking errors were manually checked. The paper concludes in recommending settings for the algorithm and ways in which it can be used to facilitate analyses of F0 in speech research.
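The abstract above describes the approach only in outline, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes F0 samples in Hz with timestamps in seconds, converts each sample-to-sample difference to semitones per second, and flags samples that exceed a user-defined threshold. The function name `flag_f0_jumps`, the default threshold of 100 st/s, and the convention that unvoiced frames are `None` or `0` are all assumptions made for illustration; the paper's recommended settings are not given in this text.

```python
import math

def flag_f0_jumps(times_s, f0_hz, max_st_per_s=100.0):
    """Return indices of samples reached by an implausibly fast F0 change.

    max_st_per_s is a hypothetical user-defined threshold in semitones
    per second. Unvoiced samples (None or 0) are skipped, and no
    comparison is made across an unvoiced gap.
    """
    flagged = []
    prev_i = None  # index of the previous voiced sample, if adjacent
    for i, f0 in enumerate(f0_hz):
        if not f0:
            prev_i = None  # reset so we never compare across a gap
            continue
        if prev_i is not None:
            dt = times_s[i] - times_s[prev_i]
            # Size of the change on a semitone scale
            semitones = 12.0 * math.log2(f0 / f0_hz[prev_i])
            if dt > 0 and abs(semitones) / dt > max_st_per_s:
                flagged.append(i)
        prev_i = i
    return flagged
```

With 10 ms frames, an isolated octave error is flagged twice under this sketch: once at the jump into the erroneous value and once at the jump back out, so the second flagged index may be a correct sample that follows a bad one.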
... As said at the end of section 1, our perceptual compensation theory presupposes independent explanations for durational differences in the production of f0 contours. The shorter duration of falls than rises has been widely discussed and investigated since [12]. No explanation has been proposed for the longer duration of low tones than that of high tones. ...
... Peak delay occurs when the peak f0 of a high tone is delayed onto the beginning of the following syllable instead of being realized on the syllable to which it is associated. This delay has been attributed to the longer time required for rises in f0 than for falls (Ohala, 1972;Ohala & Ewan, 1973). ...
... Recall that low tone syllables bearing the Medʉmba high phrase accent are realized as a low-high rising tone. It is interesting to note, given the duration differences noted above, that rising tones are also associated with longer duration cross-linguistically due to the fact that they take longer to articulate than level or falling tones (Ohala & Ewan, 1973; Sundberg, 1979; Xu & Sun, 2002). In a number of tone languages, rising contours also appear to be relegated to phrase-final position (Coupe, 2007; Goldsmith, 1988; Michaud & Vaissière, 2015; see also Ou & Guo, 2020), perhaps owing to the facilitative effects of prosodic boundary lengthening on the production of rising tones. ...
Article
Characterizing prosodic prominence relations in African tone languages is notoriously difficult, as typical acoustic cues to prominence (changes in F0, increases in intensity, etc.) can be difficult to distinguish from those which mark tonal contrasts. The task of establishing prominence is further complicated by the fact that tone, an important cue to syllable prominence and prosodic boundaries cross-linguistically, plays many roles in African languages: tones often signal lexical contrasts, can themselves be morphemes, and can also interact in key ways with prosody. The present study builds on phonological generalizations about tonal patterns in Medʉmba, a Grassfields Bantu language, and uses the speech cycling paradigm to investigate relative timing of syllables varying in phrase-level prominence. Specifically, we investigate timing asymmetries between syllables hypothesized to occur at the edge of a phonological phrase, which carry a high phrase accent, and those in phrase-medial position, which do not. Results indicate significant differences in the temporal alignment of accented versus non-accented syllables, with accented syllables occurring significantly closer to positions established as prominence-attracting in previous speech cycling research. We show that these findings cannot be attributed to differences in tone alone. Findings demonstrate the importance of relative temporal alignment as a correlate of prosodic prominence. Findings also point to increased duration as a phonetic property which distinguishes between syllables bearing phrasal prominence from those which do not.
... We use the term sequence hereafter, in an attempt to approximate the notion of the sentence, which links the unit of meaning with the intonational unit and to which declination lends a sense of coherence (Cruttenden, 1987; Jun and Fougeron, 2000; Vaissière, 1983). [Figure 1: characteristics of the declination line: an overall fall over the course of a sequence, local falling and rising movements between two globally falling lines (top line and bottom line), and F0 resetting between two sequences.] The overall tendency of F0 to decline is linked to subglottal pressure (Lieberman, 1967), tracheal pull (Maeda, 1976), and the movements of the laryngeal muscles (Ohala and Ewan, 1973). Yet some uncertainties remain: is the shape of declination language-dependent, or is it controlled by the speaker? ...
Conference Paper
Full-text available
F0 declination: a comparison between French and German journalistic speech. The aim of the present study is to investigate F0 declination over the course of utterances in French and German journalistic speech, using large transcribed and automatically segmented corpora (about 80,000 utterances in total from more than 1,000 speakers). Two different methods were applied: (i) a simple regression analysis to calculate the overall downtrend of F0 and (ii) a convex-hull algorithm to detect local peaks and valleys and thereby obtain the top and bottom lines. The results show characteristics common to both languages: an overall tendency for F0 to decline by about 2.5 st per second, as well as common predictors of the slope amplitude, such as utterance duration and the values of the resetting, the intercept, and the highest peak. Nevertheless, we found a language-specific component of the slope in the movements of the top and bottom lines.
Keywords: intonation, declination line, F0, regression, modelling, cross-language, resetting.
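As a rough illustration of the first method mentioned in the abstract above (a simple regression giving the overall F0 downtrend in semitones per second), the sketch below fits an ordinary least-squares line to F0 values converted to semitones. It is not the authors' code: the function name `declination_slope`, the 100 Hz reference, and the assumption that the input contains only voiced samples are all choices made here for illustration.

```python
import math

def declination_slope(times_s, f0_hz, ref_hz=100.0):
    """Least-squares slope of F0 over time, in semitones per second.

    F0 is converted to semitones relative to ref_hz (an arbitrary
    reference; it shifts the intercept but not the slope). Assumes at
    least two voiced samples at distinct times.
    """
    st = [12.0 * math.log2(f / ref_hz) for f in f0_hz]
    n = len(st)
    mean_t = sum(times_s) / n
    mean_st = sum(st) / n
    # Standard closed-form slope: covariance(t, st) / variance(t)
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (s - mean_st) for t, s in zip(times_s, st))
    return sxy / sxx
```

A negative return value indicates a falling trend; a contour that drops steadily by 2.5 semitones every second yields a slope of about -2.5.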
... Producing a rise in pitch generally takes longer than a fall of an equivalent excursion. Moreover, it has been argued that there are physiological limitations on the maximum speed of pitch change (Ohala & Ewan, 1973, Sundberg, 1979, Xu & Sun, 2002), restricting high/rising tones more than low/falling tones. Additionally, there are known correlations between pitch level / pitch movement and perceived segment duration. ...
Article
Full-text available
In recent years there has been increasing recognition of the vital role of intonation in speech communication. While contemporary models represent intonation—the tune—and the text that bears it on separate autonomous tiers, this paper distils previously unconnected findings across diverse languages that point to important interactions between these two tiers. These interactions often involve vowels, which, given their rich harmonic structure, lend themselves particularly well to the transmission of pitch. Existing vowels can be lengthened, new vowels can be inserted and loss of their voicing can be blocked. The negotiation between tune and text ensures that pragmatic information is accurately transmitted and possibly plays a role in the typology of phonological systems.
... ms, sd = 37.31) and H (203.52 ms, sd = 54.32) tone, while being equivalent to that of LH (229.25 ms, sd = 55.99), which has been shown to be longer than falling shapes for intrinsic reasons (Ohala & Ewan 1973, Xu & Sun 2002). That is, even if the extra duration is explained as a by-product of the flanking movements, part of the lengthening of L is independent of the f0 contour, as observed with reference to Standard Mandarin Tone 3 by Gussenhoven & Zhou (2013). ...
Article
Full-text available
A multi-speaker acoustic study on citation tones in Kaifeng Mandarin, referred to as LH, HL, H and L, shows that L is realized as three different subtypes by different speakers, i.e. dipping, falling and falling with lengthening, while generally being longer than the other three tones and frequently spoken with creaky voice in part of the vowel. This inter-speaker variation is reflected in the different transcriptions of Kaifeng L that have been given in the literature. We argue that a L-tone is intrinsically less salient than a H-tone, due to a lack of phonetic space in the low pitch range as well as to a potential ambiguity between contextual low pitch around f0 peaks and low pitch due to L-tones, and thus more likely to be enhanced.
... Generally it takes some time to fully realize tonal targets. What needs to be taken into account for T2 (rising tone) and T4 (falling tone) is that a speaker needs more time to raise the pitch than to lower it [31][32][33]. A last but not least factor is the duration of silence between syllables, which frequently breaks the specific pattern of tonal coarticulation. ...
... Producing a rise in pitch generally takes longer than a fall of an equivalent excursion. Moreover, it has been argued that there are physiological limitations on the maximum speed of pitch change (Ohala & Ewan, 1973, Sundberg, 1979, Xu & Sun, 2002), restricting high/rising tones more than low/falling tones. Additionally, there are known correlations between pitch level / pitch movement and perceived segment duration. ...
Preprint
In recent years there has been increasing recognition for the vital role of intonation in speech communication. While contemporary models of intonation represent intonation – the tune – and the text that bears it on separate autonomous tiers, this paper distils previously unconnected findings across diverse languages that point to important interactions between these two tiers. These interactions often involve vowels, which, given their rich harmonic structure, lend themselves particularly well to the transmission of pitch. Existing vowels can be lengthened, new vowels can be inserted and their loss of voicing can be blocked. The negotiation between tune and text ensures that pragmatic information is accurately transmitted and possibly plays a role in the typology of phonological systems.
... More complex tunes (rise-fall-rise) need more time to be realised than simple tunes (fall), thus schwa is more likely to be inserted in questions than in statements, and if it is inserted, it is longer. Likewise, rising tunes, all other things being equal, take longer to execute than falling tunes (Ohala and Ewan, 1973; Xu and Sun, 2002), thus schwa is more likely to be needed in list items bearing rising tunes (non-final and prefinal) than those bearing falling ones (final position). The pressure to insert a schwa is less acute in disyllabic words, possibly accounting for the mixed picture in the disyllabic list data set. ...
Article
In order to convey pragmatic functions, a speaker has to select an intonation contour (the tune) in addition to the words that are to be spoken (the text). The tune and text are assumed to be independent of each other, such that any one intonation contour can be produced on different phrases, regardless of the number and nature of the segments they are made up of. However, if the segmental string is too short, certain tunes—especially those with a rising component—call for adjustments to the text. In Italian, for instance, loan words such as “chat” can be produced with a word final schwa when this word occurs at the end of a question. This paper investigates this word final schwa in the Bari variety in a number of different intonation contours. Although its presence and duration is to some extent dependent on idiosyncratic properties of speakers and words, schwa is largely conditioned by intonation. Schwa cannot thus be considered a mere phonetic artefact, since it is relevant for phonology, in that it facilitates the production of communicatively relevant intonation contours.
... More complex tunes (rise-fall-rise) need more time to be realised than simple tunes (fall), thus schwa is more likely to be inserted in questions than in statements, and if it is inserted, it is longer. Likewise, rising tunes, all other things being equal, take longer to execute than falling tunes (Ohala & Ewan 1973, Xu & Sun 2002), thus schwa is more likely to be needed in list items bearing rising tunes (non-final and pre-final) than those bearing falling ones (final position). The pressure to insert a schwa is less acute in disyllabic words, possibly accounting for the mixed picture in the disyllabic list data-set. ...
Preprint
Full-text available
In order to convey pragmatic functions, a speaker has to select an intonation contour (the tune) in addition to the words that are to be spoken (the text). The tune and text are assumed to be independent of each other, such that any one intonation contour can be produced on different phrases, regardless of the number and nature of the segments they are made up of. However, if the segmental string is too short, certain tunes, especially those with a rising component, call for adjustments to the text. In Italian, for instance, loan words such as "chat" can be produced with a word final schwa when this word occurs at the end of a question. This paper investigates this word final schwa in the Bari variety in a number of different intonation contours. Although its presence and duration is to some extent dependent on idiosyncratic properties of speakers and words, schwa is largely conditioned by intonation. Schwa cannot thus be considered a mere phonetic artefact, since it is relevant for phonology, in that it facilitates the production of communicatively relevant intonation contours.
... Whereas lengthening is perceptually-driven, [a]-reduction is best explained in articulatory terms. There is a substantial body of evidence showing that rising tones are better expressed on longer vowels (Ohala & Ewan, 1973; Gandour, 1977; Zhang, 2002, 2004). Since low vowels are intrinsically longer, they make better carriers of tone than mid or high vowels. ...
Book
This book develops a theory of the interaction of phonological tone with segmental quality. Some authors explicitly deny a possibility of a systematic phonological interrelation between tone and sonority (Hombert et al., 1979; Hombert, 1977; Schuh, 1978; Fox, 2000; de Lacy, 2007). In particular, it has been suggested that in many cases of tone-vowel interactions, tone affects vocalic quality indirectly through syllable structure, foot structure or duration (Jiang-King, 1999; Gussenhoven & Driessen, 2004; Kehrein, to appear; Köhnlein, to appear). The present book takes an opposite stand and argues that tone can interact directly with vowel quality without the mediating factors such as syllable structure or duration. This assumption is substantiated by the analysis of vowel neutralisation in East Slavic.
... A common secondary use of the Frequency Code is increased final syllable lengthening, at the expense of the other syllables in the phrase (e.g. Abolhasanizadeh, Bijankhan, and Gussenhoven 2012), due to the fact that rising pitch takes longer to produce than level or falling pitch (Ohala and Ewan 1973; Xu and Sun 2002). In turn, the shortening of non-final syllables may cause questions to have faster speech rates, as found by van Heuven and van Zanten (2005). ...
... In the perspective of speech production, there would be automatic by-products of respiratory activities, such as a drop in subglottal air pressure, decreased lung volume, and lowered sternum and larynx, which cause decreased vocal fold tension and F0 fall. Ohala and Ewan (1973) suggested that it is more difficult to produce an F0 rise than a fall, because the laryngeal gesture requires more effort and takes more time, which is in line with the explanation of the "laziness principle". Therefore, to maintain or accelerate a rising pitch to convey doubt seems to be an effortful way to communicate. ...
Article
Previous intonational research on Mandarin has mainly focused on the prosody modeling of statements or the prosody analysis of interrogative sentences. To support related speech technologies, e.g., Text-to-Speech, the quantitative modeling of intonation of interrogative sentences with a large-scale corpus still deserves attention. This paper summarizes our work on the quantitative prosody modeling of interrogative sentence in Mandarin. A large-scale natural speech corpus was used in this study. By extracting the pitch contours and fitting the intonation curves, we found that F0 declination and final lowering both existed in interrogative sentences, while they were claimed to be absent in Mandarin in some previous studies. In addition, the declination function could be modeled linearly, and the bearing unit of final lowering in Mandarin was found to be the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range. It was argued in this study that the difference between this finding and the commonly believed rising intonation of the interrogative sentences resulted from the nonlinear relationship between prosody production and perception. The underlying mechanism for the existence of F0 declination and final lowering in interrogative sentences is also discussed.
... Our approach to the annotation of pitch for this study is illustrated in figure 3. As established in previous research into physiological limits of pitch production (Ohala and Ewan, 1973; Sundberg, 1979; Xu and Sun, 2002), a complex pitch pattern can be decomposed into several stages, which, for a fall, include a deceleration phase and a fast glide, and end in a low plateau (cf. fig. ...
Article
Russian and German have been previously described as 'truncating', or cutting off target frequencies of the phrase-final pitch trajectories when the time available for voicing is compromised. However, supporting evidence is rare and limited to only a few pitch categories. The present paper reports a production study conducted to document pitch adjustments to linguistic materials in which the amount of voicing available for the realization of a pitch pattern varies from relatively long to extremely short. Productions of nuclear H+L*, H* and L*+H pitch accents followed by a low boundary tone were investigated in the two languages. The results of the study show that speakers of both 'truncating languages' do not exclusively utilise truncation when accommodating to different segmental environments. On the contrary, they employ several strategies – among them is truncation but also compression and temporal realignment – to produce the target pitch categories under increasing time pressure. Given that speakers can systematically apply all three adjustment strategies to produce some pitch patterns (H* L% in German and Russian) while not using truncation in others (H+L* L% particularly in Russian), we question the usefulness of the typological classification as 'truncating' for these two languages. Moreover, phonetic detail of truncation varies considerably both across and within the two languages, indicating that truncation cannot easily be modelled as a unified phenomenon. The results further suggest that the phrase-final pitch adjustments are crucially sensitive to the phonological composition of the tonal string and the status of a particular tonal event (associated vs. boundary tone), and do not apply to falling vs. rising pitch contours across the board, as previously put forward for German. Implications for the intonational phonology and prosodic typology are addressed in the discussion.
... MH < 3 semitones; LM < 2 semitones). This is not surprising, given the well-known phenomenon of f0 declination (Cohen, Collier and 't Hart 1982) and the fact that upward pitch change tends to take longer than a downward pitch change for a given pitch interval (Ohala and Ewan 1973). Thus, the order of tones should be taken into account when investigating the height relationship between two tones. ...
Article
While previous studies on the speaker-discriminatory power of static f0 parameters abound, few have focused on the dynamic and linguistically structured aspects of f0. Lexical tone offers a case in point for this endeavour. This article reports an exploratory study on the speaker-discriminatory power of individual lexical tones and of the height relationship of level tone pairs in Cantonese, and the effects of voice level and linguistic condition on their realisation. Twenty native Cantonese speakers produced systematically controlled words either in isolation or in a carrier sentence under two voice levels (normal and loud). Results show that f0 height and f0 dynamics are separate dimensions of a tone and are affected by voice level and linguistic condition in different ways. Moreover, discriminant analyses reveal that the contours of individual tones and the height differences of level tone pairs are useful parameters for characterising speakers.
... Rejecting these explanations, we suggest that penultimate shortening may apply in interrogatives as a result of an utterance-internal higher tempo implemented by way of compensation for longer final syllables. The longer final syllable itself can be explained by the fact that rising pitch takes longer to produce than falling pitch (Ohala & Ewan 1973), a durational difference that may be maintained in the absence of any actual pitch differences (Smith 2002). ...
Article
Varieties of Malay, including Indonesian, have been variously described as having word stress on the penultimate syllable, as having variable word stress and as having a phrase-final pitch accent without word stress. In Ambonese Malay, the alignment of sentence-final pitch peaks fails to support the existence of either word stress or phrase-final pitch accents. Also, the shape of its pitch peaks fails to vary systematically with the information status of the phrase-final word. The two intonation melodies of the language include phrase-final boundary-tone complexes which do not associate with any syllables. The declarative rise-fall would appear to be timed so as to occur within the last word of the sentence. Minimal stress pairs presented in earlier descriptions show a contrast between /a/ and a segmentally distinct weak /ă/, a contrast that also appears in positions that have not been claimed to have stress. A preliminary phonological analysis concludes the account.
... Peak delay occurs when the peak f0 of a high tone is delayed onto the beginning of the following syllable instead of being realized on the syllable to which it is associated. This delay has been attributed to the longer time required for rises in f0 than for falls (Ohala, 1972; Ohala & Ewan, 1973). ...
Article
In Hanoi Vietnamese, the rising and falling tones are frequently confused before the (high) level tone, even though they are clearly distinct in other contexts. In this paper, we conduct production and perception experiments designed to assess the source of this confusion. We argue that the peak of the rising tone is normally delayed onto the initial portion of the following tone, but that this peak delay lacks acoustic and perceptual salience when this following tone is a level tone. As a result, the rising tone before a level tone is often perceived as a falling tone. Although we rule out the possibility that the tonal pattern under investigation is a phonological alternation, we propose that the complex coarticulatory and perceptual mechanisms that underlie it could account for the development of other instances of regressive tone sandhi.
... [N] accent position. A position of an accent on a frequency scale represented by number N expressed in ST (N = 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36). Semitones are related to Fmin = 60 Hz. ...
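The excerpt above does not spell out how semitone positions map to frequency. As a minimal sketch, assuming the standard equal-tempered definition (12 semitones per octave, so f = f_ref · 2^(ST/12)) and the 60 Hz reference stated in the excerpt, the listed accent positions can be converted to Hz as follows. The function and constant names are illustrative only and do not come from the cited work.

```python
import math

# Reference frequency stated in the excerpt (semitones relative to 60 Hz).
F_REF_HZ = 60.0

# Accent positions listed in the excerpt: 3, 6, ..., 36 semitones.
ACCENT_POSITIONS_ST = list(range(3, 37, 3))


def semitones_to_hz(st, f_ref=F_REF_HZ):
    """Convert a semitone offset above f_ref to a frequency in Hz.

    Assumes the standard definition of 12 semitones per octave.
    """
    return f_ref * 2.0 ** (st / 12.0)


def hz_to_semitones(f_hz, f_ref=F_REF_HZ):
    """Convert a frequency in Hz to a semitone offset above f_ref."""
    return 12.0 * math.log2(f_hz / f_ref)


if __name__ == "__main__":
    for n in ACCENT_POSITIONS_ST:
        print(f"{n:2d} ST -> {semitones_to_hz(n):6.1f} Hz")
```

Under this assumption, every 12 semitones doubles the frequency, so positions 12, 24 and 36 correspond to 120, 240 and 480 Hz.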
... As said at the end of section 1, our perceptual compensation theory presupposes independent explanations for durational differences in the production of f0 contours. The shorter duration of falls than rises has been widely discussed and investigated since [12]. No explanation has been proposed for the longer duration of low tones than that of high tones. ...
Article
The shape of pitch contours has been shown to have an effect on the perceived duration of vowels. For instance, vowels with high level pitch and vowels with falling contours sound longer than vowels with low level pitch. Depending on whether the comparison is between level pitches or between level and dynamic contours, these findings have been interpreted in two ways. For inter-level comparisons, where the duration results are the reverse of production results, a hypercorrection strategy in production has been proposed [1]. By contrast, for comparisons between level pitches and dynamic contours, the longer production data for dynamic contours have been held responsible. We report an experiment with Dutch and Chinese listeners which aimed to show that production data and perception data are each other's opposites for high, low, falling and rising contours. We explain the results, which are consistent with earlier findings, in terms of the compensatory listening strategy of [2], arguing that the perception effects are due to a perceptual compensation of articulatory strategies and constraints, rather than that differences in production compensate for psycho-acoustic perception effects.
... Falling tones are less marked than rising tones. This is also supported, first, by the fact that falling tones are more common in Chinese tonal inventories (Cheng 1973); and second, through the studies of Ohala & Ewan (1973) and Sundberg (1973), which show rising tones taking longer to produce (and perceive; Ohala 1978) than falling ones. ...
Article
This study examines the tonal chain shifts in pre-neutral toned syllables of Jiaoxian compound words, using the framework of Optimality Theory. The tonal chain shifts are motivated by reducing articulatory effort according to the tonal markedness scale. When a tone shifts to a more marked one, it is because of the anti-neutralization nature, which can be captured by Preserving Contrasts (PC). PC keeps feature contrast in the output. The maintenance of PC comes from transderivational anti-faithfulness, and the output-to-output anti-faithfulness constraints evaluate a pair of related words and require dissimilarity between them.
Article
This paper is the first to explore tonal phonotactics in the world's natural languages. Zhangzhou Southern Min is theoretically assumed to have 7320 possible syllables but more than 71% of them are not empirically attested. Each lexical tone could logically generate 915 syllables; however, the attested number only ranges from 98 syllables under tone 8 to 392 under tone 1. This study draws on a large corpus to explore how individual tones behave in the formation of attestable syllables, in what way tonal phonotactics occur and what mechanisms have triggered phonotactic constraints from both synchronic and diachronic factors. This study substantially stretches and advances our knowledge of tonal phonotactics as an important phonological phenomenon in this language. The exploration is intended to serve as a model for thorough investigations of tonal phonotactics in Sinitic languages, shedding important light on the generalization of areal characteristics in Asia that possess rich and complex tonal contrasts. The study also contributes vital linguistic data to the typology of phonotactics in human languages, while pointing to a research direction of using experimental methods to model phonotactic restrictions in speakers' mental grammar and language practice.
Article
East Tusom is a Tibeto-Burman language of Manipur, India, belonging to the Tangkhulic group. While it shares some innovations with the other Tangkhulic languages, it differs markedly from “Standard Tangkhul” (which is based on the speech of Ukhrul town). Past documentation is limited to a small set of hastily transcribed forms in a comparative reconstruction of Tangkhulic rhymes ( Mortensen & Miller 2013 ; Mortensen 2012 ). This paper presents the first substantial sketch of an aspect of the language: its (descriptive) phonetics and phonology. The data are based on recordings of an extensive wordlist (730 items) and one short text, all from one fluent native speaker in her mid-twenties. We present the phonetic inventory of East Tusom and a phonemicization, with exhaustive examples. We also present an overview of the major phonological patterns and generalizations in the language. Of special interest are a “placeless nasal” that is realized as nasalization on the preceding vowel unless it is followed by a consonant, and numerous plosive-fricative clusters (where the fricative is roughly homorganic with the following vowel) that have developed from historical aspirated plosives. A complete wordlist, organized by gloss and semantic field, is provided as appendices.
Thesis
This work concerns a language documentation project covering the Mixtepec-Mixtec variety of Mixtec (ISO 639-3: mix). Mixtepec-Mixtec is an Oto-Manguean language spoken by roughly 9000-10000 people in San Juan Mixtepec Municipality in the Juxtlahuaca district of Oaxaca, Mexico and by several thousand speakers living in Baja California, Tlaxiaco, and Santiago Juxtlahuaca. There are also significant populations in the United States, most notably in California, around Santa Maria and Oxnard, as well as in Oregon, Florida, and Arkansas. The core facets of the work are: the creation of a body of linguistic resources for the MIX language and community; the evaluation of the current tools, standards and practices used in language documentation; an account of how the TEI and related XML technologies can be used as the primary encoding, metadata, and annotation format for multi-dimensional linguistic projects, including under-resourced languages. The concrete resources produced are: a multilingual TEI dictionary; a collection of audio recordings published and archived on Harvard Dataverse; a corpus of texts derived from a combination of spoken language transcriptions and texts encoded and annotated in TEI, as well as linguistic and lexicographic descriptions and analyses of the Mixtepec-Mixtec language. Due to the array of different data and resources produced, this project has components that fall equally within the fields of digital humanities, language documentation, language description and corpus linguistics. Because of this overlapping relevance, in the process of attempting to carry out this work in line with best practices in each sub-field, this work addresses the need to further bring together the intersecting interests, technologies, practices and standards relevant to, and used in, each of these related fields.
Chapter
It is a well‐known truism that no utterance is ever produced in a strict monotone; all utterances, in all languages, show some pitch modulation. Such changes in pitch – impressionistically described as rises and falls – are due to changes in fundamental frequency or F0, the physical property of the speech signal that is determined by the basic rate of vibration of the vocal folds and gives rise to the percept of pitch.
Article
This is the first variationist sociotonetic study to use free-speech data for exploring tone. Due to the challenges of analyzing tone in free-speech data, prior work on sociotonetics has been limited to relatively formal speech styles: word lists, sentence frames, and phrase lists. But connected speech styles, including free speech and reading passages, are important for segmental sociophonetics and most other linguistic variables. Will free-speech data always be out of reach for sociotonetics? Can tone variation in connected speech data be normalized and meaningfully analyzed for sociolinguistic research questions? Using field data from the Sui language of China, this paper develops a practical approach for analyzing tone variation in connected speech data, and then applies it to a specific research question about dialect contact in exogamous Sui villages. Results show that some types of intra- and inter-speaker tone variation in connected speech can be effectively analyzed, although other types of tone variables are neutralized in this speech style.
Conference Paper
Otomanguean languages, spoken in Southern Mexico, are well-known for possessing complex lexical tone inventories (DiCanio & Bennett 2019). Contrastive lexical tone is reconstructed within Proto-Otomanguean, which was spoken approximately 4,000-6,000 years ago (Rensch 1976). Even at the sub-family level, three level tones are reconstructed, e.g. Proto-Mixtecan possessed three level tones (Longacre 1957) and was spoken approximately 2,000 years ago (Josserand 1983). Though a growing body of research has investigated the recent evolution of incipient tones (tonogenesis), such as in Afrikaans (Coetzee et al. 2018), Korean (Silva 2006), Kammu (Svantesson & House 2009), Kurtöp (Hyslop 2009), and Tamang (Mazaudon & Michaud 2008), the greater time depth and diversification of tone in Otomanguean largely prevents researchers from investigating its phonetic precursors. Yet despite the focus on incipient tonogenesis in the recent literature, tones in Otomanguean languages continue to diversify and their conditioning environments can be synchronically observed by examining low-level variation in production in closely related language varieties. In this talk, I examine how processes of speech reduction co-occur with patterns of tonal variation within two distantly-related Mixtecan languages (Yoloxóchitl Mixtec and Itunyoso Triqui). In the first study, we examined the effects of prosodic boundaries on tone production in Yoloxóchitl Mixtec. We observed two of the rising tones (/13, 14/) to be variably leveled (to /2/ and /3/, respectively). Though leveling is pervasive in tone languages (cf. Hyman 2007), for certain speakers this particular process applies in both reduced, durationally short contexts, which tend to disfavor rising tones (Zhang 2004), and long duration contexts, which do not impede contour tone production.
Comparative data from a closely-related language (Alcozauca Mixtec) show a later stage of this leveling process where cognates with the low rising tone /13/ have been leveled to /2/ (Mendoza Ruiz 2016). The Yoloxóchitl pattern closely resembles the process by which reduced speech variants may begin to occur in non-reduced contexts, leading to sound change (cf. Parrell and Narayanan 2018). In the second study, we examined variation in the production of the 2nd person singular (2S) clitic in a spoken corpus of texts in Itunyoso Triqui. This clitic is variably produced as either /=ɾeʔ1/ or /=ɾ/, the latter missing the syllable's rime. Depending on the stem tone, the 2S clitic either induces (a) no stem tonal change, (b) pre-clitic tonal raising, or (c) low tone spreading (DiCanio 2016). The corpus data show the reduced clitic to be more frequent specifically in contexts where it conditions tonal changes on the preceding stem. This finding suggests that morphologically-conditioned tonal change facilitates segmental reduction. If speakers can predict the clitic from its effect on the stem's tone, reducing its segmental content is less detrimental to interpreting the morphological content of the utterance. In fact, a parallel and more advanced pattern is observed in the closely related Chicahuaxtla Triqui language, where the 2S clitic has been morphologized as /-t/ (Hernández Mendoza 2017). Together these findings demonstrate how tonal change may be induced by durational contraction/reduction (cf. Cheng & Xu 2015) and redundancy of morphological cues (multiple exponence). While these triggers are not specific to tonal change, when they occur in languages with elaborated tonal systems as we observe in Otomanguean, they produce tonal variation and change leading to greater tonal diversification in the family.
Chapter
This chapter discusses the interaction between segments and tone. It discusses the effect of the prevocalic consonant types on the fundamental frequency and the pitch of the neighboring vowel. The chapter also discusses how the vowel quality affects the fundamental frequency and pitch of a vowel. A major motivation for the research on these questions is the attempt to discover an explanation for widely attested historical changes. An understanding of intrinsic effects may provide such an explanation implying that the acoustic signal intended by the speaker may become distorted by the time it is perceived by the listener and that such distortions may give rise to changes over time. The chapter discusses both the relevant production phenomena and the acoustic (auditory) or perception phenomena, both being important to account for what has been observed in historical change.
Article
This paper reports the results of an acoustical study of the relation between vowel duration and the six tones of Cantonese. It is found that of the six tones, the longest is the high rising tone. Of the four level tones, the mid level tone is the longest, the mid-low level tone is intermediate, and the high level tone and low level tone are the shortest. When the fundamental frequencies are analysed and related to the vowel length, it is found that there is a mid point in the F0 range from the high level tone to the low level tone at which the vowel is longest; the further the F0 of a tone lies above or below this point, the shorter is the vowel of that tone.
Article
It is well known that the tones of the Lithuanian language were first described and illustrated using musical notes in the mid-19th century by Friedrich Kurschat (lt. Frydrichas Kuršaitis). No attempt has been made to analyze in depth which specific prosodic features this material covers since the article of Prof. A. Girdenis appeared in 2008. He concluded that tone distribution is generally based on pitch range and contour differences (i.e. rising vs falling tone opposition). The alternative reconstruction proposed here reveals additional features which should be taken into account as well. Measurements of prosodic parameters (i.e. pitch range, shape and vowel length) show that tones differ in terms of rate of pitch change and duration. The rising tone (or, to be more precise, the sustained tone) can be defined as having longer duration and a less intensive rate of pitch change, while the falling tone has the opposite parameters: shorter duration and an abrupt, steep pitch change. The phonetic nature of Lithuanian tones should therefore be considered complex.
Chapter
Between-speaker variations in the rate of vibration of the vocal cords (pitch) are associated with differences in the size of the vocal tract and, thereby, with differences in the formant frequencies. Pitch is considered incidental to production because the same speaker can speak at different pitches and produce intelligible productions with no vocal cord vibration (whispering). The former observation assumes that subjects do not move their articulators when speaking at different pitches and the latter that there is no pitch in whispered speech. Subjects can, however, whisper at different pitches. In an experiment, subjects produced vowels at different pitches and acoustic analyses were performed to determine the speech output and articulatory configurations used to produce them. The implications of the results for the nature of the specified target in speech are discussed.
Chapter
This chapter presents a survey and research on the phonetics of tone perception and discusses its implications for models of tone perception and linguistic theory in general. It is generally assumed that the principal phonetic features of tone are found in the domain of pitch. The term tone (linguistic) refers to a particular way in which pitch is utilized in language; the term pitch (nonlinguistic, perceptual), on the other hand, refers to how a hearer places a sound on a scale going from low to high without considering the physical properties of the sound. Its primary acoustic correlate is fundamental frequency. The term fundamental frequency (acoustic) refers to the frequency of repetition of a sound wave of which, when analyzed into its component frequencies, the fundamental is the highest common factor of the component frequencies. The function of the ear is to receive the acoustic signal, convert it to electro-chemical energy, and transmit the signal via nerve impulses to the brain.
Chapter
This chapter discusses the aspects of tone production that may be relevant to an understanding of tonal phenomena. It discusses laryngeal anatomy and physiology; reviews the controversial issues in tone production; and provides an introduction to the literature in this area. The larynx is a valve and a sound producer. As a valve, it regulates the flow of air into and out of the lungs and keeps food and drink out of the lungs. The two functions are accomplished by a relatively complex arrangement of cartilages, muscles, and other tissues. The hard structure of the larynx consists of four principal cartilages: the thyroid, the cricoid, and a pair of arytenoid cartilages. The thyroid and cricoid are connected and pivot about a transverse axis. The two arytenoid cartilages are connected to the cricoid cartilage via a ligamentous hinge and sit atop its rear rim. Each can rotate on the rim of the cricoid in such a way as to bring their front projections towards or away from the midline. The two vocal cords or, more appropriately, the vocal folds, are basically ligaments that stretch between the inner lower front surface of the thyroid cartilage and the front faces of the separate arytenoid cartilages. It is the rotation of the arytenoid cartilages that enables the vocal cords to be brought together toward the midline for voicing or breath-holding or to be separated from each other.
Article
Based on firsthand acoustic data, this paper aims to determine how many phonologically contrastive falling tones exist in tonal languages, and what kinds of distinctive features are needed to specify them. These goals are achieved by using a tonal model called the Multi-Register and Four-Level Model, which represents tones along four parameters: register, length, height, and contour. Having excluded a quasi-falling tone, this paper identifies seven Falling Tonotypes in the M Register: High, Low, MS-High, MS-Low, Deferred-High, Deferred-Low, and Slight Falling. Four of these also occur in the L Register with special voice qualities. In total, there are eleven Falling Tonotypes, which can be specified according to five distinctive features. © 2015 by The Journal of Chinese Linguistics. All rights reserved.
Article
This study investigated the ability to discriminate the middle and low tone contrasts in Thai by two groups of native English (NE) speakers and a control group of native Thai (NT) speakers. The first group was comprised of NE speakers who had no prior experience with Thai, whereas subjects in the second group were experienced learners of Thai (EE). The variables under investigation were experience with Thai, discrimination of open versus closed syllables, and the interstimulus interval (ISI) of the presentation (500 vs 1500 ms). The results obtained indicated that the NT group obtained higher discrimination scores than the NE or EE groups, the EE group obtained higher discrimination scores than the NE group, all three groups of subjects found open syllables to be more difficult to discriminate than closed syllables, and subjects in the EE group obtained higher discrimination scores for open syllables in the shorter than the longer ISI condition.