Article

Segmental Influences on the Perception of High Pitch Accent Scaling in American English


Abstract

Researchers investigating a broad array of questions in spoken language prosody routinely base their arguments on measurements taken from the F0 contours of representative speech samples. These analyses, however, frequently involve abstracting F0 contours away from the segmental strings that bear them, potentially overlooking in the process the role played by segmental qualities such as sonority or periodicity in the realization of F0 patterns by speakers and their interpretation by listeners. This paper reports the results of two experiments investigating how perception of F0 contours is affected by the segmental string over which those contours are realized. The first focuses on gaps in F0 contours created by voiceless obstruents such as stops and fricatives, while the second investigates F0 intervals spanning lower-sonority voiced segments, such as nasals and voiced fricatives. While these two scenarios might at first seem unrelated, we argue that listeners treat both with a single mechanism in perception, namely, by reducing (potentially to zero) the amount of weight accorded to those portions of the contour for determination of the speaker’s intended F0 scaling level. We present an account of both effects within a unified model of F0 scaling perception called TCoG-F, with discussion of its implications for phonetic and phonological intonation research going forward.
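The weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' TCoG-F model, whose exact formulation is not given here; it only assumes that perceived scaling behaves like a weighted average of F0 samples, with voiceless frames given zero weight and lower-sonority voiced frames given reduced weight. The function name, the sample contour, and the weight values are all hypothetical.

```python
from typing import Optional, Sequence


def weighted_f0_level(f0_hz: Sequence[Optional[float]],
                      weights: Sequence[float]) -> float:
    """Weighted mean of F0 samples (illustrative only, not TCoG-F itself).

    f0_hz:   one F0 value per frame; None marks a frame with no F0
             (e.g. inside a voiceless obstruent).
    weights: one non-negative weight per frame (assumed values).
    """
    numerator = 0.0
    denominator = 0.0
    for f0, weight in zip(f0_hz, weights):
        if f0 is None:
            continue  # gap in the contour: contributes nothing
        numerator += weight * f0
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no frames with positive weight")
    return numerator / denominator


# Hypothetical contour: two vowel frames, a voiceless gap, two nasal frames.
f0 = [200.0, 210.0, None, None, 180.0, 170.0]
w = [1.0, 1.0, 0.0, 0.0, 0.3, 0.3]  # nasal frames down-weighted (assumed)
level = weighted_f0_level(f0, w)
```

With these assumed weights the estimate sits closer to the vowel frames than a plain average of the voiced samples would, which is the qualitative effect the abstract describes.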

Article
Two conflicting views have been advanced of what defines ‘default’ high pitch accents in various West Germanic languages, including English: One equates these accents fundamentally with a rise to a high turning point, while the other focuses on the fall from it. Both views arise from the assumption within Autosegmental-Metrical theory that the phonological representations of intonational categories can be discerned more-or-less directly from the string of intentional-seeming changes of direction in the F0 curve, identified as production ‘targets’. Two perceptual experiments reveal that, at least in American English, this view critically oversimplifies how pitch accents containing High tones are defined and distinguished: instead, both the shape of the rise and the shape of the fall are seen to contribute to the alignment of the overall bulk of the high region, defined by the rise-fall shape, with the segmental string, and thus to its categorization by listeners as an early, mid or late rise-fall (H + !H*, L + H*, or L* + H). These findings are consistent with the view that the Tonal Center of Gravity (TCoG) of the rise-fall shape as a whole, rather than an F0 turning point per se, is what speakers align with segmental content to distinguish different pitch accent categories. Questioning the primacy of the turning points as the phonetic targets for these pitch accents, in turn, seriously problematizes standard assumptions about the nature of phonological representations of intonation and their relation to the signal.
Article
We investigate the phonotactic behaviour of nasal consonants in a database of over 200 languages. Our findings challenge the common classification of nasals as intermediate between obstruents and liquids on the sonority hierarchy. Instead, we propose that there are two types of nasal consonants, one group with lower sonority than liquids and one with higher sonority. We propose that these two types of nasals differ in the presence or absence of a value for the feature [±continuant].
Poster
Poster for the 2nd Prosody Visualisation Challenge (PVC2)
Conference Paper
This paper aims to strengthen the link between acoustic and perceptual representations of intonation, a link that has been weakened by the over-reliance on the F0 trajectory, which can only be interpreted in relation to landmarks in the segmental string, placed manually or semi-automatically at a separate stage in the analysis. Only then can F0 events be identified as linguistically relevant (e.g. early, medial or late peaks, accentual tones or edge tones etc.). We provide an analysis and visualization of two acoustic dimensions contributing towards the perceived pitch contour, F0 over time and, crucially, periodic energy. Periodic energy reflects the degree to which pitch is intelligible, a higher value representing a stronger F0 signal that is consequently more easily perceived. A representation of F0 that includes periodic energy is thus able to flag portions of the speech signal that are relevant for the analysis of intonation, without the need for a separate segmentation of the signal into phones and syllables. Index Terms: intonation, pitch perception, periodic energy, tonal alignment, segmentation, data visualization, sonority
Article
In normal modally voiced utterances, voiceless fricatives like [s], [ʃ], [f], and [x] vary such that their aperiodic pitch impressions mirror the pitch level of the adjacent F0 contour. For instance, if the F0 contour creates a high or low pitch context, then the aperiodic pitch impression of the fricative in this context will also be high or low. This context-matching effect has been termed “segmental intonation”. While there is accumulating evidence for segmental intonation in speech production, less is known about if and how segmental intonation is actually integrated in the perception of utterance tunes. This question is addressed here in a perception experiment in which listeners identified target words ending in either [ʃ] or [s]. The two sibilants inherently create low or high aperiodic pitch impressions in listeners due to their characteristically different spectral energy distributions. The sibilants were preceded by high or low F0 contexts in the target words. Results show a clear F0-context effect. The context effect triggered more [ʃ] identifications in high-F0 and/or more [s] identifications in low-F0 contexts. The effect was larger for sibilants that were less clearly identifiable as either /ʃ/ or /s/. The effect represents strong supporting evidence that listeners in fact perceive the segmental intonation of fricatives and integrate its aperiodic pitch with the F0-based pitch when perceiving utterance intonation. Thus, the term “segmental intonation” is perceptually appropriate. Furthermore, the results are discussed with respect to reaction-time measurements and an additional effect of the quality of the adjacent vowel phoneme on sibilant identification.
Article
A model for the generation of fundamental frequency contours (F0 contours) of spoken sentences is presented for the purpose of elucidating the relationship between the sentence F0 contour and the linguistic and non-linguistic information. It is based on a quantitative formulation of the process whereby the logarithmic fundamental frequency is controlled in proportion to the sum of two components corresponding respectively to the effects of phrase and accent. The model's parameters were determined to give the best approximation to an observed F0 contour on the basis of the mean squared error. Analysis of natural utterances of various declarative sentences of Japanese revealed that the model can generate close approximations to observed F0 contours from a set of discrete commands and a small number of parameters. The extracted parameters were found to be closely related to linguistic factors and factors constituting the naturalness of speech. These results provide a means for generating natural F0 contours from a small set of parameters and rules for synthesis.
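The "sum of two components" in the log-frequency domain can be written out as follows. The abstract gives no formula, so this uses the commonly cited form of the command-response model; the symbols (baseline F_b, phrase command magnitudes A_p, accent command amplitudes A_a, onset and offset times T, and constants alpha, beta, gamma) are the conventional ones and are an assumption here.

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j=1}^{J} A_{a,j}\, \bigl[ G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \bigr]

G_p(t) =
  \begin{cases}
    \alpha^2\, t\, e^{-\alpha t}, & t \ge 0 \\
    0, & t < 0
  \end{cases}
\qquad
G_a(t) =
  \begin{cases}
    \min\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0 \\
    0, & t < 0
  \end{cases}
```

The first sum is the phrase component (impulse-like commands with a slow decay) and the second is the accent component (step-like commands that switch on at T_1 and off at T_2).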
Conference Paper
We present a method for investigating the temporal alignment of intonation events by parametrizing F0 contours. Results for three German single-speaker corpora and one American English multi-speaker corpus show that the speakers generally avoid placing peaks in syllable onsets. We suggest that this is a quantal effect [9] which results from the fact that syllable onsets are boundaries in tonal production.
Article
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
Article
Perceived continuity was studied by varying the direction (steady-state, upward glide, or downward glide), the frequency separation, and the slope of two sinusoidal tones separated by a louder burst of white noise. When the two tones had different directions, continuity was perceived according to a frequency-proximity principle (frequency-interpolation effect). On the other hand, when the tones had the same direction, continuity was perceived on the basis of a good-continuation principle (frequency-extrapolation, or frequency-trajectory, effect). We attempted to determine the degree of tuning of the frequency-extrapolation mechanism. Our results showed that there was some tuning for the starting frequency of the postnoise glide; in this case, the region of acceptance for continuity was centered around a frequency predicted from the trajectory of the prenoise glide. No evidence was found for tuning based on the slope of the sinusoidal tones. These results suggest that auditory processes need to analyze the postnoise sound before deciding whether the prenoise tone continued underneath the noise burst.
Article
This article investigates the perceptual effect of a high plateau in the intonation contour. Plateaux are flat stretches of contour and have been observed associated with high tones in Standard Southern British (SSB) English. The hypothesis that plateaux may make the accents with which they are associated sound higher in pitch than sharp peaks of the same maximum frequency is tested experimentally. In the first experiment listeners heard pairs of resynthesized utterances where the nuclear accent differed only in shape, not frequency. They indicated which stimulus they thought contained the higher pitched accent. Results showed that plateau-shaped accents sound higher than peaks. In the second experiment the effect of a plateau on prominence relations within an utterance is investigated. Listeners heard resynthesized sentences, and compared two accents. One group indicated which accent sounded higher in pitch and the other indicated which sounded more prominent. Results again indicated that plateau-shaped accents sound higher in pitch and also more prominent; judgments of pitch and prominence were very similar to one another. The results from both experiments indicated that accent shape is a perceptually important variable, although such a fine level of detail is not taken into account by autosegmental-metrical theories of intonation.
Article
Conducted 6 experiments with 64 university students to investigate perceived auditory continuity with alternately rising and falling frequency glides, in which the glides were perceived as continuous when deleted portions were replaced by white noise bursts. The first 3 experiments showed that perceptual continuity could be obtained when the deleted portion came either in the middle of the glide, or at the top and bottom of the glides; continuity was actually better for the latter condition. Also, it was found that as glide duration increased, the threshold between perceived continuity and discontinuity increased; there was a similar increase as the difference between highest and lowest frequencies increased. In Exps IV-VI, when the peak was deleted and replaced with noise, there was no perceptual extrapolation of the incomplete glides; rather, there seemed to be considerable rounding off of the trajectory of the glide. (French summary) (28 ref)
Article
The goal of this experiment is to find the most important phonetic features of Dutch accent-lending pitch movements, in terms of shape, pitch level and alignment with the segmental structure. Time pressure is used as a heuristic method to isolate important phonetic aspects of pitch movements, assuming that under time pressure the speaker will preserve those aspects. In a production experiment, accent-lending rises ('1') and falls ('A') were realized under various types of time pressure. The pitch rise is time-compressed under all pressure types, which would mean that the shape of the rise is relatively unimportant. The segmental alignment of the rise proved to be more important: the onset of the rise is synchronized with the syllable onset. For the fall no fixed synchronization point was found, but its shape was relatively invariant, indicating that shape rather than exact timing is the more important feature of the fall.
Article
It has been shown that visual display systems of intonation can be employed beneficially in teaching intonation to persons with deafness and in teaching the intonation of a foreign language. In this paper, the question is addressed whether important audible differences between two pitch contours correspond with visually conspicuous differences between displayed pitch contours. If visual feedback of intonation is to be effective in teaching situations, such correspondence must exist. In two experiments, phoneticians rated the dissimilarity of two pitch contours. In the first experiment they rated the two pitch contours auditorily (i.e., by listening to two resynthesized utterances). In the second, they rated the same two pitch contours visually (i.e., by looking at the two contours displayed on a computer screen). The results indicate why visual feedback may be very effective in intonation training if pitch contours are displayed in such a way that only auditorily relevant features are represented.
Article
Pitch perception for short-duration fundamental frequency (F0) glissandos was studied. In the first part, new measurements using the method of adjustment are reported. Stimuli were F0 glissandos centered at 220 Hz. The parameters under study were: F0 glissando extents (0, 0.8, 1.5, 3, 6, and 12 semitones, i.e., 0, 10.17, 18.74, 38.17, 76.63, and 155.56 Hz), F0 glissando durations (50, 100, 200, and 300 ms), F0 glissando directions (rising or falling), and the extremity of F0 glissandos matched (beginning or end). In the second part, the main results are discussed: (1) perception seems to correspond to an average of the frequencies present in the vicinity of the extremity matched; (2) the higher extremities of the glissando seem more important; (3) adjustments at the end are closer to the extremities than adjustments at the beginning. In the third part, numerical models accounting for the experimental data are proposed: a time-average model and a weighted time-average model. Optimal parameters for these models are derived. The weighted time-average model achieves a 94% accurate prediction rate for the experimental data. The numerical model is successful in predicting the pitch of short-duration F0 glissandos.
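The semitone-to-Hz correspondence quoted for the glissando extents can be checked with a short calculation. The sketch below assumes the glissando is centered at 220 Hz on a logarithmic (semitone) scale, so that the endpoints lie half the extent above and below the center; the function name is hypothetical. Under that assumption the listed values for 0.8, 3, 6, and 12 semitones are reproduced to two decimals, while 1.5 semitones comes out near 19.07 Hz rather than the listed 18.74 Hz, so that one figure may rest on a different convention or rounding.

```python
def glissando_extent_hz(center_hz: float, extent_semitones: float) -> float:
    """Extent in Hz of a glissando centered (in semitones) at center_hz.

    Assumes the endpoints sit extent/2 semitones above and below the center.
    """
    half = extent_semitones / 2.0
    upper = center_hz * 2.0 ** (half / 12.0)   # higher extremity
    lower = center_hz * 2.0 ** (-half / 12.0)  # lower extremity
    return upper - lower


# Extents from the abstract that this assumption reproduces.
extents_hz = {st: round(glissando_extent_hz(220.0, st), 2)
              for st in (0, 0.8, 3, 6, 12)}
```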
Chapter
Phonetically Based Phonology is centred around the hypothesis that phonologies of languages are determined by phonetic principles; that is, phonetic patterns involving ease of articulation and perception are expressed linguistically as grammatical constraints. This book brings together a team of scholars to provide a wide-ranging study of phonetically based phonology. It investigates the role of phonetics in many phonological phenomena - such as assimilation, vowel reduction, vowel harmony, syllable weight, contour tone distribution, metathesis, lenition, sonority sequencing, and the Obligatory Contour Principle (OCP) - exploring in particular the phonetic bases of phonological markedness in these key areas. The analyses also illustrate several analytical strategies whereby phonological sound patterns can be related to their phonetic underpinnings. Each chapter includes a tutorial discussion of the phonetics on which the phonological discussion is based. Diverse and comprehensive in its coverage, Phonetically Based Phonology will be welcomed by all linguists interested in the relationship between phonetics and phonological theory.
Book
Auditory Scene Analysis addresses the problem of hearing complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds. In a unified and comprehensive way, Bregman establishes a theoretical framework that integrates his findings with an unusually wide range of previous research in psychoacoustics, speech perception, music theory and composition, and computer modeling.
Article
Recent evidence that pitch-movement shape can influence perceived alignment of rising (LH) pitch accents in several languages appears to challenge the well-established level-based approach to intonation embodied in the AM model, wherein it is typically assumed that the alignment and scaling of well-defined turning points (TPs) in the F0 contour are the primary phonetic correlates of contrastive accent category. Here we present the results of two experiments, arguing that a new approach to tonal implementation succeeds in reconciling these apparent contradictions. This approach, based on the notion of a perceptual reference point called Tonal Center of Gravity (TCoG), treats information about contour shape and TP-localization not in ‘either-or’ terms, but rather as two sets of cues working in a fundamentally synergistic way toward a single perceptual end: the alignment and scaling of TCoG. Experiment 1 shows that TCoG-based models can perform better at distinguishing productions of English L+H* and L*+H pitch accents than comparable TP-only-based models; Experiment 2 shows that TCoG is more robust than TP-only-based models to ambiguities in TP localization commonly encountered in F0 signals from natural speech. TCoG is shown to capture key insights of movement-based approaches to intonation, without abandoning the central advantages of level-based approaches like AM.
Article
Two speakers of Mexican Spanish read a total of 810 declarative sentences containing nine distinct target syllables with an H* accent, under different prosodic conditions (end of intonational phrase, end of intermediate phrase, and phrase-medial position), systematically changing syllabic position in the word and distance in syllables to the next stressed syllable. The data demonstrate that models that represent F0 peak placement as a fixed proportion of the syllable/rhyme are superseded by models that take into account intrasyllabic segmental durations (e.g., van Santen & Hirschberg 1994). In addition, prosodic factors such as adjacency to word, intonational, and intermediate boundaries, and stress clash, are key components in the prediction of peak location. F0 peaks before such prosodic units tend to be placed earlier in their syllables. Long-range effects of those prosodic factors are statistically significant, but generally small. An examination of the timing of the entire accent gesture in these prosodic contexts demonstrates that: 1) the location of the start of the F0 rise is fairly constant (generally at the onset of the accented syllable); 2) the time allocated for the rising gesture is not invariant: syllables with early peaks also have a shorter rise time (i.e., temporal distance between the peak and the previous valley); and 3) the velocity of the accent rise is clearly correlated with peak delay (i.e., temporal distance between the peak and the onset of the syllable): accents with relatively early peaks have a steeper rising slope, and vice versa. Results from the present study represent preliminary evidence that both the segmental composition of the accented syllable and following prosodic context trigger a timing and velocity adjustment on the entire H* accent gesture in Spanish.
Article
The perception of sounds characterized by a moving resonance was investigated in a series of experiments. Stimuli were generated by exciting a tuned circuit with a short train of pulses of repetition rate 100/sec. The resonant frequency of the tuned circuit was changed in a piecewise linear manner over a 500‐cps range. Subjects matched the test stimuli by adjusting the resonant frequency of a fixed (i.e., nonvarying in time) resonant circuit until the test and comparison stimuli were judged to be most alike. Results indicate a strong tendency for subjects to adjust the frequency of the fixed resonant circuit until it is close to the terminal resonant frequency of the time‐varying circuit. This tendency depended to some extent on the direction and rate of the frequency change in the test stimulus. The implications of the results for auditory theory and speech perception are discussed briefly.
Article
Using examples from a wide variety of languages, this book reveals why speakers vary their pitch, what these variations mean, and how they are integrated into our grammars. All languages use modulations in pitch to form utterances. Pitch modulation encodes lexical “tone” to signal boundaries between morphemes or words, and encodes “intonation” to give words and sentences an additional meaning that isn’t part of their original sense.
Chapter
This collection of papers presents current research in speech science. The unifying theme of the collection is the relationship between phonological representations of the grammatical structure of speech, and physical models of the production and perception of actual utterances. The authors, including leading specialists from the fields of phonology, electrical engineering, linguistic phonetics and psychology, provide a wide range of views on this question. There are papers dealing with the relationship between phonology and phonetics as it applies to tone in Hausa, to intonation, stress and phrasing in English and German, to universals of patterning in sonority and syllable structure, and in consonant place assimilation, to speech synthesis tools for testing phonological and phonetic theories, and to three different models of articulatory structure. An introductory chapter by the editors outlines the aim of the volume and provides a short overview of the papers. The book is aimed at specialists in all areas of speech science.
Article
The paper is concerned with the 'edge of intonation' in a twofold sense. It focuses on utterance-final F0 movements and crosses the traditional segment-prosody divide by investigating the interplay of F0 and voiceless fricatives in speech production. An experiment was performed for German with four types of voiceless fricatives: /f/, /s/, /ʃ/ and /x/. They were elicited with scripted dialogues in the contexts of terminal falling statement and high rising question intonations. Acoustic analyses show that fricatives concluding the high rising question intonations had higher mean centres of gravity (CoGs), larger CoG ranges and higher noise energy levels than fricatives concluding the terminal falling statement intonations. The different spectral-energy patterns are suitable to induce percepts of a high 'aperiodic pitch' at the end of the questions and of a low 'aperiodic pitch' at the end of the statements. The results are discussed with regard to the possible existence of 'segmental intonation' and its implication for F0 truncation and the segment-prosody dichotomy, in which segments are the alleged troublemakers for the production and perception of intonation.
Article
The aim of this study is to determine whether the threshold and pitch of falling glissandos are identical to those calculated for rising ones - and consequently to test the hypothesis that rising glissandos are less well perceived than falling ones. The results show a threshold between 18 and 19 Hz for a duration of 200 ms, with an initial frequency of 139 Hz. The pitch corresponds to the frequency of a point at two thirds of the duration of the vowel. These results confirm those obtained for rising glissandos. The above hypothesis is not verified. In conclusion, a model of hearing is proposed which accounts for the modes of perception of glissandos.
Article
This paper investigates the coordination relations between f0 turning points and segmental landmarks in falling pitch accents in Catalan. Ten Central Catalan speakers participated in the production experiment, for a total of 500 target pitch accents. Results indicate that while the beginning of the falling accent gesture (H) is tightly synchronized with the onset of the accented syllable, the end of the falling gesture (L) is more variable. This contrast has also been reported in rising accents between the alignment behavior of f0 valleys and peaks. It has been suggested that the asymmetry between alignment patterns in syllable-initial vs. syllable-final position might be attributable to general properties of intergestural coordination (Gao, 2006; Prieto and Torreira, 2007). Second, the data reveal a clear effect of syllable structure: while in open syllables the end of the fall is aligned roughly with the end of the accented syllable, in closed syllables it is aligned somewhat later but well before the coda consonant. Thus the L turning point is not aligned with the offset of the accented syllable, and coda consonants seem to have a 'transparent' behavior. This same effect of coda consonants on alignment has been reported in crosslinguistic studies of rising accents. A potential perceptual explanation for these effects is House's (1990) idea that in order to produce a perceptually acceptable rising (or falling) tone in a syllable with a final nasal, speakers would have to implement the most dynamic portion of the contour during the production of the vowel.
Article
This paper addresses the validity of the segmental anchoring hypothesis for tonal landmarks (henceforth, SAH) as described in recent work by (among others) Ladd, Faulkner, D., Faulkner, H., & Schepman [1999. Constant ‘segmental’ anchoring of f0 movements under changes in speech rate. Journal of the Acoustical Society of America, 106, 1543–1554], Ladd [2003. Phonological conditioning of f0 target alignment. In: M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the XVth international congress of phonetic sciences, Vol. 1, (pp. 249–252). Barcelona: Causal Productions; in press. Segmental anchoring of pitch movements: Autosegmental association or gestural coordination? Italian Journal of Linguistics, 18 (1)]. The alignment of LH* prenuclear peaks with segmental landmarks in controlled speech materials in Peninsular Spanish is analyzed as a function of syllable structure type (open, closed) of the accented syllable, segmental composition, and speaking rate. Contrary to the predictions of the SAH, alignment was affected by syllable structure and speech rate in significant and consistent ways. In CV syllables the peak was located around the end of the accented vowel, and in CVC syllables around the beginning-mid part of the sonorant coda, but still far from the syllable boundary. With respect to the effects of rate, peaks were located earlier in the syllable as speech rate decreased.
Article
The notion that the syllable is a unit of articulatory organization has long had intuitive appeal, although a series of studies spanning more than two decades failed to support this hypothesis (cf. Stetson, 1951; Draper, Ladefoged & Whitteridge, 1959; Kozhevnikov & Chistovich, 1965; Gay, 1978; Kent & Minifie, 1977; Harris & Bell-Berti, 1984). More recently, however, a new approach to this issue – one that considers syllables to be characteristic patterns of articulatory organization (Krakow, 1989; Browman & Goldstein, 1995) – has provided new insights into the nature of syllable organization in speech. This paper reviews the relevant physiological investigations in the literature and presents new data, which together serve to demonstrate that the syllable is, at its core, a physiological unit. The relation between such evidence and phonological patterns is discussed, including cross-language distributional differences between syllable-initial and syllable-final consonants, as well as such notions as ambisyllabicity and resyllabification.
Conference Paper
The tilt intonation model facilitates automatic analysis and synthesis of intonation. The analysis algorithm detects intonational events in F0 contours and parameterises them in terms of the continuously varying Tilt parameters. We describe the analysis system and give results for speaker independent spontaneous dialogue speech. We then describe a synthesis algorithm which can generate F0 contours given a tilt parameterisation of an utterance. We give results showing how well the automatically produced contours match natural ones. The paper concludes with a discussion of the linguistic relevance of the tilt parameters and shows that this is both a useful and natural way of representing intonation. 1. INTRODUCTION The tilt intonation model is designed to facilitate automatic intonational processing for speech technology applications. The model represents intonation at a phonetic level as a sequence of parameterised intonational events. From such a representation, it is possible to enco...
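The abstract does not spell out how an event is parameterised. The sketch below assumes the commonly cited formulation, in which each event is described by the amplitudes and durations of its rise and fall parts, and the tilt value is the mean of an amplitude ratio and a duration ratio. Under that assumption a pure rise gives +1, a pure fall gives -1, and a symmetric rise-fall gives 0. The function and argument names are hypothetical.

```python
def tilt(a_rise: float, a_fall: float, d_rise: float, d_fall: float) -> float:
    """Tilt of one intonational event (assumed formulation).

    a_rise, a_fall: F0 excursion of the rise and fall parts (sign ignored).
    d_rise, d_fall: durations of the rise and fall parts (non-negative).
    """
    amp_total = abs(a_rise) + abs(a_fall)
    dur_total = d_rise + d_fall
    if amp_total == 0.0 or dur_total == 0.0:
        raise ValueError("event has no excursion or no duration")
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amp_total  # amplitude ratio
    tilt_dur = (d_rise - d_fall) / dur_total            # duration ratio
    return 0.5 * (tilt_amp + tilt_dur)


# Hypothetical events: excursions in Hz, durations in seconds.
pure_rise = tilt(30.0, 0.0, 0.2, 0.0)
pure_fall = tilt(0.0, -30.0, 0.0, 0.2)
symmetric = tilt(20.0, -20.0, 0.1, 0.1)
```

Collapsing rise and fall into a single continuous value is what lets an event be described without first deciding whether it counts as a rise, a fall, or a rise-fall.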
Article
One of the things you learn if you read books and articles in (or about) cognitive science is that the brain does a lot of "filling in"--not filling in, but "filling in"--in scare quotes. My claim today will be that this way of talking is not a safe bit of shorthand, or an innocent bit of temporizing, but a source of deep confusion and error. The phenomena described in terms of "filling in" are real, surprising, and theoretically important, but it is a mistake to conceive of them as instances of something being filled in, for that vivid phrase always suggests too much--sometimes a little too much, but often a lot too much. Here are some examples (my boldface throughout).
Article
An acoustic analysis of a German read-speech corpus showed that utterance-final /t/ aspirations differ systematically depending on the accompanying nuclear accent contour. Two contours were included: Terminal-falling early and late F0 peaks in terms of the Kiel Intonation Model. They correspond to H+L*L-% and L*+HL-% within the autosegmental metrical (AM) model. Aspirations in early-peak contexts were characterized by (a) "short", (b) "high-intensity" noise with (c) "low" frequency values for the spectral energy maximum above the lower spectral energy boundary. The opposite holds for aspirations accompanying late-peak productions. Starting from the acoustic analysis, a perception experiment was performed using a variant of the semantic differential paradigm. The stimuli were varied in the duration and intensity pattern as well as the spectral energy pattern of the final /t/ aspiration. Results revealed that the different noise patterns found in connection with early and late peak productions were able to change the attitudinal meaning of the stimuli toward the meaning profile of the respective F0 peak category. This suggests that final aspirations can be part of the coding of meanings, so far solely associated with intonation contours. Hence, the traditionally separated segmental and suprasegmental coding levels seem to be more intertwined than previously thought.
Article
This is a brief report on an experiment intended to show that a piecewise linear approximation of an F0 curve in speech is, perceptually, not inferior to an approximation by means of fragments of parabolas, which gives--visually at least--a better fit to the original F0 curve than does the rectilinear approximation. (More details can be read in IPO Rep. no. 816, available on request.) Stimuli consisted of two pairs of linearly or parabolically frequency-modulated pulse trains; one pair contained identical members, the other different members. The subjects had to indicate whether it was the first or the second pair that contained different members. The results showed that even the best performers were hardly ever able to distinguish the parabolic from the rectilinear shapes, provided the latter contained a flattened peak.
Article
This paper deals with the factors that influence the alignment of F0 movements with phonetic segments. It reports two experiments on the alignment of rising prenuclear pitch accents in Dutch. In experiment 1, it is shown that the final peak of the rise is aligned at the end of the vowel if the accented syllable contains a long vowel, but during the following consonant if the accented syllable contains a short vowel. The beginning of the rise is consistently aligned at the beginning of the accented syllable. Experiment 2 attempts to distinguish between two explanations for this finding: (1) a durational account, in which the F0 rise takes a certain amount of time and overruns into the following consonant if the vowel is short; and (2) a structural account, in which the peak of the rise is seen as a tonal target aligned with the end of the syllable (which is structurally earlier for long vowels than for short vowels). The data partially support both accounts. There is an alignment difference despite a lack of durational difference, which supports the structure-based account. However, the effect is reduced compared to experiment 1, showing that time pressure may work against the ideal alignment.
Article
In this paper, we present a time-domain aperiodicity, periodicity, and pitch (APP) detector that estimates 1) the proportion of periodic and aperiodic energy in a speech signal and 2) the pitch period of the periodic component. The APP system is particularly useful in situations where the speech signal contains simultaneous periodic and aperiodic energy, as in the case of breathy vowels and some voiced obstruents. The performance of the APP system was evaluated on synthetic speech-like signals corrupted with noise at various levels of signal-to-noise ratio (SNR) and on three different natural speech databases that consist of simultaneously recorded electroglottograph (EGG) and acoustic data. When compared on a frame basis (at a frame rate of 2.5 ms), the results show excellent agreement between the periodic/aperiodic decisions made by the APP system and the estimates obtained from the EGG data (94.43% for periodicity and 96.32% for aperiodicity). The results also support previous studies that show that voiced obstruents are frequently manifested with either little or no aperiodic energy, or with strong periodic and aperiodic components. The EGG data were used as a reference for evaluating the pitch detection algorithm. The ground truth was not manually checked to rectify or exclude incorrect estimates. The overall gross error rate in pitch prediction across the three speech databases was 5.67%. In the case of synthetic speech-like data, the estimated SNR was found to be in close proportion to the actual SNR, and the pitch was always accurately found regardless of the presence of any shimmer or jitter.
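The frame-basis comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each 2.5 ms frame carries a single binary periodic/non-periodic decision from the detector and from the EGG reference; the function name `frame_agreement` and the toy decision tracks are hypothetical and are not taken from the APP paper, which may compute agreement differently.

```python
def frame_agreement(detected, reference):
    """Return the percentage of frames on which two binary decision tracks agree.

    Both arguments are equal-length sequences of booleans, one entry per frame.
    This is an illustrative assumption, not the APP authors' scoring procedure.
    """
    if len(detected) != len(reference):
        raise ValueError("tracks must have the same number of frames")
    if not detected:
        raise ValueError("tracks must not be empty")
    # Count frames where the detector and the reference make the same decision.
    matches = sum(d == r for d, r in zip(detected, reference))
    return 100.0 * matches / len(detected)


# Hypothetical toy tracks (True = frame judged periodic), 8 frames each.
detector_periodic = [True, True, False, False, True, True, True, False]
egg_periodic = [True, True, False, True, True, True, True, False]

agreement = frame_agreement(detector_periodic, egg_periodic)
print(f"{agreement:.2f}% agreement")
```

A single pooled percentage like this treats both kinds of disagreement alike; the abstract reports separate figures for periodicity and aperiodicity, which would require scoring each decision track on its own.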
emmeans: Estimated Marginal Means, aka Least-Squares Means
  • R Lenth