Article

Individual differences in acoustic and articulatory undershoot in a German diphthong – Variation between male and female speakers


Abstract

Individual differences in speech production and, more specifically, in the realization of stress contrasts have been found previously (e.g. de Jong, 1995). This study extends this line of work by investigating potential gender-specific differences in the realization of different accent conditions and, more specifically, in the degree of undershoot. The reason suggested for these differences is the under-exploitation of the larger male articulatory space during running speech. Differences between male and female speakers in undershoot are investigated (a) by comparing the degree of undershoot in various accent conditions between male and female diphthong productions, and (b) by analyzing the degree of undershoot in relation to a speaker’s maximum articulatory vowel space. Articulatory and acoustic data of the diphthong /aɪ/ from 11 German speakers (5 males, 6 females) were analyzed in absolute terms and after normalization for a speaker’s maximal articulatory space. In addition to speaker-specific differences in undershoot and in the acoustic-articulatory relationship, results support gender-specific differences, with males exhibiting more undershoot than females in both articulatory and acoustic terms. After normalization with respect to a speaker’s maximum articulatory vowel space, females exhibit larger tongue back trajectories than males.
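The two kinds of measure mentioned above (absolute excursion and excursion relative to a speaker's maximal space) can be illustrated with a minimal sketch. It assumes midsagittal (x, y) tongue-back sensor positions in mm and uses, as a hypothetical reference for normalization, the Euclidean distance between a speaker's most extreme /a/ and /i/ tongue-back positions. The function names and this particular normalization are illustrative choices, not necessarily the procedure used in the study.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2D sensor positions (e.g. in mm)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def path_length(trajectory):
    """Summed distance along a sampled sensor trajectory [(x, y), ...]."""
    return sum(euclidean(a, b) for a, b in zip(trajectory, trajectory[1:]))


def normalized_excursion(onset, offset, max_a, max_i):
    """Diphthong excursion as a proportion of a reference /a/-/i/ distance.

    Hypothetical operationalization: 1.0 means the diphthong covers the
    whole reference distance; smaller values indicate more undershoot.
    """
    return euclidean(onset, offset) / euclidean(max_a, max_i)


# Reference /a/ and /i/ positions 20 mm apart; the /aɪ/ token spans 12 mm.
print(normalized_excursion((0.0, -6.0), (0.0, 6.0), (0.0, -10.0), (0.0, 10.0)))
```

Dividing by a speaker-internal reference makes speakers with different vocal tract sizes comparable, which is the point of normalizing for maximal articulatory space.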


... They extended this line of research by investigating articulatory and acoustic undershoot in a German diphthong by means of EMA. Results point to an under-exploitation of the larger male articulatory space during running speech, with males exhibiting more undershoot than females in both articulatory and acoustic terms (Weirich & Simpson, 2018b). Thus, the smaller vocal tract dimensions in females lead to articulatory differences which have to be kept in mind when gender-specific acoustic differences are discussed. ...
Chapter
Speech carries a wealth of information about the speaker aside from any verbal message, ranging from emotional state (sad, happy, bored, etc.) to illness (e.g., a cold). Central among these are a speaker’s gender and sexual orientation. In part this is an inevitable product of differences in speakers’ anatomical dimensions; for example, on average males have lower-pitched voices than females because their longer, thicker vocal folds vibrate more slowly. Arguably, however, much more of this information is learned as speakers construct their gender or identify with a particular sexual orientation. Differences in speech already begin in young children, before any marked gender-related anatomical differences develop, underlining the importance of behavioral patterns. Gender, gender identity, and sexual orientation are encoded in speech in a range of phonetic parameters relating to both phonation (activity of the vocal folds) and articulation (dimensions and configuration of the supraglottal cavities), as well as in the use of pitch patterns and differences in voice quality (the way in which the vocal folds vibrate). Differences in the size and configuration of the supraglottal cavities give rise to differences in the size of the acoustic vowel space, as well as to subtle differences in the production of individual sounds such as the sibilant [s]. Furthermore, significant and systematic gender-specific differences have been found in the average duration of utterances and individual sounds, which in turn have a complex relationship to the perception of tempo.
Article
Previous research has shown that coarticulatory information in the signal orients listeners in spoken word recognition, and that articulatory and perceptual dynamics closely parallel one another. The current study uses statistical classification to test how well time-varying anticipatory coarticulatory information in the acoustic signal predicts upcoming sounds in the speech stream. Bayesian mixed-effects multinomial logistic regression models were trained on several different representations of spectral variation in V1 in order to predict the identity of V2 in naturally coarticulated transconsonantal V1…V2 sequences. Models trained on simple measures of spectral variation (e.g. formant measures taken at V1 midpoint) were compared with models trained on more sophisticated time-varying representations (e.g. the estimated coefficients of polynomial curves fit to whole formant trajectories of V1). Accuracy in predicting V2 was greater when models were trained on dynamic representations of spectral variation in V1; models trained on quadratic and cubic polynomial representations achieved the greatest accuracy, improving correct classification by more than 15 percentage points over midpoint formant frequencies alone. The results demonstrate that spectral representations with high temporal resolution capture more of the disambiguating anticipatory information available in the signal than representations with lower temporal resolution.
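The contrast between a midpoint measure and a polynomial representation of a formant trajectory can be illustrated with a short sketch. It assumes a formant track sampled at equally spaced points over the vowel, rescales time to 0..1, and fits coefficients by ordinary least squares via the normal equations. That is adequate for the quadratic or cubic orders mentioned here but numerically fragile at high orders. The function names are hypothetical, and the classifier itself is not shown.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...], lowest order first."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = ts[i] ** j
    ata = [[sum(t ** (r + c) for t in ts) for c in range(n)] for r in range(n)]
    aty = [sum(y * t ** r for t, y in zip(ts, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution on the upper-triangular system
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aty[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / ata[r][r]
    return coeffs


def trajectory_features(formant_hz, degree=2):
    """Polynomial coefficients of a formant track over normalized time 0..1."""
    m = len(formant_hz)
    ts = [i / (m - 1) for i in range(m)]
    return polyfit(ts, formant_hz, degree)


def midpoint_feature(formant_hz):
    """The static alternative: the single value at the temporal midpoint."""
    return formant_hz[len(formant_hz) // 2]
```

The midpoint reduces a track to one number, whereas the coefficients [c0, c1, c2] encode level, slope and curvature, so two tracks that cross at the midpoint but move in opposite directions remain distinguishable.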
Article
The relative contributions of static and dynamic formant representations to speaker-specificity were investigated in conversational speech and in two vowels varying in inherent spectral change. Using polynomial fits, the contribution of dynamic formant coefficients to speaker-specificity relative to that of the formant intercept was investigated in the diphthongal vowel [ei] taken from English and Dutch conversational speech. The [ei] tokens were sampled from various linguistic contexts and analysed in a likelihood-ratio (LR) framework. Results show that formant dynamics contain speaker-specific information in conversational speech, even though the high contextual variation seems to reduce their effect relative to that reported in earlier work. Vowels differ in inherent dynamicity; therefore, the added value of dynamic formant information to speaker-specificity was also compared between vowels differing in inherent spectral change. Using Dutch data, the contribution of formant dynamics to speaker-specificity was compared between [ei] and [aː] tokens produced by the same speakers. Formant dynamics in conversational speech contributed to speaker-specificity only in the diphthong [ei], not in the monophthong [aː].
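The logic of a likelihood ratio can be shown with a deliberately simplified univariate sketch. It assumes Gaussian distributions for a single feature (for example one formant coefficient): a same-speaker model and a relevant-population model. Forensic LR systems often use multivariate models instead (e.g. multivariate kernel density formulas), so this illustrates only the ratio itself, not the procedure of the study. All names and numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood_ratio(x, speaker_mean, within_sd, population_mean, population_sd):
    """p(x | same speaker) / p(x | different speaker).

    LR > 1 supports the same-speaker hypothesis; LR < 1 supports the
    different-speaker hypothesis.
    """
    same = gaussian_pdf(x, speaker_mean, within_sd)
    different = gaussian_pdf(x, population_mean, population_sd)
    return same / different


# A value close to the speaker's mean and far from the population mean
print(likelihood_ratio(10.0, 10.0, 1.0, 0.0, 5.0))
```

Adding dynamic coefficients is useful in this framework only if they make the same-speaker and population distributions more separable, which is what the comparison between [ei] and the monophthong tests.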
Article
This paper focuses on the role of gender in the expression of politeness in the Iranian language, Farsi (Persian). Expressions of linguistic politeness are believed to be cultural universals. Farsi is known for the complexities of its politeness system, called ‘ta’aroff’, which reflects social hierarchies. This study employs a questionnaire interview and dialogical text analysis (ta’aroff expressions count) to explore whether the production and perception of ta’aroff differs across genders. The questionnaire contained demographic information and statements related to attitudes to ta’aroff. The participants were also requested to create dialogues based on a presented shopping prompt scenario. Iranian women (30) and men (30) of two age groups (20–29 and 40–59 years old) participated in the study. The results show statistically significant differences in the attitudes to ta’aroff in that men’s attitudes to ta’aroff are generally more positive than women’s. In the shopping prompt dialogues, women produced fewer ta’aroff expressions than men. There were also some differences found in the use of specific ta’aroff expressions across genders. The results suggest that gender should be considered a factor influencing both ta’aroff use in speech production and the attitudes to ta’aroff.
Article
This paper presents a systematic comparison of various measures of f0 range in female speakers of English and German. F0 range was analyzed along two dimensions, level (i.e., overall f0 height) and span (extent of f0 modulation within a given speech sample). These were examined using two types of measures, one based on "long-term distributional" (LTD) methods, and the other based on specific landmarks in speech that are linguistic in nature ("linguistic" measures). The various methods were used to identify whether and on what basis or bases speakers of these two languages differ in f0 range. Findings yielded significant cross-language differences in both dimensions of f0 range, but effect sizes were found to be larger for span than for level, and for linguistic than for LTD measures. The linguistic measures also uncovered some differences between the two languages in how f0 range varies through an intonation contour. This helps shed light on the relation between intonational structure and f0 range.
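A long-term distributional (LTD) measure of f0 range can be sketched as follows. The sketch assumes a frame-level f0 track in Hz in which 0 marks unvoiced frames, takes level as the median f0 of voiced frames, and takes span as the distance in semitones between the 10th and 90th percentiles. These percentile choices are assumptions for illustration; the paper compares several LTD and linguistic measures.

```python
import math
import statistics


def semitones(f_hi, f_lo):
    """Interval between two frequencies in semitones."""
    return 12 * math.log2(f_hi / f_lo)


def ltd_f0_range(f0_hz):
    """Level (median f0, Hz) and span (10th-90th percentile, semitones)."""
    voiced = [f for f in f0_hz if f > 0]  # 0 marks an unvoiced frame
    deciles = statistics.quantiles(voiced, n=10, method="inclusive")
    level = statistics.median(voiced)
    span = semitones(deciles[8], deciles[0])
    return level, span


print(ltd_f0_range([0.0, 100.0, 100.0, 100.0, 100.0, 100.0,
                    200.0, 200.0, 200.0, 200.0, 200.0]))
```

Expressing span in semitones rather than Hz keeps it comparable across speakers with different overall f0 level, which is why level and span are treated as separate dimensions.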
Article
Purpose: Mumbling as opposed to clear speech is a typical male characteristic in speech and can be the consequence of a small jaw opening. While behavioral reasons have often been offered to explain sex-specific differences with respect to clear speech, the purpose of this study is to investigate a potential anatomical reason for smaller jaw openings in male than in female speakers. Method: Articulatory data from two data sets (American English and German) were analyzed with respect to jaw opening in low vowels during speech. Particular focus was placed on sex-specific differences, also incorporating potential interactions with different accent conditions in one of the data sets. In addition, a modeling study compares the articulatory consequences of similar jaw opening settings in a typical male and a typical female articulatory model. Results: Greater jaw openings were found for the female speakers, in particular in the accented condition, where jaw opening was found to be larger. In line with this finding, the modeling study showed that similar jaw opening settings in male and female speakers led to differences in pharyngeal constriction, resulting in complete radico-pharyngeal closure in the male model. Conclusions: The empirical and modeling findings suggest a possible physiological component in sex-specific differences in speech clarity for low vowels.
Conference Paper
This study examines possible differences between the acoustic realization of the intersibilant contrast /s/~/ʃ/ in German and American English. A range of acoustic parameters (COG, standard deviation, skewness, kurtosis and Discrete Cosine Transformation coefficients) are calculated to characterize the spectra of the two sibilants. Significant differences are found between the male and female intersibilant contrast, indicating that females produce a stronger acoustic contrast between /s/ and /ʃ/ in both languages. While in the German data set a tendency for gender-specific differences in accent realizations was found, the effect did not reach significance.
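The first four spectral moments listed above can be computed by treating the normalized power spectrum as a probability distribution over frequency. The sketch assumes the spectrum is already available as paired lists of frequencies (Hz) and power values (windowing and FFT are not shown) and reports excess kurtosis (kurtosis minus 3), which is only one of the conventions in use. The DCT coefficients are not covered, and `spectral_moments` is a hypothetical name.

```python
import math


def spectral_moments(freqs_hz, power):
    """COG, SD, skewness and excess kurtosis of a power spectrum."""
    total = sum(power)
    p = [w / total for w in power]  # weights now sum to 1
    cog = sum(f * w for f, w in zip(freqs_hz, p))
    var = sum((f - cog) ** 2 * w for f, w in zip(freqs_hz, p))
    m3 = sum((f - cog) ** 3 * w for f, w in zip(freqs_hz, p))
    m4 = sum((f - cog) ** 4 * w for f, w in zip(freqs_hz, p))
    return {
        "cog": cog,                          # spectral centre of gravity (Hz)
        "sd": math.sqrt(var),                # spread around the COG (Hz)
        "skewness": m3 / var ** 1.5,         # asymmetry of the distribution
        "excess_kurtosis": m4 / var ** 2 - 3.0,  # peakedness relative to a normal
    }


print(spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0]))
```

A more fronted /s/ typically shifts energy upward, raising the COG relative to /ʃ/; the distance between the two sibilants' moments is then one way to quantify the strength of the contrast.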
Article
Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
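For reference, the class of models fitted by lmer can be written in the usual linear mixed-model form, shown here without offsets or prior weights. $X$ and $Z$ are the fixed- and random-effects model matrices built from the formula, $\boldsymbol{\beta}$ are the fixed-effect coefficients, and $\mathbf{b}$ are the random effects, whose covariance $\Sigma_{\theta}$ depends on a parameter vector $\theta$. A random-effects term such as `(1 | speaker)` contributes a block of columns to $Z$ and a variance component to $\theta$. Profiling means that, for a given $\theta$, the remaining parameters ($\boldsymbol{\beta}$ and $\sigma$) can be obtained in closed form, so the numerical optimization runs over $\theta$ only.

```latex
\mathbf{y} = X\boldsymbol{\beta} + Z\mathbf{b} + \boldsymbol{\varepsilon},
\qquad
\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\theta}),
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2} I_n)
```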
Conference Paper
Studies have shown that although females on average have a larger acoustic vowel space than males, they exhibit a smaller articulatory vowel space. From this it is hypothesized that sex-specific differences in undershoot might exist. Articulatory vowel space sizes and Euclidean distances between vowel positions are analyzed in nine German speakers (5m, 4f) by means of electromagnetic articulography. Analyses include different sentence accent positions and two different sequences varying in their expected coarticulatory induced degree of undershoot. Results show a relationship between undershoot and speaker sex with males being more affected by accent and coarticulatory induced undershoot than females.
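The two articulatory measures named above can be sketched in a few lines. The sketch assumes that mean sensor positions for each vowel are available as (x, y) pairs and that the vowels forming the space are listed in order around its perimeter, so that the polygon does not self-intersect. Its area then follows from the shoelace formula. The function names are hypothetical, and the study may have computed vowel space size differently (e.g. via a convex hull).

```python
import math


def polygon_area(vertices):
    """Area enclosed by vowel positions listed in perimeter order (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_distance(p, q):
    """Euclidean distance between two vowel positions."""
    return math.dist(p, q)


# Hypothetical /i/, /a/, /u/ positions in mm
print(polygon_area([(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]))
```

Undershoot in an accent or coarticulation condition would then appear as a smaller area or shorter inter-vowel distances relative to the same speaker's reference condition.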
Article
Purpose: The purpose of this study was to further explore the understanding of speaker-specific realizations of the /s/-/ʃ/ contrast in German in relation to individual differences in palate shape. Method: Two articulatory experiments were carried out with German native speakers. In the first experiment, 4 monozygotic and 2 dizygotic twin pairs were recorded by means of electromagnetic articulography. In the second experiment, 12 unrelated speakers were recorded by means of electropalatography. Interspeaker variability in the articulatory distance between the sibilants was measured and was correlated with several parameters of the palate shape. Results: The results were twofold: (a) Similar palatal morphologies as found in monozygotic twins yield similar articulatory realizations of the /s/-/ʃ/ contrast regarding vertical and horizontal distance of the target tongue tip positions, and (b) the realization of the contrast was influenced by palatal steepness, especially the inclination angle of the alveolo-palatal region. Speakers with flat inclination angles mainly retracted their tongue to realize the contrast, whereas speakers with steep inclination angles also elevated their tongue. Conclusion: The articulatory realization of the sibilant contrast is influenced not only by speaker-specific auditory acuity, as previously observed, but also by palatal shape morphology, which affects the somatosensory feedback speakers receive.
Article
Despite various studies describing longer segment durations and slower speaking rates in females than males, there appears to be a stereotype of women speaking faster than men. To investigate the mismatch between empirical evidence and this widespread stereotype, listening experiments were conducted to test whether a relationship between perceived tempo and acoustic vowel space size might exist. If a speaker traverses a larger acoustic vowel space than another speaker within the same time, this speaker might be perceived as speaking faster. To test this, two listening experiments with either exclusively female or exclusively male speakers but with varying vowel space sizes were conducted. Listeners were asked to rate the perceived speech tempo of same-sex speaker pairs. The stimuli were manipulated to have the same segment durations and f0 contour. Results indicate that a positive correlation between acoustic vowel space size and perceived speech tempo exists. Since females exhibit on average a larger acoustic vowel space than males, it is suggested that the stereotype of faster-speaking women might arise from this.
Article
The purpose of this study is to examine and compare the amount of inter-speaker variability in the articulation of monozygotic twin pairs (MZ), dizygotic twin pairs (DZ), and pairs of unrelated twins with the goal of examining in greater depth the influence of physiology on articulation. Physiological parameters are assumed to be very similar in MZ twin pairs in contrast to DZ twin pairs or unrelated speakers, and it is hypothesized that the speaker specific shape of articulatory looping trajectories of the tongue is at least partly dependent on biomechanical properties and the speaker's individual physiology. By means of electromagnetic articulography (EMA), inter-speaker variability in the looping trajectories of the tongue back during /VCV/ sequences is analyzed. Results reveal similar looping patterns within MZ twin pairs but in DZ pairs differences in the shape of the loop, the direction of the upward and downward movement, and the amount of horizontal sliding movement at the palate are found.
Article
In previous research, acoustic characteristics of the male voice have been shown to signal various aspects of mate quality and threat potential. But the human voice is also a medium of linguistic communication. The present study explores whether physical and vocal indicators of male mate quality and threat potential are linked to effective communicative behaviors such as vowel differentiation and use of more salient phonetic variants of consonants. We show that physical and vocal indicators of male threat potential, height and formant position, are negatively linked to vowel space size, and that height and levels of circulating testosterone are negatively linked to the use of the aspirated variant of the alveolar stop consonant /t/. Thus, taller, more masculine men display less clarity in their speech and prefer phonetic variants that may be associated with masculine attributes such as toughness. These findings suggest that vocal signals of men's mate quality and/or dominance are not confined to the realm of voice acoustics but extend to other aspects of communicative behavior, even if this means a trade-off with speech patterns that are considered communicatively advantageous, such as clarity and indexical cues to higher social class.
Article
Phonetic studies of phonological contrasts in features such as vowel height and consonant voicing have revealed a gender difference: phonetic correlates of phonological contrasts produced by women tend to be more distinct in acoustic and temporal space than those produced by men. However, these studies have been constrained by their dichotomous approach to gender. This article examines within-sex variation in phonetic correlates of phonological contrasts among eight American male radio disc jockeys. Four variables were examined: vowel space dispersion, consonant lenition, and two types of vowel length contrast. The social characteristics of the DJs were enumerated by having listeners rate each voice on 10 Likert scales. These ratings were aggregated into four components, which were interpreted as masculinity (i.e., gender), social class, regional accent, and personality. Two of these, masculinity and regional accent, showed significant correlations with the phonetic variables, demonstrating that phonetic distinctiveness correlates with speakers' social characteristics.
Article
Recent phonological approaches incorporate phonetic principles in the motivation of phonological regularities, e.g. vowel reduction and neutralization in unstressed position by target undershoot. So far, evidence for this hypothesis has been based on impressionistic and acoustic data but not on articulatory data. The major goal of this study is to compare formant spaces and lingual positions during the production of German vowels for combined effects of stress, accent and corrective contrast. In order to identify strategies for vowel reduction independent of speaker-specific vocal-tract anatomies and individual biomechanical properties, an approach similar to the Generalized Procrustes Analysis was applied to formant spaces and lingual vowel target positions. The data basis consists of the German stressed and unstressed full vowels /iː yː eː ɛː øː œ aː a oː uː/ from seven speakers recorded by means of electromagnetic midsagittal articulography (EMMA). Speaker-normalized articulatory and formant spaces gave evidence for a greater degree of coarticulation with the consonant context for unstressed vowels as compared to stressed vowels. However, only for tense vowels could spatial reduction patterns be attributed to vowel shortening, whereas lax vowels were reduced without shortening. The results are discussed in the light of current theories of vowel reduction, i.e. target undershoot, Adaptive Dispersion Theory and Prominence Alignment.
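The core alignment step of a Procrustes analysis can be sketched for 2D data (e.g. F1/F2 values or sensor x/y positions) using complex arithmetic: one speaker's configuration is translated, rotated and uniformly scaled to best match a reference configuration of the same vowels in the least-squares sense. This is a single pairwise ordinary Procrustes step. A Generalized Procrustes Analysis would iterate it, aligning all speakers to a running mean configuration. Since the study describes its method only as similar to GPA, this is not its exact procedure, and `procrustes_2d` is a hypothetical name.

```python
def procrustes_2d(reference, target):
    """Align target points onto reference points (same vowel order).

    Uses translation, rotation and uniform scaling (no reflection).
    Returns the aligned points and the residual sum of squares.
    """
    x = [complex(*p) for p in reference]
    y = [complex(*p) for p in target]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xc = [z - mx for z in x]  # centre both configurations
    yc = [z - my for z in y]
    # The complex factor b = scale * exp(i * angle) minimizing sum |xc - b*yc|^2
    b = sum(xv * yv.conjugate() for xv, yv in zip(xc, yc)) / sum(abs(yv) ** 2 for yv in yc)
    aligned = [b * yv + mx for yv in yc]
    residual = sum(abs(a - xv) ** 2 for a, xv in zip(aligned, x))
    return [(a.real, a.imag) for a in aligned], residual


reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rotated_scaled_shifted = [(5.0, 3.0), (5.0, 5.0), (3.0, 5.0), (3.0, 3.0)]
print(procrustes_2d(reference, rotated_scaled_shifted))
```

After alignment, the residual differences between speakers' configurations reflect differences in vowel arrangement rather than overall size, position or orientation of their vowel spaces.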
Article
In this study the acoustic and articulatory variabilities of speakers with different palate shapes were compared. Since the cross-sectional area of the vocal tract changes less for a slight change in tongue position if the palate is dome-shaped than if it is flat, the acoustic variability should be greater for flat palates than for dome-shaped ones. Consequently, it can be hypothesized that speakers with flat palates should reduce their articulatory variability in order to keep the acoustic output constant. This hypothesis was tested on 32 speakers recorded via electropalatography (EPG) and acoustics. The articulatory and acoustic variability of some of their vowels and /j/ was measured. Indeed, the results show that the speakers with flat palates reduce their variability in tongue height. There is no such trend in acoustic variability.
Article
A set of phonetic studies based on analysis of the TIMIT speech database is presented. Using a database methodological approach, these studies detail new results in speaker-dependent variation due to sex and dialect region of the talker including effects on stop release frequency, speaking rate, vowel reduction, flapping, and the use of glottal stop. TIMIT was found to be fertile ground for gathering acoustic-phonetic knowledge having relevance to the phonetic classification and recognition goals for which TIMIT was designed, as well as to the linguist attempting to describe regularity and variability in the pronunciation of read English speech.
Article
A pair of experiments examines first the coarticulatory relations among certain stressed and unstressed vowels, and next the perception of coarticulated unstressed vowels. The first study finds the acoustic properties of unstressed medial and, to a substantially lesser extent, of stressed medial, to be assimilated to the properties of their flanking vocalic contexts. Both initial and final flanking vowels coarticulate with medial, but carryover coarticulatory effects tend to exceed anticipatory effects. In a secondary experiment, listeners' manners of perceiving the coarticulated unstressed vowels of the first experiment are shown to be coupled to, or to be compatible with, the talkers' coarticulatory strategies. In particular, perceivers hear acoustically identical vowels to be different when the vowels appear in different contexts of flanking vowels. Similarly, instances of that are acoustically different due to different coarticulatory influences on them sound the same to listeners as long as each appears in its appropriate context of flanking vowels.
Chapter
This chapter presents three studies dealing with articulatory inter-speaker variability in German. In particular, organic sources (such as biomechanics of the tongue muscles, palatal shape and vocal tract dimensions) of idiosyncratic variation are discussed. Two studies deal with the within-pair similarity of identical (monozygotic) and non-identical (dizygotic) twin pairs; the third study describes differences between male and female speakers. The speech material comprises looping movements of the tongue in /aCV/-sequences, the production of the sibilant contrast /s/-/ʃ/ and the tense vowels /i: e: a: o: u:/ in different accent conditions. Results show that individual differences in articulatory strategies can at least in part be explained by idiosyncratic physiological restrictions and that the investigation of phonemic contrasts instead of targets, and the emphasis on speech dynamics are particularly relevant.
Article
A growing body of literature has revealed sex/gender differences in the acoustics of the sibilant fricative /s/. It has been suggested that some of this sex/gender-related variation might be socially motivated and acquired. However, the necessary developmental research to corroborate this proposal is absent from the literature. To address this, we examined sex/gender differences in the production of /s/ acoustics in relation to children's physical growth and gender identity. The speech production of children and adolescents aged 4–16 years old was recorded. Additionally, the physical height was measured and gender identity was evaluated through a parent-filled questionnaire. Three acoustic measures were calculated that describe the mean, standard deviation, and skewness of the spectral frequencies of /s/. Results indicated that gender identity played a key role in mediating the difference in /s/ acoustics between boys and girls for all three acoustic measurements. Additionally, for adolescent boys, gender identity explains within-gender variation in /s/. Our results thus highlight the importance of social-behavioral factors in the development of sex/gender difference in /s/ production.
Article
Measurements of formant frequencies and duration are reported for 8 Swedish vowels uttered by a male talker in three consonantal environments under varying timing conditions. An exponential function is used to describe the extent to which formant frequencies in the vowels reach their target values as a function of vowel-segment duration. A target is specified by the asymptotic values of the first two formant frequencies of the vowel and is independent of consonantal context and duration. It is thus an invariant attribute of the vowel. The results suggest an interpretation in terms of a simple dynamic model of vowel articulation.
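An exponential undershoot function of the kind described can be sketched as $F = \alpha (L - T) e^{-\beta D} + T$, where $T$ is the formant target, $L$ the formant value at the consonant-vowel boundary, and $D$ the vowel duration. As $D$ grows, $F$ approaches the asymptotic target $T$, matching the abstract's definition of the target. This parameterization is stated here as an assumption about the model's form; $\alpha$ and $\beta$ are free parameters that would be fitted to data, and the default values below are placeholders.

```python
import math


def undershoot_formant(target_hz, locus_hz, duration_ms, alpha=1.0, beta=0.02):
    """Formant value reached in a vowel of the given duration.

    Approaches target_hz exponentially as duration grows. alpha and beta
    are illustrative placeholders, not fitted values.
    """
    return alpha * (locus_hz - target_hz) * math.exp(-beta * duration_ms) + target_hz


# Hypothetical F2 target of 2000 Hz reached from a 1200 Hz consonantal value
for d in (50, 100, 200):
    print(d, round(undershoot_formant(2000.0, 1200.0, d)))
```

Because the target is an asymptote, the shortfall $F - T$ shrinks with duration, so shorter vowels show more undershoot without any change in the underlying target.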
Article
Current developments in the use of five-dimensional electromagnetic articulography for speech research are reviewed. Obvious advantages are the higher information density per sensor (three Cartesian coordinates, two spherical coordinates) compared to traditional 2D EMMA systems, and removal of the necessity to constrain the subject's head. The drawbacks are equally related to this higher dimensional space: position calculation involves solving a non-linear optimization problem. In some cases, unstable solutions are encountered, resulting in mistrackings. On the positive side, we illustrate how the higher information density allows particularly succinct and robust characterizations of tongue configuration. Discussion also focuses on monitoring of head movement. This is crucial for accurate recovery of articulator movements themselves, but is also intrinsically interesting as part of speech motor activity. In addition to improving the naturalness of the speaking situation, the freedom of head movement also means that subjects tolerate longer recording sessions. This can facilitate new experimental paradigms. Regarding drawbacks (and ways around them), instabilities in position calculation are illustrated and it is shown how a first estimate of the measured positions can be used as a starting point for a more robust estimate, taking the continuity of speech movements into account. Diagnostics for assessing the reliability of the final solution are outlined. While work remains to be done to ensure the same accuracy over the whole 5D-measurement space, it is concluded that the system already offers unparalleled scope for large-scale acquisition of flesh-point data.
Code
Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using R, the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.
Article
The scaling between female and male formant frequencies tends to be highly nonuniform across vowel categories, with the result that female vowels exhibit greater between-category dispersion in the F1×F2 plane than male vowels. Vocal tract modeling studies strongly suggest that this greater dispersion of female vowels is partly behavioral, rather than purely anatomical, in origin. The present study tested one explanation for this behavioral difference between females and males, viz., that without the compensatory effect of greater dispersion, the typically higher fundamental frequency (f0) of female talkers would yield reduced identifiability of vowels because of sparser harmonic sampling of spectral envelopes. The specific question addressed was whether, all else being equal, a higher f0 has the assumed deleterious effect on vowel identifiability. In two experiments, the overall effect of increasing f0 beyond 150 Hz was to reduce vowel labeling accuracy. Across individual vowel categories, the effect of raising f0 varied. Auditory modeling suggests that this category variation is partly attributable to the differing degrees to which a high f0 obscured the distinctive auditory properties of each vowel category. Consistent with the spectral undersampling account, the performance decline at high f0s was reduced or eliminated when f0 was time-varying rather than constant.
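The "sparser harmonic sampling" argument can be made concrete with a small calculation. Harmonics lie at integer multiples of f0, so a higher f0 leaves fewer samples of the spectral envelope below a given frequency, and a formant peak can fall up to f0/2 away from the nearest harmonic. The 3000 Hz ceiling and the function names below are assumptions for illustration.

```python
def harmonics_below(f0_hz, ceiling_hz=3000.0):
    """Harmonic frequencies (integer multiples of f0) at or below ceiling_hz."""
    count = int(ceiling_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def nearest_harmonic_error(formant_hz, f0_hz):
    """Distance from a formant peak to the closest harmonic (at most f0/2 above f0)."""
    k = max(1, round(formant_hz / f0_hz))
    return abs(formant_hz - k * f0_hz)


for f0 in (100.0, 250.0):
    print(f0, len(harmonics_below(f0)), nearest_harmonic_error(700.0, f0))
```

With f0 = 100 Hz there are 30 envelope samples below 3000 Hz, but with f0 = 250 Hz only 12, and a 700 Hz formant peak is then 50 Hz from the nearest harmonic instead of coinciding with one. A time-varying f0 moves the harmonics across the envelope, which is one way to read the reported recovery of performance.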
Article
The present paper examines the plausibility of two models of flapping in American English: (1) a traditional model of flapping as a categorical switch from stop to flap production in a specified linguistic environment, and (2) a model of flapping in which flapping arises as a by-product of articulatory changes associated with the general implementation of prosodic structure. These models are tested against a corpus of X-ray microbeam records of English speakers producing utterances with word-final coronal consonants in the appropriate segmental context for flapping, but in varied prosodic locations. Tokens were submitted to perceptual, acoustic, and articulatory analyses. Results show that listeners consistently transcribe the presence of flaps according to acoustic differences in the presence of voicing during closure and a release burst. Transcriptions and lingual measurements, however, suggest that the difference between flaps and [d] is associated with gradient differences in lingual positioning. Some articulatory correlates of perceived flapping correspond to predictions of a model of increased co-production of vowels and consonants yielding lenited stops heard as flaps, but others do not. Problems raised by these results for both traditional and prosodic by-product models are discussed.
Article
Acoustic observations are reported for English front vowels embedded in a /w-l/ frame and carrying constant main stress. The vowels were produced by five speakers in clear and citation-form styles at varying durations but at a constant speaking rate. The acoustic analyses revealed (i) that formant patterns were systematically displaced in the direction of the frequencies of the consonants of the adjacent pseudosymmetrical context; (ii) that those displacements depended in a lawful manner on vowel duration; (iii) that this context and duration dependence was more limited for clear than for citation-form speech, and that the smaller formant shifts of clear speech tended to be achieved by increases in the rate of formant frequency change. The findings are compatible with a revised, and biomechanically motivated, version of the vowel undershoot model [Lindblom, J. Acoust. Soc. Am. 35, 1773-1781 (1963)] that derives formant patterns from numerical information on three variables: the 'locus-target' distance, vowel duration, and rate of formant frequency change. The results further indicate that the 'clear' samples were not merely louder, but involved a systematic, undershoot-compensating reorganization of the acoustic patterns.
Article
Two general principles of sexual differentiation emerge from previous sociolinguistic studies: that men use a higher frequency of nonstandard forms than women in stable situations, and that women are generally the innovators in linguistic change. It is not clear whether these two tendencies can be unified, or how differences between the sexes can account for the observed patterns of linguistic change. The extensive interaction between sex and other social factors raises the question of whether the curvilinear social class pattern associated with linguistic change is the product of a rejection of female-dominated changes by lower-class males. Multivariate analysis of data from the Philadelphia Project on Linguistic Change and Variation indicates that sexual differentiation is independent of social class at the beginning of a change, but that interaction develops gradually as social awareness of the change increases. It is proposed that sexual differentiation of language is generated by two distinct processes: (1) for all social classes, the asymmetric context of language learning leads to an initial acceleration of female-dominated changes and retardation of male-dominated changes; (2) women lead men in the rejection of linguistic changes as they are recognized by the speech community, a differentiation that is maximal for the second-highest status group.
Article
After some clarifications concerning the definition of laryngeal frequency F1 and the instruments developed to measure it, the authors give statistical results for French. To a first approximation, the F1 distribution for a given corpus is Gaussian; the standard deviation, expressed as a percentage of the mean laryngeal frequency F1, is about 16% for both male and female speakers. Ceteris paribus, the variation of F1 for a given speaker from one recording to another is relatively small (about ± ½ tone). The means of F1 for (Gaussian) samples of adults (30 males and 30 females) are 118 and 207 Hz respectively, with standard deviations σ(F1) of 18 and 20 Hz. The ratio τ1 of voiced duration to the total duration of the corpus depends on sex (0.50 for men and 0.63 for women; the difference is significant at the 0.001 level) and allows the speaking rate to be specified. When the speaking rate increases, the length of pauses decreases. These results can be used to estimate the precision of the statistical measurement of F1 and σ(F1) as a function of the number of periods N; a given N corresponds to a total corpus duration determined by τ1. The results of the calculation are compared with the measurements. The F1 specification can be used to define the voice source of a synthesizer, improving the naturalness of synthetic speech, and for the analysis and synthesis of prosodic features, using the results of two previous papers that specified the positions of levels 1, 2 and 4 and showed differences in the perceptual importance of variations introduced at these three levels.
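The link between the number of periods N, measurement precision, and corpus duration can be sketched numerically. This assumes, as a simplification that may differ from the authors' own calculation, that period-wise laryngeal frequency values are independent draws from one Gaussian distribution; the function names are hypothetical, and the example reuses the abstract's means, its roughly 16% within-corpus spread, and its voiced-duration ratios.

```python
import math
from typing import Tuple


def f0_precision(sd_hz: float, n_periods: int) -> Tuple[float, float]:
    """Standard errors of the mean laryngeal frequency and of its SD from N periods.

    Assumes independent Gaussian period-wise values (a simplification, since
    successive periods are in practice correlated).
    """
    se_mean = sd_hz / math.sqrt(n_periods)
    se_sd = sd_hz / math.sqrt(2 * (n_periods - 1))  # large-sample approximation
    return se_mean, se_sd


def corpus_duration_s(mean_hz: float, n_periods: int, voiced_ratio: float) -> float:
    """Approximate total corpus duration needed to collect N periods.

    Voiced time is roughly N / mean frequency; dividing by the voiced-to-total
    duration ratio adds the unvoiced stretches and pauses.
    """
    return (n_periods / mean_hz) / voiced_ratio


if __name__ == "__main__":
    for label, mean_hz, ratio in [("men", 118.0, 0.50), ("women", 207.0, 0.63)]:
        sd_hz = 0.16 * mean_hz  # within-corpus SD of about 16% of the mean
        n = 1000
        se_mean, se_sd = f0_precision(sd_hz, n)
        print(f"{label}: SE(mean) {se_mean:.2f} Hz, SE(sd) {se_sd:.2f} Hz, "
              f"corpus {corpus_duration_s(mean_hz, n, ratio):.1f} s")
```

Because women have both a higher mean frequency and a larger voiced ratio, the same N is reached in a shorter corpus for female speakers under these assumptions.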
Article
The perception of speaker sex depends on the listener's integration of a complex range of factors. These may relate, for example, to the style of delivery, the use of particular language, pronunciation (Trudgill, 1983; Smith, 1979), the use of particular intonation patterns (McConnell-Ginet, 1983), and the perceived pitch of the speaker (Aronovitch, 1976; Elyan, 1978; Lass et al., 1976). Some acoustic-phonetic investigations have used instrumental analysis to explore how speaker sex differences are perceived. These have shown that acoustic-phonetic differences exist between the read speech of men and women. It has been demonstrated that men have, on average, lower fundamental frequencies than women (Aronovitch, 1976; Coleman, 1973a). This can be explained in part by their larger larynges. However, it is also acknowledged that a low overall average fundamental frequency is not the only factor contributing to the perception of an adult male voice. Some evidence shows, for example, that the use of a wider pitch range contributes to the perception of femininity even where the overall pitch is low (Terango, 1966). In addition, women have been found to have higher formant frequencies on average (Coleman, 1976; Henton, 1986; Peterson & Barney, 1952; Childers & Wu, 1991; Wu & Childers, 1991) as a result of their smaller vocal tracts. Women also have different glottal source characteristics (Karlsson, 1989), which are reflected in the filter characteristics of the speech signal (Klatt & Klatt, 1990). There is also some evidence that speaker sex differences exist in the temporal domain. Byrd (1992) found differences in speaking rate between men and women in read American English speech from the TIMIT database.
Byrd states that under the recording conditions used for the TIMIT database, women spoke appreciably more slowly than the men and that men tended to reduce vowels to schwa ([ə]) more often than the women. Byrd also found that female speakers in the TIMIT database released stops in sentence-final position more frequently and produced more glottal stops than male speakers. All these findings were statistically significant.
Article
A set of phonetic studies based on analysis of the TIMIT speech database is presented, addressing topics relevant to the linguistic and speech recognition communities. First, the advantages and shortcomings of using TIMIT for linguistic research are considered, and a methodological approach to database research is outlined. Next, several small studies are presented which detail new results on the effects of speakers' sex and dialect region on pronunciation. The goal of this paper is to use the database to explore sex- and dialect-related variation, thereby identifying differences that may merit further experimental study. This report concerns speaker-dependent effects on certain phonetic characteristics often involved in reduction, such as speech rate, stop releases, flapping, central vowels, laryngeal state, syllabic consonants, and palatalization processes. Specifically, it is suggested that the phonetic characteristics found more commonly with male speakers are also those typical of reduction in speech.
Article
Differences in male and female vocal tract dimensions are hypothesized to have a number of dynamic consequences—differences in target attainment, articulatory speed, and acoustic vowel space dimension. Evidence for some of these predictions is sought by investigating articulatory and acoustic patterns in interword vowel sequences in the University of Wisconsin X-ray Microbeam Speech Production Database (UW-XRMBDB). Means of formant and lingual pellet tracks throughout such vocalic stretches exhibit similarities in acoustic and articulatory form for male and female groups, but show significant gender-specific differences in both articulatory and acoustic space traversed, with females making greater acoustic excursions for shorter articulatory distances.
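"Space traversed" can be quantified as the path length of a sampled trajectory, summing the Euclidean steps between successive samples. The sketch below is an assumption about how such a comparison might be computed, not the study's documented procedure: the sample values are hypothetical, and treating F1 and F2 on a common Hz scale is itself a simplification (an auditory scale could be used instead).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_length(track: Sequence[Point]) -> float:
    """Total distance traversed: sum of Euclidean steps between successive samples."""
    return float(sum(math.dist(p, q) for p, q in zip(track, track[1:])))


def acoustic_per_articulatory(formant_track: Sequence[Point],
                              pellet_track: Sequence[Point]) -> float:
    """Acoustic excursion (Hz) per unit of articulatory excursion (mm)."""
    return path_length(formant_track) / path_length(pellet_track)


if __name__ == "__main__":
    # Hypothetical (F1, F2) samples in Hz and (x, y) tongue-pellet samples in mm
    formants = [(750.0, 1300.0), (600.0, 1600.0), (400.0, 2000.0)]
    pellet = [(0.0, 0.0), (2.0, 3.0), (4.0, 7.0)]
    print(path_length(formants), path_length(pellet))
    print(acoustic_per_articulatory(formants, pellet))
```

Under this measure, the reported female pattern of "greater acoustic excursions for shorter articulatory distances" would appear as a larger Hz-per-mm ratio.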
Article
Two sets of data are discussed in terms of an exemplar-resonance model of the lexicon. First, a cross-linguistic review of vowel formant measurements indicates that phonetic differences between male and female talkers are a function of language, dissociated to a certain extent from vocal tract length. Second, an auditory word recognition study [Strand (2000). Gender Stereotype Effects in Speech Processing. Ph.D. Dissertation, Ohio State University] indicates that listeners can process words faster when the talker has a stereotypical sounding voice. An exemplar-resonance model of perception derives these effects, suggesting that reentrant pathways [Edelman (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books] between cognitive categories and detailed exemplars of them lead to the emergence of social and linguistic entities.
Article
The goal of this study is to examine how the degree of vowel-to-vowel coarticulation varies as a function of prosodic factors such as nuclear-pitch accent (accented vs. unaccented), level of prosodic boundary (Prosodic Word vs. Intermediate Phrase vs. Intonational Phrase), and position-in-prosodic-domain (initial vs. final). It is hypothesized that vowels in prosodically stronger locations (e.g., in accented syllables and at a higher prosodic boundary) are not only coarticulated less with their neighboring vowels, but also exert a stronger influence on their neighbors. Measurements of tongue position for English /a i/ over time were obtained with a Carstens electromagnetic articulograph. Results showed that vowels in prosodically stronger locations are coarticulated less with neighboring vowels, but do not exert a stronger influence on the articulation of neighboring vowels. An examination of the relationship between coarticulation and duration revealed that (a) accent-induced coarticulatory variation cannot be attributed to a duration factor and (b) some of the data with respect to boundary effects may be accounted for by the duration factor. This suggests that, to the extent that prosodically conditioned coarticulatory variation is duration-independent, there is no absolute causal relationship from duration to coarticulation. It is proposed that prosodically conditioned V-to-V coarticulatory reduction is another type of strengthening that occurs in prosodically strong locations. The prosodically driven coarticulatory patterning is taken to be part of the phonetic signatures of the hierarchically nested structure of prosody.
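One simple way to express "coarticulated less" is the shift in a vowel's mean tongue position when its neighbor changes from /a/ to /i/. The index below is a hypothetical illustration, not necessarily the measure used in the study, and the tongue-dorsum heights are invented.

```python
from statistics import mean
from typing import Sequence


def coarticulation_index(target_next_to_a: Sequence[float],
                         target_next_to_i: Sequence[float]) -> float:
    """Context-induced shift in a vowel's mean tongue position (same units as input).

    A larger value means the vowel is pulled further by its neighbour, i.e.
    it is more coarticulated; a smaller value means greater resistance.
    """
    return abs(mean(target_next_to_a) - mean(target_next_to_i))


if __name__ == "__main__":
    # Hypothetical tongue-dorsum heights (mm) for /a/ followed by /a/ or by /i/
    accented = coarticulation_index([10.1, 10.3, 9.9], [10.8, 11.0, 10.9])
    unaccented = coarticulation_index([10.4, 10.2, 10.5], [12.0, 12.3, 11.9])
    print(f"accented: {accented:.2f} mm, unaccented: {unaccented:.2f} mm")
```

With these invented values, the accented vowel shows the smaller shift, which is the direction of the reported accent effect.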
Article
Relationships between a listener's identification of a spoken vowel and its properties as revealed from acoustic measurement of its sound wave have been a subject of study by many investigators. Both the utterance and the identification of a vowel involve subjective responses and are affected by the language and dialectal backgrounds and the vocal and auditory characteristics of the individuals concerned. The purpose of this paper is to discuss some of the control methods that are being used in the evaluation of these effects in a vowel study program in progress at Bell Telephone Laboratories. The plan of the study, calibration of recording and measuring equipment, and methods for checking the performance of both speakers and listeners are described. The methods are illustrated from results of tests involving some 76 speakers and 70 listeners.
Article
Glottal closure and perceived breathiness were evaluated in 9 female and 9 male normally speaking subjects ranging in age from 20 to 35 years. Phonations of the vowel /i/ at three loudness and pitch levels were performed. The degree of glottal closure was judged by speech clinicians from video-fiberstroboscopic recordings. Later they rated the degree of perceived breathiness both in the vowels recorded during the fiberscopy and in separately tape-recorded vowels. Intra- and interjudge reliabilities were satisfactory. The degree of incomplete glottal closure and the degree of perceived breathiness increased significantly as an effect of decreased loudness. Neither the degree of closure nor the perceived breathiness was significantly affected by changes in pitch or by interaction effects. It was concluded that incomplete closure of the posterior parts of the glottis should be regarded as normal, primarily in women, and that loudness should be taken into consideration when studying glottal closure and breathiness.
Article
Comparison is drawn between male and female larynges on the basis of overall size, vocal fold membranous length, elastic properties of tissue, and prephonatory glottal shape. Two scale factors are proposed that are useful for explaining differences in fundamental frequency, sound power, mean airflow, and glottal efficiency. Fundamental frequency is scaled primarily according to the membranous length of the vocal folds (scale factor of 1.6), whereas mean airflow, sound power, glottal efficiency, and amplitude of vibration include another scale factor (1.2) that relates to overall larynx size. Some explanations are given for observed sex differences in glottographic waveforms. In particular, the simulated (computer-modeled) vocal fold contact area is used to infer male-female differences in the shape of the glottis. The female glottis appears to converge more linearly (from bottom to top) than the male glottis, primarily because of medial surface bulging of the male vocal folds.
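Why a length scale factor of 1.6 translates into roughly the same factor in fundamental frequency can be seen from the ideal vibrating string, a common textbook simplification of vocal fold vibration. The sketch assumes equal tissue stress and density in both sexes; the stress, density, and length values are hypothetical, chosen only to land in a plausible speaking range.

```python
import math


def string_f0(length_m: float, stress_pa: float, density_kg_m3: float) -> float:
    """Ideal-string fundamental frequency: F0 = sqrt(stress / density) / (2 * L)."""
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


if __name__ == "__main__":
    stress = 2.0e4      # hypothetical longitudinal tissue stress, Pa
    density = 1040.0    # approximate soft-tissue density, kg/m^3 (assumed)
    female_len = 0.010  # hypothetical membranous length, m
    male_len = 1.6 * female_len  # male length scaled by the 1.6 factor

    f_female = string_f0(female_len, stress, density)
    f_male = string_f0(male_len, stress, density)
    print(f"female {f_female:.0f} Hz, male {f_male:.0f} Hz, ratio {f_female / f_male:.2f}")
```

Since F0 is inversely proportional to L when stress and density are held constant, the frequency ratio equals the length ratio; real larynges also differ in stress, which this sketch ignores.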
Article
It is shown that within-speaker variations in vocal effort and phonation affect fundamental frequency (F0) and the formant frequencies of vowels in the sense of a linear compression/expansion of the spectral separations between them, given an adequate scaling of pitch. Between-speaker variations in size correspond to a translation of the spectral peaks shaped by F0 and the formants if pitch is scaled tonotopically (in Bark). On the basis of these observations, invariant cues to vowel quality are suggested. It is further shown that vowels produced by adult women tend to be phonetically more explicit and, hence, more peripheral in 'vowel space' than those of men and children. It is also shown that the formant frequencies of vowels subjected to paralinguistic variation are related by power functions of frequency.
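Scaling F0 and the formants tonotopically makes size differences look like a shift of all peaks along the Bark axis, so the separations between peaks are the quantities of interest. The sketch uses the analytical Hz-to-Bark approximation later published by Traunmüller (1990), z = 26.81 f / (1960 + f) - 0.53, without the corrections that formula applies at the low and high ends of the scale; the peak frequencies in the example are hypothetical.

```python
from typing import List, Sequence


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark with the approximation z = 26.81 f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Inverse of hz_to_bark (valid for z < 26.28)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_separations(f0_hz: float, formants_hz: Sequence[float]) -> List[float]:
    """Distances in Bark between successive spectral peaks: F1-F0, F2-F1, ..."""
    peaks = [hz_to_bark(f) for f in [f0_hz, *formants_hz]]
    return [upper - lower for lower, upper in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    # Hypothetical /a/-like peaks for two speakers of different size
    print(bark_separations(120.0, [700.0, 1200.0, 2500.0]))
    print(bark_separations(210.0, [850.0, 1450.0, 2900.0]))
```

If the separations of two speakers' vowels come out similar in Bark while their Hz values differ, that is the kind of size-invariant cue the abstract proposes.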
Article
A computerized pulsed-ultrasound system was used to monitor tongue dorsum movements during the production of consonant-vowel sequences in which speech rate, vowel, and consonant were varied. The kinematics of tongue movement were analyzed by measuring the lowering gesture of the tongue to give estimates of movement amplitude, duration, and maximum velocity. All three subjects in the study showed reliable correlations between the amplitude of the tongue dorsum movement and its maximum velocity. Further, the ratio of the maximum velocity to the extent of the gesture, a kinematic indicator of articulator stiffness, was found to vary inversely with the duration of the movement. This relationship held both within individual conditions and across all conditions in the study such that a single function was able to accommodate a large proportion of the variance due to changes in movement duration. As similar findings have been obtained both for abduction and adduction gestures of the vocal folds and for rapid voluntary limb movements, the data suggest that a wide range of changes in the duration of individual movements might all have a similar origin. The control of movement rate and duration through the specification of biomechanical characteristics of speech articulators is discussed.
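The kinematic stiffness index described above, maximum velocity divided by movement amplitude, can be computed from any sampled displacement trace. As a hypothetical test signal, the sketch uses a smooth cosine-shaped lowering gesture, for which the index equals π/(2T) and is therefore inversely related to duration T; this illustrates the reported relationship but does not assume the measured tongue movements had exactly this shape.

```python
import math
from typing import List, Tuple


def cosine_movement(amplitude: float, duration: float,
                    n_samples: int = 1001) -> Tuple[List[float], List[float]]:
    """Hypothetical gesture: smooth cosine ramp from 0 to amplitude over duration."""
    dt = duration / (n_samples - 1)
    times = [i * dt for i in range(n_samples)]
    positions = [0.5 * amplitude * (1 - math.cos(math.pi * t / duration)) for t in times]
    return times, positions


def kinematics(times: List[float], positions: List[float]) -> Tuple[float, float, float, float]:
    """Return amplitude, duration, peak velocity, and stiffness (peak velocity / amplitude)."""
    amplitude = max(positions) - min(positions)
    velocities = [
        (positions[i + 1] - positions[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]  # finite-difference velocity between successive samples
    peak_velocity = max(abs(v) for v in velocities)
    duration = times[-1] - times[0]
    return amplitude, duration, peak_velocity, peak_velocity / amplitude


if __name__ == "__main__":
    for dur in (0.1, 0.2, 0.3):  # hypothetical movement durations in seconds
        amp, d, vmax, stiffness = kinematics(*cosine_movement(10.0, dur))
        print(f"T={d:.1f} s  vmax={vmax:.1f} mm/s  stiffness={stiffness:.2f} 1/s")
```

Note that amplitude cancels out of the index for this shape, consistent with the abstract's finding that a single function of duration captures much of the variance.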
Article
The essential features of the coarticulation properties of Swedish dental stops in vowel-consonant-vowel contexts can be described by the formula $s(x;t) = v(x;t) + k(t)\,[c(x) - v(x;t)]\,w_c(x)$, where $x$ represents the longitudinal distance between lips and glottis and $s(x;t)$ denotes the shape of the vocal tract at some instant of time, $t$, during the vowel-consonant-vowel utterance. The vowel component, $v(x;t)$, is a linear combination of the three "extreme" shapes of the vowels /i/, /a/, and /u/ with weights that vary as functions of time. The consonant is represented by $c(x)$, an ideal target shape, and $w_c(x)$, a so-called coarticulation function. A time-varying factor $k(t)$ represents the degree of excursion of the consonantal gesture. Vocal tract shapes measured from X-ray motion pictures of a set of Swedish vowel-consonant-vowel utterances compare well with shapes generated by the formula. This result is consistent with our earlier conclusions about coarticulation, viz., that the vowel and consonant gestures are largely independent at the level of neural instructions.
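The formula can be evaluated directly at one instant t once the vocal tract is sampled at a few points along the lips-to-glottis axis. In this minimal sketch the shapes are plain lists of hypothetical values (for example cross-sectional areas), and the helper names are illustrative only: the vowel component is a weighted sum of /i/, /a/, /u/ shapes, and the consonant pulls the tract toward its target in proportion to k(t) and to the coarticulation function at each point.

```python
from typing import List, Sequence, Tuple


def vowel_shape(weights: Tuple[float, float, float], shape_i: Sequence[float],
                shape_a: Sequence[float], shape_u: Sequence[float]) -> List[float]:
    """Vowel component v(x; t) at one instant: weighted sum of /i/, /a/, /u/ shapes."""
    wi, wa, wu = weights
    return [wi * i + wa * a + wu * u for i, a, u in zip(shape_i, shape_a, shape_u)]


def ohman_shape(v: Sequence[float], c: Sequence[float],
                w_c: Sequence[float], k: float) -> List[float]:
    """s(x; t) = v(x; t) + k(t) [c(x) - v(x; t)] w_c(x), evaluated at one instant t."""
    return [vx + k * (cx - vx) * wx for vx, cx, wx in zip(v, c, w_c)]


if __name__ == "__main__":
    # Hypothetical areas (cm^2) at five points from lips to glottis
    v = vowel_shape((0.6, 0.4, 0.0),
                    shape_i=[3.0, 1.0, 0.5, 4.0, 6.0],
                    shape_a=[4.0, 5.0, 3.0, 1.0, 1.5],
                    shape_u=[0.5, 3.0, 4.0, 3.0, 2.0])
    c = [2.0, 0.0, 2.0, 3.0, 3.0]     # dental stop target: closure at point 2
    w_c = [0.2, 1.0, 0.3, 0.0, 0.0]   # consonant mainly affects the front of the tract
    for k in (0.0, 0.5, 1.0):         # degree of consonantal excursion k(t)
        print(k, ohman_shape(v, c, w_c, k))
```

Where $w_c(x) = 0$ the tract keeps the vowel shape regardless of k, which is how the formula keeps vowel and consonant gestures independent.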