Journal of Phonetics

Published by Elsevier
Online ISSN: 1095-8576
Article
Empirical gaps remain regarding infant mandibular kinematics during naturally occurring episodes of chewing and pre-linguistic vocalization in the first two years of life. Vertical jaw displacement was measured in a typically developing infant from 8 to 22 months of age. Jaw kinematics were measured for vowel babble, non-variegated and variegated babble, and chewing. Results indicated that measures of kinematic variability were significantly lower for chewing than for all babble categories. These measures changed across age for chewing: (a) peak vertical jaw elevation decreased in variability, while (b) jaw displacement and (c) speed of movement increased in variability. Kinematics for vowel babble exhibited less jaw displacement and a higher average vertical jaw position than the other babble types and chewing. Developmentally, jaw kinematics for babble changed in jaw displacement and average vertical jaw position. These changes coincided with fewer episodes of vowel babble and more episodes of variegated babble and reduplicated syllables. These results suggest that developmental processes such as non-overlapping task demands likely differentiate the trajectories of jaw movement for infant chewing and babble. Infant jaw kinematics for babble cannot be predicted from observations of adult speakers or from non-speech behaviors observed in infants or adults.
 
Article
This study examined the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. In the latter group, it also examined the role of the language environment and English proficiency. Three groups of listeners were tested: native English listeners (NE), Mandarin-speaking Chinese listeners in the US (M-US) and Mandarin listeners in Beijing, China (M-BJ). As a group, M-US and M-BJ listeners were matched on English proficiency and age of acquisition. A nonword transcription task was used. Identification accuracy for word-final stops in the nonwords established two independent interlanguage intelligibility effects. An interlanguage speech intelligibility benefit for listeners (ISIB-L) was manifested by both groups of Mandarin listeners outperforming native English listeners in the identification of Mandarin-accented speech. For the benefit for talkers (ISIB-T), only M-BJ listeners were more accurate in identifying Mandarin-accented speech than native English speech. Thus, both Mandarin groups demonstrated an ISIB-L, while only the M-BJ group demonstrated an ISIB-T overall. The English proficiency of listeners was found to modulate the magnitude of the ISIB-T in both groups. Regression analyses also suggested that the listener groups differed in their use of acoustic information to identify voicing in stop consonants.
 
Article
The perception of phonological differences between regional dialects of American English by naïve listeners has received little attention in the speech perception literature and is still a poorly understood problem. Two experiments were carried out using the TIMIT corpus of spoken sentences produced by talkers from a number of distinct dialect regions in the United States. In Experiment 1, acoustic analysis techniques identified several phonetic features that can be used to distinguish different dialects. In Experiment 2, recordings of the sentences were played back to naïve listeners who were asked to categorize talkers into one of six geographical dialect regions. Results showed that listeners are able to reliably categorize talkers using three broad dialect clusters (New England, South, North/West), but that they have more difficulty categorizing talkers into six smaller regions. Multiple regression analyses on the acoustic measures, the actual dialect affiliation of the talkers, and the categorization responses revealed that the listeners in this study made use of several reliable acoustic-phonetic properties of the dialects in categorizing the talkers. Taken together, the results of these two experiments confirm that naïve listeners have knowledge of phonological differences between dialects and can use this knowledge to categorize talkers by dialect.
 
Article
A current theoretical view proposes that infants converge on the speech categories of their native language by attending to frequency distributions that occur in the acoustic input. To date, the only empirical support for this statistical learning hypothesis comes from studies where a single, salient dimension was manipulated. Additional evidence is sought here, by introducing a less salient pair of categories supported by multiple cues. We exposed English-learning infants to a multi-cue bidimensional grid between retroflex and alveolopalatal sibilants in prevocalic position. This contrast is substantially more difficult according to previous cross-linguistic and perceptual research, and its perception is driven by cues in both the consonantal and the following vowel portions. Infants heard one of two distributions (flat, or with two peaks), and were tested with sounds varying along only one dimension. Infants' responses differed depending on the familiarization distribution, and their performance was equally good for the vocalic and the frication dimension, lending some support to the statistical hypothesis even in this harder learning situation. However, learning was restricted to the retroflex category, and a control experiment showed that lack of learning for the alveolopalatal category was not due to the presence of a competing category. Thus, these results contribute fundamental evidence on the extent and limitations of the statistical hypothesis as an explanation for infants' perceptual tuning.
 
Histograms of VOT, f0 and H1–H2 for lax, tense and aspirated stops produced by Korean adult male and female speakers. Vertical lines indicate median values. Note that there is a difference in scale of the x-axis for the f0 figures.  
Histograms of VOT, f0 and H1–H2 in stops produced by Korean-speaking children by age group. Vertical lines indicate median values.  
Histograms of VOT, f0 and H1–H2 for the 400 stimuli used in the perception experiment. Of these, 50 were taken from the adult speakers' word productions (top panels) and 350 from the child speakers' word productions (middle panels). The bottom panels show the quantiles of the stimulus distributions for the subset of tokens used in the perception experiment plotted against all of the children's production data included in Experiment I.  
Article
Transcription-based studies have shown that tense stops emerge before aspirated or lax stops in the speech of most Korean-acquiring children. This order of mastery is predicted by the short-lag Voice Onset Time (VOT) values of Korean tense stops, since short lag is the earliest-acquired phonation type across languages. However, the tense stop also makes greater motor demands than the other two phonation types, given its pressed voice quality (negative H1-H2) and its relatively high f0 value at vowel onset, word-initially. To explain the observed order of mastery of Korean stops, we need a more sensitive quantitative model of the role of multiple acoustic parameters in production and perception. This study explores the relationship between native speakers' transcriptions/categorizations of children's stop productions and three acoustic characteristics (VOT, H1-H2 and f0). The results showed that the primary acoustic parameter that adult listeners used to differentiate tense vs. non-tense stops was VOT. Listeners used VOT and the additional acoustic parameter of f0 to differentiate lax vs. aspirated stops. Thus, the early acquisition of tense stops is explained both by their short-lag VOT values and by the fact that children need to learn to control only a single acoustic parameter to produce them.
 
Article
Coarticulation is a source of acoustic variability for vowels, but how large is this effect relative to other sources of variance? We investigate acoustic effects of anticipatory V-to-V coarticulation relative to variation due to the following C and individual speaker. We examine F1 and F2 from V1 in 48 V1-C#V2 contexts produced by 10 speakers of American English. ANOVA reveals significant effects of both V2 and C on F1 and F2 measures of V1. The influence of V2 and C on acoustic variability relative to that of speaker and target vowel identity is evaluated using hierarchical linear regression. Speaker and target vowel account for roughly 80% of the total variance in F1 and F2, but when this variance is partialed out C and V2 account for another 18% (F1) and 63% (F2) of the remaining target vowel variability. Multinomial logistic regression (MLR) models are constructed to test the power of target vowel F1 and F2 for predicting C and V2 of the upcoming context. Prediction accuracy is 58% for C-Place, 76% for C-Voicing and 54% for V2, but only when variance due to other sources is factored out. MLR is discussed as a model of the parsing mechanism in speech perception.
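The abstract describes a two-step analysis: first partial out speaker and target-vowel variance with a hierarchical regression, then predict the upcoming context from the residual formant values with multinomial logistic regression. A minimal sketch of that logic on synthetic data follows; the column names, effect sizes, and use of scikit-learn are illustrative assumptions, not the study's materials or code.

```python
# Sketch of the two-step analysis described above, on synthetic data.
# All column names and effect sizes are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 480
df = pd.DataFrame({
    "speaker": rng.integers(0, 10, n),   # 10 speakers
    "v1": rng.integers(0, 4, n),         # target vowel identity
    "c": rng.integers(0, 3, n),          # following consonant
    "v2": rng.integers(0, 4, n),         # upcoming vowel
})
# Fake F2 (Hz): mostly speaker and target vowel, plus small context effects
df["f2"] = (1500 + 120 * df["v1"] + 40 * df["speaker"]
            + 25 * df["c"] + 30 * df["v2"] + rng.normal(0, 20, n))

def dummies(cols):
    # indicator coding for categorical predictors
    return pd.get_dummies(df[cols].astype("category"), columns=cols)

# Step 1: hierarchical regression -- partial out speaker and target vowel,
# then ask how much of the residual variance context accounts for
X_base = dummies(["speaker", "v1"])
base = LinearRegression().fit(X_base, df["f2"])
resid = df["f2"] - base.predict(X_base)
X_ctx = dummies(["c", "v2"])
ctx = LinearRegression().fit(X_ctx, resid)
print("R^2, speaker + target vowel:", round(base.score(X_base, df["f2"]), 2))
print("R^2 of context on residual F2:", round(ctx.score(X_ctx, resid), 2))

# Step 2: multinomial logistic regression -- predict upcoming V2 from the
# residualized formant value (context is recoverable only after step 1)
mlr = LogisticRegression(max_iter=1000).fit(resid.to_frame(), df["v2"])
print("V2 prediction accuracy:", round(mlr.score(resid.to_frame(), df["v2"]), 2))
```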
 
Article
The age at which children master adult-like voiced stops can generally be predicted by voice onset time (VOT): stops with optional short lag are early, those with obligatory lead are late. However, Japanese voiced stops are late despite having a short lag variant, whereas Greek voiced stops are early despite having consistent voicing lead. This cross-sectional study examines the acoustics of word-initial stops produced by English-, Japanese-, and Greek-speaking children aged 2 to 5, to investigate how these seemingly exceptional mastery patterns relate to use of other phonetic correlates. Productions were analyzed for VOT, f0 and spectral tilt (H1-H2) in Japanese and English, and for amplitude trajectory in Greek and Japanese. Japanese voiceless stops have intermediate lag VOT values, so other "secondary" cues are needed to differentiate them from the voiced short lag VOT variant. Greek voiced stops are optionally prenasalized, and the amplitude trajectory for the voice bar during closure suggests that younger children use a greater degree of nasal venting to create the aerodynamic conditions necessary for voicing lead. Taken together, the findings suggest that VOT must be supplemented by measurements of other language-specific acoustic properties to explain the mastery pattern of voiced stops in some languages.
 
Article
This study investigates the role of two processes, cue enhancement (learning to attend to acoustic cues which characterize a speech contrast for native listeners) and cue inhibition (learning to ignore cues that do not), in the acquisition of the American English tense and lax ([i] vs. [ɪ]) vowels by native Spanish listeners. This contrast is acoustically distinguished by both vowel spectrum and duration. However, while native English listeners rely primarily on spectrum, inexperienced Spanish listeners tend to rely exclusively on duration. Twenty-nine native Spanish listeners, initially reliant on vowel duration, received either enhancement training, inhibition training, or training with a natural cue distribution. Results demonstrated that reliance on spectral properties increased over baseline for all three groups. However, inhibitory training was more effective relative to enhancement training, and both inhibitory and enhancement training were more effective relative to natural distribution training, in decreasing listeners' attention to duration. These results suggest that phonetic learning may involve two distinct cognitive processes, cue enhancement and cue inhibition, that function to shift selective attention between separable acoustic dimensions. Moreover, cue-specific training (whether enhancing or inhibitory) appears to be more effective for the acquisition of second language speech contrasts.
 
Article
In this paper we investigate the effect of clear speech, a distinct, listener-oriented, intelligibility-enhancing mode of speech production, on vowel and stop consonant contrasts along the temporal dimension in English and Croatian. Our previous work has shown that, in addition to enhancing the overall acoustic salience of the speech signal through a decrease in speaking rate and expansion of pitch range, clear speech modifications increased the spectral distances between vowel categories in both languages despite the different sizes of their vowel inventories (more than 10 in English vs. 5 in Croatian). Here, we examine how clear speech affects the duration of English tense ('long') vs. lax ('short') vowels, English vowels preceding voiced ('long') vs. voiceless ('short') coda stops, Croatian long vs. short vowels, and Croatian and English VOT duration for voiced and voiceless stops. Overall, the results showed that the proportional distance between the 'short' and 'long' vowel categories and between the voiced and voiceless stop categories was remarkably stable across the two speaking styles in both languages. These results suggest that, in combination with the spectral enhancement of vowel contrasts, language-specific pronunciation norms along the temporal dimension are maintained in clear and conversational speech.
 
Article
This acoustic study examines sound (vowel) change in apparent time across three successive generations of 123 adult female speakers ranging in age from 20 to 65 years old, representing three regional varieties of American English, typical of western North Carolina, central Ohio and southeastern Wisconsin. A set of acoustic measures characterized the dynamic nature of formant trajectories, the amount of spectral change over the course of vowel duration and the position of the spectral centroid. The study found a set of systematic changes to /ɪ, ɛ, æ/ including positional changes in the acoustic space (mostly lowering of the vowels) and significant variation in formant dynamics (increased monophthongization). This common sound change is evident in both emphatic (articulated clearly) and nonemphatic (casual) productions and occurs regardless of dialect-specific vowel dispersions in the vowel space. The cross-generational and cross-dialectal patterns of variation found here support an earlier report by Jacewicz, Fox, and Salmons (2011) which found this recent development in these three dialect regions in isolated citation-form words. While confirming the new North American Shift in different styles of production, the study underscores the importance of addressing the stress-related variation in vowel production in a careful and valid assessment of sound change.
 
Article
Much evidence has been found for pervasive links between the manual and speech motor systems, including evidence from infant development, deictic pointing, and repetitive tapping and speaking tasks. We expand on the last of these paradigms to look at intra- and cross-modal effects of emphatic stress, as well as the effects of coordination in the absence of explicit rhythm. In this study, subjects repeatedly tapped their finger and synchronously repeated a single spoken syllable. On each trial, subjects placed an emphatic stress on one finger tap or one spoken syllable. Results show that both movement duration and magnitude are affected by emphatic stress regardless of whether that stress is in the same domain (e.g., effects on the oral articulators when a spoken repetition is stressed) or across domains (e.g., effects on the oral articulators when a tap is stressed). Though the size of the effects differs between intra- and cross-domain emphases, the implementation of stress affects both motor domains, indicating a tight connection. This close coupling is seen even in the absence of stress, though it is highlighted under stress. The results of this study support the idea that implementation of prosody is not domain-specific but relies on general aspects of the motor system.
 
Article
The effect of age of acquisition on first- and second-language vowel production was investigated. Eight English vowels were produced by native Japanese (NJ) adults and children as well as by age-matched native English (NE) adults and children. Productions were recorded shortly after the NJ participants' arrival in the USA and then one year later. In agreement with previous investigations [Aoyama et al., J. Phon. 32, 233-250 (2004)], children were able to learn more in a year's time, leading to higher accuracy than adults. Based on the spectral quality and duration comparisons, NJ adults had more accurate production at Time 1, but showed no improvement over time. The NJ children's productions showed significant differences from the NE children's for the English "new" vowels /ɪ/, /ε/, /ɑ/, /ʌ/ and /ʊ/ at Time 1, but all eight vowels were produced in a native-like manner at Time 2. An examination of NJ speakers' productions of Japanese /i/, /a/, /u/ over time revealed significant changes for the NJ child group only. Japanese /i/ and /a/ showed changes in production that can be related to second language (L2) learning. The results suggest that L2 vowel production is strongly affected by age of acquisition and that there is a dynamic interaction whereby the first- and second-language vowels affect each other.
 
Article
This study examines sign lowering as a form of phonetic reduction in American Sign Language. Phonetic reduction occurs in the course of normal language production, when instead of producing a carefully articulated form of a word, the language user produces a less clearly articulated form. When signs are produced in context by native signers, they often differ from the citation forms of signs. In some cases, phonetic reduction is manifested as a sign being produced at a lower location than in the citation form. Sign lowering has been documented previously, but this is the first study to examine it in phonetic detail. The data presented here are tokens of the sign WONDER, as produced by six native signers, in two phonetic contexts and at three signing rates, which were captured by optoelectronic motion capture. The results indicate that sign lowering occurred for all signers, according to the factors we manipulated. Sign production was affected by several phonetic factors that also influence speech production, namely, production rate, phonetic context, and position within an utterance. In addition, we have discovered interesting variations in sign production, which could underlie distinctions in signing style, analogous to accent or voice quality in speech.
 
Article
Most second language acquisition research focuses on linguistic structures, and less research has examined the acquisition of sociolinguistic patterns. The current study explored the perceptual classification of regional dialects of American English by native and non-native listeners using a free classification task. Results revealed similar classification strategies for the native and non-native listeners. However, the native listeners were more accurate overall than the non-native listeners. In addition, the non-native listeners were less able to make use of constellations of cues to accurately classify the talkers by dialect. However, the non-native listeners were able to attend to cues that were either phonologically or sociolinguistically relevant in their native language. These results suggest that non-native listeners can use information in the speech signal to classify talkers by regional dialect, but that their lack of signal-independent cultural knowledge about variation in the second language leads to less accurate classification performance.
 
Article
While cross-dialect prosodic variation has been well established for many languages, most variationist research on regional dialects of American English has focused on the vowel system. The current study was designed to explore prosodic variation in read speech in two regional varieties of American English: Southern and Midland. Prosodic dialect variation was analyzed in two domains: speaking rate and the phonetic expression of pitch movements associated with accented and phrase-final syllables. The results revealed significant effects of regional dialect on the distributions of pauses, pitch accents, and phrasal-boundary tone combinations. Significant effects of talker gender were also observed on the distributions of pitch accents and phrasal-boundary tone combinations. The findings from this study demonstrate that regional and gender identity features are encoded in part through prosody, and provide further motivation for the close examination of prosodic patterns across regional and social varieties of American English.
 
Article
Recent studies have found that naïve listeners perform poorly in forced-choice dialect categorization tasks. However, the listeners' error patterns in these tasks reveal systematic confusions between phonologically similar dialects. In the present study, a free classification procedure was used to measure the perceptual similarity structure of regional dialect variation in the United States. In two experiments, participants listened to a set of short English sentences produced by male talkers only (Experiment 1) and by male and female talkers (Experiment 2). The listeners were instructed to group the talkers by regional dialect into as many groups as they wanted with as many talkers in each group as they wished. Multidimensional scaling analyses of the data revealed three primary dimensions of perceptual similarity (linguistic markedness, geography, and gender). In addition, a comparison of the results obtained from the free classification task to previous results using the same stimulus materials in six-alternative forced-choice categorization tasks revealed that response biases in the six-alternative task were reduced or eliminated in the free classification task. Thus, the results obtained with the free classification task in the current study provided further evidence that the underlying structure of perceptual dialect category representations reflects important linguistic and sociolinguistic factors.
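A free classification analysis of this kind typically turns listeners' groupings into a talker-by-talker similarity matrix (how often two talkers were placed in the same group) and submits its complement to multidimensional scaling. The sketch below illustrates that pipeline under those assumptions; the toy groupings, counts, and scikit-learn MDS settings are invented for illustration and are not the study's data or code.

```python
# Sketch: perceptual similarity space from free classification data.
# The groupings below are hypothetical, not the study's responses.
import numpy as np
from sklearn.manifold import MDS

# Each row is one listener's grouping: a group label per talker
# (6 talkers, 4 hypothetical listeners).
groupings = np.array([
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2, 2],
    [0, 0, 1, 2, 2, 2],
])
n_talkers = groupings.shape[1]

# Similarity = proportion of listeners who put two talkers in one group
sim = np.zeros((n_talkers, n_talkers))
for row in groupings:
    sim += (row[:, None] == row[None, :]).astype(float)
sim /= len(groupings)

# MDS expects dissimilarities; zero the diagonal explicitly
dissim = 1.0 - sim
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords)  # one point per talker in a 3-D perceptual space
```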
 
Article
This study examines the phenomenon of post-aspirated voiceless stops in Western Andalusian Spanish /s/ + voiceless stop sequences. Previous analyses have proposed that the post-aspiration arises through a reorganization of the glottal spreading gesture for /s/ and the oral constriction gesture for the stop. This theory is tested by steadily increasing speakers' production rate, which has been shown to trigger spontaneous changes in gestural organization in speech and other motor activities. Results from the study support the initial hypothesis. There is a switch from productions with preaspiration and short VOT to productions with long VOT as rate increases. Additionally, there is a tradeoff between VOT and pre-closure aspiration, indicating that they may result from the same gesture. Lastly, the variability in production shows a number of hallmarks of phase transitions in human coordination. In sum, a change in gestural organization provides a simple explanation for post-aspirated stops in this dialect.
 
Article
Individuals vary their speaking rate, and listeners use the speaking rate of precursor sentences to adjust for these changes (Kidd, 1989). Most of the research on this adjustment process has focused on situations in which there was only a single stream of speech over which such perceptual adjustment could occur. Yet listeners are often faced with environments in which multiple people are speaking simultaneously. Each of these voices provides speaking rate information. The challenge for the listener is to determine which sources of information should apply in a speech perception situation. Three studies examined when listeners would use rate information from one voice to adjust their perception of another voice. Results suggested that if only one source of duration information was available, listeners used that information, regardless of the speaker or the speaker's spatial location. When multiple sources were available, listeners primarily used information from the same source as the target item. However, even information from a source that differed in both location and talker still influenced perception to a slight degree.
 
Article
We tested the hypothesis that rapid shadowers imitate the articulatory gestures that structure acoustic speech signals, not just the acoustic patterns in the signals themselves, overcoming highly practiced motor routines and phonological conditioning in the process. In a first experiment, acoustic evidence indicated that participants reproduced allophonic differences between American English /l/ types (light and dark) in the absence of the positional variation cues more typically present with lateral allophony. However, imitative effects were small. In a second experiment, varieties of /l/ with exaggerated light/dark differences were presented by ear. Acoustic measures indicated that all participants reproduced differences between /l/ types, and larger average imitative effects were obtained. Finally, we examined evidence for imitation in articulation. Participants ranged in behavior from one who did not imitate to another who reproduced distinctions among light laterals, dark laterals and /w/, but who displayed a slight and inconsistent tendency to enhance imitation of lingual gestures through lip protrusion. Overall, results indicated that most rapid shadowers need not substitute familiar allophones as they imitate reorganized gestural constellations, even in the absence of explicit instruction to imitate, but that the extent of the imitation is small. Implications for theories of speech perception are discussed.
 
Article
This study examines the production and perception of Intonational Phrase (IP) boundaries. In particular, it investigates (1) whether the articulatory events that occur at IP boundaries can exhibit temporal distinctions that would indicate a difference in degree of disjuncture, and (2) to what extent listeners are sensitive to the effects of such differences among IP boundaries. Two experiments investigate these questions. An articulatory kinematic experiment examines the effects of structural differences between IP boundaries on the production of those boundaries. In a perception experiment listeners then evaluate the strength of the junctures occurring in the utterances produced in the production study. The results of the studies provide support for the existence of prosodic strength differences among IP boundaries and also demonstrate a close link between the production and perception of prosodic boundaries. The results are discussed in the context of possible linguistic structural explanations, with implications for cognitive accounts for the creation, implementation, and processing of prosody.
 
Article
Using a combination of magnetometry and ultrasound, we examined the articulatory characteristics of the so-called 'transparent' vowels [i], [iː], and [eː] in Hungarian vowel harmony. Phonologically, transparent vowels are front, but they can be followed by either front or back suffixes. However, a finer look reveals an underlying phonetic coherence in two respects. First, transparent vowels in back harmony contexts show a less advanced (more retracted) tongue body posture than phonemically identical vowels in front harmony contexts: e.g. [i] in buli-val is less advanced than [i] in bili-vel. Second, transparent vowels in monosyllabic stems selecting back suffixes are also less advanced than phonemically identical vowels in stems selecting front suffixes: e.g. [iː] in ír, which takes back suffixes, is less advanced than [iː] in hír, which takes front suffixes, when these stems are produced in bare form (no suffixes). We thus argue that the phonetic degree of tongue body horizontal position correlates with the phonological alternation in suffixes. A hypothesis that emerges from this work is that a plausible phonetic basis for transparency can be found in quantal characteristics of the relation between the articulation and acoustics of transparent vowels. More broadly, the proposal is that the phonology of transparent vowels is better understood when their phonological patterning is studied together with their articulatory and acoustic characteristics.
 
Oscillograms and spectrograms illustrating the VOT measurement protocol. On the left, negative VOT is reported, denoted by −b. On the right, positive VOT is reported, denoted by +p. 
Bilinguals' mean VOTs in Greek and English initial-position stops across voiced and voiceless targets produced in unilingual modes and from code-switches. Error bars indicate standard error of the mean. 
Bilinguals' VOTs of Greek and English voiced and voiceless stops in medial post-vocalic positions produced in unilingual modes and from code-switches. Error bars indicate standard error of the mean. 
Bilinguals' VOTs of Greek and English voiced and voiceless stops in medial post-nasal positions produced in unilingual modes and from code-switches. Error bars indicate standard error of the mean. 
VOTs (M: boldfaced) and standard deviations (SD) of bilingual speakers' productions of Greek and English stops in initial position in unilingual mode and when code-switching from the base language.
Article
Speech production research has demonstrated that the first language (L1) often interferes with production in bilinguals' second language (L2), but it has been suggested that bilinguals who are L2-dominant are the most likely to suppress this L1-interference. While prolonged contextual changes in bilinguals' language use (e.g., stays overseas) are known to result in L1 and L2 phonetic shifts, code-switching provides the unique opportunity of observing the immediate phonetic effects of L1-L2 interaction. We measured the voice onset times (VOTs) of Greek-English bilinguals' productions of /b, d, p, t/ in initial and medial contexts, first in either a Greek or English unilingual mode, and in a later session when they produced the same target pseudowords as a code-switch from the opposing language. Compared to a unilingual mode, all English stops produced as code-switches from Greek, regardless of context, had more Greek-like VOTs. In contrast, Greek stops showed no shift toward English VOTs, with the exception of medial voiced stops. Under the specifically interlanguage condition of code-switching we have demonstrated a pervasive influence of the L1 even in L2-dominant individuals.
 
Article
Different languages use voice onset time (VOT) in different ways to signal the voicing contrast, for example, short lag/long lag (English) vs. prevoiced/short lag (French). Also, VOT depends on place of articulation, with labial VOTs being shorter than velar and alveolar and, sometimes, alveolar being shorter than velar. Here we examine the VOT in babbled utterances of five French-learning and five English-learning infants at ages 9 and 12 months. There was little or no difference between the languages for duration of positive VOTs, which were usually in the "short lag" range. The duration of prevoicing also did not differ between languages, but the proportion of prevoiced utterances did (French-learning infants: 44.2% prevoicing; English-learning: 14.3%). Labial, alveolar and velar stops differed in VOT, with alveolar longer than labial and velar longer than alveolar, suggesting a mechanical cause. The lack of long-lag VOT indicates that the English-learning infants have not mastered aspiration by 12 months. The different proportions of prevoicing, however, suggest that the French-learning infants attempt to imitate the prevoicing that is used frequently (and contrastively) in their native language environment. The results suggest that infants are sensitive to the voicing categories of the ambient language but that they may be able to control prevoicing more successfully than aspiration.
 
Article
This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast was considered (as in minimal pairs such as 'cub' and 'cup') in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, it was found that the interlanguage speech intelligibility benefit for listeners held only for the low phonological proficiency listeners and low phonological proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in an effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit.
 
Article
The goal of this paper was to examine intrinsic and extrinsic factors contributing to the development of speech perception in monolingual and bilingual infants and toddlers. A substantial number of behavioral studies have characterized when infants show changes in behavior towards speech sounds in relation to amount of experience with these sounds. However, these studies cannot explain to what extent the developmental timeline is influenced by experience with the language versus constraints imposed by cortical maturation. Studies using electrophysiological measures to examine the development of auditory and speech processing have shown great differences in infant and adult electrophysiological correlates of processing. Many of these differences are a function of immature cortex in the infant. In this paper, we examined the maturation of infant and child event-related-potential (ERP) electrophysiological components in processing an English vowel contrast and explored to what extent these components are influenced by intrinsic (e.g., sex) versus extrinsic factors, such as language experience (monolingual vs. bilingual). Our findings demonstrate differences in the pattern of ERP responses related to age and sex, as well as language experience. These differences make it clear that general maturational factors need to be taken into consideration in examining the effect of language experience on the neurodevelopment of speech perception.
 
Monolinguals' and bilinguals' discrimination of English and Greek word-initial (CV–CV) contrasts (Experiment 1). 
Monolinguals' and bilinguals' discrimination of English and Greek intervocalic (VCV–VCV) contrasts (Experiment 2). 
Article
How listeners categorize two phones predicts the success with which they will discriminate the given phonetic distinction. In the case of bilinguals, such perceptual patterns could reveal whether the listener's two phonological systems are integrated or separate. This is of particular interest when a given contrast is realized differently in each language, as is the case with Greek and English stop-voicing distinctions. We had Greek-English early sequential bilinguals and Greek and English monolinguals (baselines) categorize, rate, and discriminate stop-voicing contrasts in each language. All communication with each group of bilinguals occurred solely in one language mode, Greek or English. The monolingual groups showed the expected native-language constraints, each perceiving their native contrast more accurately than the opposing nonnative contrast. Bilinguals' category-goodness ratings for the same physical stimuli differed, consistent with their language mode, yet their discrimination performance was unaffected by language mode and biased toward their dominant language (English). We conclude that bilinguals integrate both languages in a common phonetic space that is swayed by their long-term dominant language environment for discrimination, but that they selectively attend to language-specific phonetic information for phonologically motivated judgments (category-goodness ratings).
 
Article
We examined the voice onset times (VOTs) of monolingual and bilingual speakers of English and French to address the question of whether cross-language phonetic influences occur particularly in simultaneous bilinguals (that is, speakers who learned both languages from birth). Speakers produced sentences in which there were target words with initial /p/, /t/ or /k/. In French, natively bilingual speakers produced VOTs that were significantly longer than those of monolingual French speakers. French VOTs were even longer in bilingual speakers who learned English before learning French. The outcome was analogous in English speech. Natively bilingual speakers produced shorter English VOTs than monolingual speakers. English VOTs were even shorter in the speech of bilinguals who learned French before English. Bilingual speakers had significantly longer VOTs in their English speech than in their French. Accordingly, the cross-language effects do not occur because natively bilingual speakers adopt voiceless stop categories intermediate between those of native English and French speakers that serve both languages. Monolingual speakers of French or English in Montreal had VOTs nearly identical, respectively, to those of monolingual Parisian French speakers and monolingual Connecticut English speakers. These results suggest that mere exposure to a second language does not underlie the cross-language phonetic effect; however, these findings must be reconciled with others that appear to show an effect of overhearing.
 
Article
The way that bilinguals produce phones in each of their languages provides a window into the nature of the bilingual phonological space. For stop consonants, if early sequential bilinguals, whose languages differ in voice onset time (VOT) distinctions, produce native-like VOTs in each of their languages, it would imply that they have developed separate first and second language phones, that is, language-specific phonetic realisations for stop-voicing distinctions. Given the ambiguous phonological status of Greek voiced stops, which has been debated but not investigated experimentally, Greek-English bilinguals can offer a unique perspective on this issue. We first recorded the speech of Greek and Australian-English monolinguals to observe native VOTs in each language for /p, t, b, d/ in word-initial and word-medial (post-vocalic and post-nasal) positions. We then recorded fluent, early Greek-Australian-English bilinguals in either a Greek or English language context; all communication occurred in only one language. The bilinguals in the Greek context were indistinguishable from the Greek monolinguals, whereas the bilinguals in the English context matched the VOTs of the Australian-English monolinguals in initial position, but showed some modest differences from them in the phonetically more complex medial positions. We interpret these results as evidence that bilingual speakers possess phonetic categories for voiced versus voiceless stops that are specific to each language, but are influenced by positional context differently in their second than in their first language.
 
Article
This study compares the time to initiate words with varying syllable structures (V, VC, CV, CVC, CCV, CCVC). In order to test the hypothesis that different syllable structures require different amounts of time to prepare their temporal controls, or plans, two delayed naming experiments were carried out. In the first of these, the initiation time was determined from acoustic recordings. The results confirmed the hypothesis but also showed an interaction with the initial segment (i.e., vowel-initial words were initiated later than words beginning with consonants, but this difference was much smaller for words starting with stops compared to /l/ or /s/). Adding a coda did not affect the initiation time. In order to rule out effects of segment-specific articulatory-to-acoustic interval differences, a second experiment was performed in which speech movements of the tongue, the jaw and the lips were recorded by means of electromagnetic articulography. Results from initiation time, based on articulatory measurements, showed a significant syllable structure effect, with VC words being initiated significantly later than CV(C) words. Only minor effects of the initial segment were found. These results can be partly explained by the amount of accumulated experience a speaker has in coordinating the relevant gesture combinations and triggering them appropriately in time.
 
Article
Syllable complexity has been found to affect the time the speaker needs for planning and initiating utterance production. Shorter latencies for complex onsets (CCV) as compared to simple onsets (CV) have been explained by effects of segment-specific biomechanical constraints at the level of motor execution, and by neighborhood density at the planning level. Within the framework of Articulatory Phonology, shorter planning latencies for CV syllables (compared to VC) have been attributed to quicker stabilization of the tighter gestural coupling hypothesized for in-phase coupling of the onset consonant and release with the vowel. We attempted to test both onset complexity (C vs. CC) and coda complexity (open vs. closed syllables) within a single delayed naming experiment.
 
Article
This study was designed to examine the feasibility of using the spectral mean and/or spectral skewness to distinguish between alveolar and palato-alveolar fricatives produced by individual adult speakers of English. Five male and five female speaker participants produced 100 CVC words with an initial consonant /s/ or /ʃ/. The spectral mean and skewness were derived every 10 milliseconds throughout the fricative segments and plotted for all productions. Distinctions were examined for each speaker through visual inspection of these time history plots and statistical comparisons were completed for analysis windows centered 50 ms after the onset of the fricative segment. The results showed significant differences between the alveolar and palato-alveolar fricatives for both the mean and skewness values. However, there was considerable inter-speaker overlap, limiting the utility of the measures to evaluate the adequacy of the phonetic distinction. When the focus shifted to individual speakers rather than average group performance, only the spectral mean distinguished consistently between the two phonetic categories. The robustness of the distinction suggests that intra-speaker overlap in spectral mean between prevocalic /s/ and /ʃ/ targets may be indicative of abnormal fricative production and a useful measure for clinical applications.
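Spectral mean and skewness here are the first and (normalized) third moments of the power spectrum, tracked in analysis windows stepped every 10 ms. The sketch below shows one plausible implementation of that measurement; the 20 ms window length, Hamming taper, and white-noise stand-in for a fricative are assumptions for illustration, not the study's settings.

```python
# Sketch: spectral mean (centroid) and skewness of the power spectrum,
# computed in windows stepped every 10 ms. Window length and the
# white-noise "fricative" are illustrative assumptions.
import numpy as np

def spectral_moments(frame, sr):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    p = spec / spec.sum()                      # treat spectrum as a pdf
    mean = (freqs * p).sum()                   # first moment: spectral mean
    var = ((freqs - mean) ** 2 * p).sum()
    skew = ((freqs - mean) ** 3 * p).sum() / var ** 1.5  # normalized 3rd moment
    return mean, skew

sr = 22050
fricative = np.random.default_rng(1).normal(size=sr // 5)  # 200 ms of noise
win, hop = int(0.02 * sr), int(0.01 * sr)                  # 20 ms window, 10 ms step
for start in range(0, len(fricative) - win, hop):
    m, s = spectral_moments(fricative[start:start + win], sr)
    # one (mean, skew) pair per 10 ms: the "time history plot" data
    print(f"{start / sr:5.3f} s  mean={m:7.1f} Hz  skew={s:+.2f}")
```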
 
Article
Russian maintains a contrast between non-palatalized and palatalized trills that has been lost in most Slavic languages. This research investigates the phonetic expression of this contrast in an attempt to understand how the contrast is maintained. One hypothesis is that the contrast is stabilized through resistance to coarticulation between the trill and surrounding vowels and to prosodic positional weakening effects, factors that would otherwise be expected to weaken the contrast. In order to test this hypothesis, we investigate intrasegmental and intersegmental coarticulation and the effect of domain boundaries on Russian trills. Since trills are highly demanding articulatorily and aerodynamically, and since Russian trills are in contrast, there is an expectation that they will be highly resistant to coarticulation and to prosodic influence. This study shows, however, that phonetic variability due to domain boundaries and coarticulation is systematically present in Russian trills. Implications of the relation between prosodic position and lingual coarticulation for the Degree of Articulatory Constraint (DAC) model, Articulatory Phonology, and the literature on prosodic strength are discussed. Based on the quantitative analysis of phonetic variability in Russian trills, we advance a hypothesis as to why the contrast in trills is maintained in Russian but lost in other Slavic languages. Specifically, phonological strategies used by several Slavic languages to deal with the instability of Proto-Slavic palatalized trills are present phonetically in Russian. These phonetic tendencies structure the variability of Russian trills and could be the source of contrast stabilization.
 
Article
Speech errors are known to exhibit an intrusion bias in that segments are added rather than deleted; also, a shared final consonant can cause an interaction of the initial consonants. A principled connection between these two phenomena has been drawn in a gestural account of errors: articulatory measures revealed a preponderance of errors in which both the target and the intruding gesture are co-produced, instead of one replacing the other. This gestural intrusion bias has been interpreted as an errorful coupling of gestures in a dynamically stable coordination mode (1:1, in-phase), triggered by the presence of a shared coda consonant. Capturing tongue motion with ultrasound, the current paper investigates whether shared gestural composition other than a coda can trigger gestural co-production errors. Subjects repeated two-word phrases with alternating initial stop or fricative consonants in a coda condition (e.g., top cop), a no-coda condition (e.g., taa kaa) and a three-word phrase condition (e.g., taa kaa taa). The no-coda condition showed a lower error rate than the coda condition. The three-word phrase condition elicited an intermediate error rate for the stop consonants, but a high error rate for the fricative alternations. While all conditions exhibited both substitution and co-production errors, a gestural intrusion bias emerged mainly in the coda condition. The findings suggest that the proportion of different error types (substitutions, co-production errors) differs as a function of stimulus type: not all alternating stimulus patterns that trigger errors result in an intrusion bias.
 
Article
Research on pause duration has mainly focused on the impact of syntactic structure on the duration of pauses within an utterance and on the impact of syntax, discourse, and prosodic structure on the likelihood of pause occurrence. Relatively little is known about what factors play a role in determining the duration of pauses between utterances or phrases. Two experiments examining the effect of prosodic structure and phrase length on pause duration are reported. Subjects read sentences varying along the following parameters: a) the length in syllables of the intonational phrase preceding and following the pause, and b) the prosodic structure of the intonational phrase preceding and following the pause, specifically whether or not the intonational phrase branches into smaller phrases. In order to minimize variability due to speech rate and individual differences, speakers read sentences synchronously in dyads. The results showed a significant post-boundary effect of prosodic branching and significant pre- and post-boundary phrase length effects. The results are discussed in terms of production units.
 
Article
The area function of the vocal tract in all of its spatial detail is not directly computable from the speech signal. But is partial, yet phonetically distinctive, information about articulation recoverable from the acoustic signal that arrives at the listener's ear? The answer to this question is important for phonetics, because various theories of speech perception predict different answers. Some theories assume that recovery of articulatory information must be possible, while others assume that it is impossible. However, neither type of theory provides firm evidence showing that distinctive articulatory information is or is not extractable from the acoustic signal. The present study focuses on vowel gestures and examines whether linguistically significant information, such as constriction location, constriction degree, and rounding, is contained in the speech signal, and whether such information is recoverable from formant parameters. Perturbation theory and linear prediction were combined, in a manner similar to that of Mokhtari (1998) [Mokhtari, P. (1998). An acoustic-phonetic and articulatory study of speech-speaker dichotomy. Doctoral dissertation, University of New South Wales], to assess the accuracy of recovery of information about vowel constrictions. Distinctive constriction information estimated from the speech signal for ten American English vowels was compared to the constriction information derived from simultaneously collected X-ray microbeam articulatory data for 39 speakers [Westbury (1994). X-ray microbeam speech production database user's handbook. University of Wisconsin, Madison, WI]. The recovery of distinctive articulatory information relies on a novel technique that uses formant frequencies and amplitudes, and does not depend on a principal components analysis of the articulatory data, as do most other inversion techniques. These results provide evidence that distinctive articulatory information for vowels can be recovered from the acoustic signal.
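The formant parameters such a study starts from are typically obtained by linear prediction. The sketch below shows only that preliminary step, estimating resonance frequencies from a synthetic vowel-like frame via autocorrelation LPC; the perturbation-theory mapping from formants to constriction location and degree is the paper's own contribution and is not reproduced here. The sampling rate, LPC order, and the two synthetic resonances are illustrative assumptions.

```python
# Sketch: formant estimation via autocorrelation LPC on a synthetic frame.
# Sampling rate, LPC order, and resonance values are illustrative only.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_formants(x, sr, order=12):
    """Estimate resonance frequencies of one frame via autocorrelation LPC."""
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])              # normal equations R a = r
    roots = np.roots(np.concatenate(([1.0], -a)))  # A(z) = 1 - sum a_k z^-k
    roots = roots[np.imag(roots) > 0]              # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return sorted(f for f in freqs if 90 < f < sr / 2 - 90)

# Synthetic vowel-like frame: 100 Hz impulse train through two resonators
sr = 10000
n = int(0.04 * sr)                                  # 40 ms frame
src = (np.arange(n) % (sr // 100) == 0).astype(float)
y = src
for f, bw in [(500, 60), (1500, 90)]:               # "formants" at 500/1500 Hz
    r_pole = np.exp(-np.pi * bw / sr)
    theta = 2 * np.pi * f / sr
    y = lfilter([1.0], [1.0, -2 * r_pole * np.cos(theta), r_pole ** 2], y)
print(lpc_formants(y, sr)[:2])                      # expect roughly [500, 1500]
```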
 
Lengthening (white bar, +) and shortening (black bar, −) of perceived consonant duration as a function of perceptual (a) contrast with or (b) assimilation to the duration of the preceding vowel.
Inversely and directly covarying stimulus pairs used in the discrimination tasks.
Summary table of the overall results.
Article
In the experiments reported here, listeners categorized and discriminated speech and non-speech analogue stimuli in which the durations of a vowel and a following consonant or their analogues were varied orthogonally. The listeners' native languages differed in how these durations covary in speakers' productions of such sequences. Because auditorist and autonomous models of speech perception hypothesize that the auditory qualities evoked by both kinds of stimuli determine their initial perceptual evaluation, they both predict that listeners from all the languages will respond to non-speech analogues as they do to speech in both tasks. Because neither direct realist nor interactive models hypothesize such a processing stage, they predict instead that the way in which vowel and consonant duration covary in the listeners' native languages will determine how they categorize and discriminate the speech stimuli, and that all listeners will categorize and discriminate the non-speech differently from the speech stimuli. Listeners' categorization of the speech stimuli did differ as a function of how these durations covary in their native languages, but all listeners discriminated the speech stimuli in the same way, and they all categorized and discriminated the non-speech stimuli in the same way, too. These similarities could arise from listeners adding the durations of the vowel and consonant intervals (or their analogues) in these tasks with these stimuli; they do so when linguistic experience does not influence them to perceive these durations otherwise. These results support an autonomous rather than an interactive model, in which listeners either add the durations or apply their linguistic experience at a post-perceptual stage of processing. They do not, however, support an auditorist over a direct realist model, because they provide no evidence that the signal's acoustic properties are transformed during the hypothesized prior perceptual stage.
 
Article
This paper examines the acoustic characteristics of voiceless sibilant fricatives in English- and Japanese-speaking adults and the acquisition of contrasts involving these sounds in 2- and 3-year-old children. Both English and Japanese have a two-way contrast between an alveolar fricative (/s/) and a postalveolar fricative (/ʃ/ in English and /ɕ/ in Japanese). Acoustic analysis of the adult productions revealed cross-linguistic differences in which acoustic parameters were used to differentiate the two fricatives in the two languages and in how well the two fricatives were differentiated by the acoustic parameters that were investigated. For the children's data, the transcription results showed that English-speaking children generally produced the alveolar fricative more accurately than the postalveolar one, whereas the opposite was true for Japanese-speaking children. In addition, acoustic analysis revealed the presence of covert contrast in the productions of some English-speaking and some Japanese-speaking children. The different development patterns are discussed in terms of the differences in the fine phonetic detail of the contrast in the two languages.
 
Article
Several fixed classification experiments test the hypothesis that F1, f0, and closure voicing covary between intervocalic stops contrasting for [voice] because they integrate perceptually. The perceptual property produced by the integration of these acoustic properties was at first predicted to be the presence of low frequency energy in the vicinity of the stop, which is considerable in [+voice] stops but slight in [-voice] stops. Both F1 and f0 at the edges of vowels flanking the stop were found to integrate perceptually with the continuation of voicing into the stop, but not to integrate with one another. These results indicate that the perceptually relevant property is instead the continuation of low frequency energy across the vowel-consonant border and not merely the amount of low frequency energy present near the stop. Other experiments establish that neither F1 nor f0 at vowel edge integrates perceptually with closure duration, which shows that only auditorily similar properties integrate and not any two properties that reliably covary. Finally, the experiments show that these acoustic properties integrate perceptually (or fail to) in the same way in non-speech analogues as in the original speech. This result indicates that integration arises from the auditory similarity of certain acoustic correlates of the [voice] contrast.
 
Article
In this study we investigated grouping-related F0 patterns in Mandarin by examining the effect of syllable position in a group while controlling for tone, speaking mode, number of syllables in a group, and group position in a sentence. We analyzed syllable duration, F0 displacement, ratio of peak velocity to F0 displacement (vp/d ratio) and shape of F0 velocity profile (parameter C) in sequences of Rising, Falling and High tones. Results showed that syllable duration had the most consistent grouping-related patterns. In a short phrase of 1-4 syllables, duration is longest in the final position, second longest in the initial position, and shortest in the medial positions. In Rising and Falling tone sequences, syllable duration was positively related to F0 displacement, but negatively related to vp/d ratio. Sequences consisting of only the High tone, however, showed no duration-matching F0 variations. Modeling simulations with a second-order linear system showed that duration variations alone could generate F0 displacement and vp/d ratio variations comparable to those in actual data. We interpret the results as evidence that grouping is encoded directly by syllable duration, while the corresponding variations in F0 displacement, vp/d ratio and velocity profile are the consequences of duration control.
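The modeling claim, that duration control alone can generate the observed F0 displacement and vp/d patterns, can be illustrated with a critically damped second-order system that chases a constant pitch target and is cut off at syllable offset. The sketch below is a generic system of that kind, not the authors' model code; the starting F0, target, and stiffness values are invented for illustration.

```python
# Sketch: a critically damped 2nd-order system driven by a constant pitch
# target, truncated at syllable offset. Longer durations alone yield larger
# F0 displacement and a smaller vp/d ratio. All parameters are illustrative.
import numpy as np

def f0_response(duration, f0_start=200.0, target=260.0, omega=40.0, dt=0.001):
    """Simulate F0 (Hz) approaching `target` for `duration` seconds."""
    n = int(duration / dt)
    f0, v = f0_start, 0.0
    track = np.empty(n)
    for i in range(n):
        acc = -2 * omega * v - omega ** 2 * (f0 - target)  # critical damping
        v += acc * dt
        f0 += v * dt
        track[i] = f0
    return track

for dur in (0.10, 0.15, 0.25):                 # syllable durations in seconds
    f0 = f0_response(dur)
    d = f0[-1] - f0[0]                         # F0 displacement reached
    vp = np.max(np.abs(np.diff(f0))) / 0.001   # peak F0 velocity (Hz/s)
    print(f"dur={dur:.2f} s  displacement={d:6.1f} Hz  vp/d={vp / d:6.1f} per s")
```

Running this shows displacement growing with duration while the vp/d ratio falls, the same direction of covariation the abstract reports for Rising and Falling tone sequences.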
 
Article
In this study, we compare the effects of English lexical features on word duration for native and non-native English speakers and for non-native speakers with different L1s and a range of L2 experience. We also examine whether non-native word durations lead to judgments of a stronger foreign accent. We measured word durations in English paragraphs read by 12 American English (AE), 20 Korean, and 20 Chinese speakers. We also had AE listeners rate the 'accentedness' of these non-native speakers. AE speech had shorter durations, greater within-speaker word duration variance, greater reduction of function words, and less between-speaker variance than non-native speech. However, both AE and non-native speakers showed sensitivity to lexical predictability by reducing second mentions and high-frequency words. Non-native speakers with more native-like word durations, greater within-speaker word duration variance, and greater function word reduction were perceived as less accented. Overall, these findings identify word duration as an important and complex feature of foreign-accented English.
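As a hedged illustration of two of these measures, the sketch below computes within-speaker word-duration variance and a function-word reduction ratio; the token layout ('dur', 'is_function') is hypothetical, not the study's data format.

```python
# Hypothetical sketch of two duration measures described above.
import statistics

def duration_measures(tokens):
    """tokens: list of dicts with 'dur' (seconds) and 'is_function' (bool)."""
    durs = [t["dur"] for t in tokens]
    content = [t["dur"] for t in tokens if not t["is_function"]]
    function = [t["dur"] for t in tokens if t["is_function"]]
    return {
        "within_speaker_variance": statistics.variance(durs),
        # Smaller ratio = greater reduction of function words.
        "function_word_ratio": statistics.mean(function) / statistics.mean(content),
    }
```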
 
Article
The current study focused on the production of non-contrastive geminates across different boundary types in English to investigate the hypothesis that word-internal heteromorphemic geminates may differ from those that arise across a word boundary. In this study, word-internal geminates arising from affixation, described as either assimilated or concatenated, were matched to heteromorphemic geminates arising from sequences of identical consonants that spanned a word boundary and to word-internal singletons. Word-internal geminates were found to be longer than matched singletons in both absolute and relative terms. By contrast, heteromorphemic geminates that occurred at word boundaries were longer than matched singletons only in absolute terms. In addition, heteromorphemic geminates in two-word phrases were typically "pulled apart" in careful speech; that is, speakers marked the boundaries between free morphemes with pitch changes and pauses. Morpheme boundaries in words with bound affixes were very rarely highlighted in this way. These results are taken to indicate that most word-internal heteromorphemic geminates are represented as a single long consonant in the speech plan rather than as a consonant sequence. Only those geminates that arise in two-word phrases exhibit phonetic characteristics fully consistent with the representation of two identical consonants crossing a morpheme boundary.
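A minimal sketch of the absolute/relative distinction; the abstract does not say which denominator the relative measure used, so normalizing by word duration is an assumption here.

```python
# Hedged illustration: absolute vs. relative duration differences
# between geminates and matched singletons (denominator assumed).
def duration_contrast(geminate_ms, singleton_ms, gem_word_ms, sing_word_ms):
    absolute_diff = geminate_ms - singleton_ms       # milliseconds
    relative_gem = geminate_ms / gem_word_ms         # proportion of word
    relative_sing = singleton_ms / sing_word_ms
    return absolute_diff, relative_gem - relative_sing

# Word-internal geminates: both values > 0; boundary geminates in the
# study differed from singletons only on the absolute measure.
```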
 
Table: Summary of the significant main effects and interactions.
Figure: Scatterplots of the relationships between LOR and natural-speech /ɹ/-/l/ perception, between LOR and degree of separation on F3 when perceiving /ɹ/-/l/, and between natural-speech /ɹ/-/l/ perception and degree of F3 separation.
Article
This study tested the predictions of the Speech Learning Model (SLM; Flege, 1988) on native Japanese (NJ) speakers' perception and production of English /ɹ/ and /l/. NJ speakers' degree of foreign accent, intelligibility of /ɹ/-/l/ productions, and ability to perceive natural-speech /ɹ/-/l/ were assessed as a function of length of residency in North America, age of arrival in North America, years of student status in an English environment, and percentage of Japanese usage. Additionally, the extent to which NJ speakers utilized the F3 onset cue when differentiating /ɹ/-/l/ in perception and production was assessed, this cue having previously been shown to be the most reliable indicator of category membership. As predicted, longer residencies predicted more native English-like accents, more intelligible productions, and more accurate natural-speech identifications; however, no changes were observed in F3 reliance, indicating that, though performance improves, it does so through reliance on other cues.
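The "degree of F3 separation" measure invites a simple illustration; the sketch below is a guess at the metric, not the study's code, taking the gap between a speaker's mean F3 onsets for the two categories (a low F3 onset is the classic correlate of /ɹ/, so a larger gap indicates more native-like use of the cue).

```python
# Illustrative only: F3-onset separation between /ɹ/ and /l/ tokens.
import statistics

def f3_separation(f3_onsets_r, f3_onsets_l):
    """Mean F3-onset gap (Hz) between /l/ and /ɹ/ productions."""
    return statistics.mean(f3_onsets_l) - statistics.mean(f3_onsets_r)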
 
Article
A locus equation describes a first-order regression fit to a scatter of vowel steady-state frequency values predicting vowel onset frequency values. Locus equation coefficients are often interpreted as indices of coarticulation. Speaking rate variations with a constant consonant-vowel form are thought to induce changes in the degree of coarticulation. In the current work, the hypothesis that locus slope is a transparent index of coarticulation is examined through the analysis of acoustic samples of large-scale, nearly continuous variations in speaking rate. Following the methodological conventions for locus equation derivation, data pooled across ten vowels yield locus equation slopes that are mostly consistent with the hypothesis that locus equations vary systematically with coarticulation. Comparable analyses between different four-vowel pools reveal variations in the locus slope range and changes in locus slope sensitivity to rate change. Analyses across rate but within vowels are substantially less consistent with the locus hypothesis. Taken together, these findings suggest that the practice of vowel pooling exerts a non-negligible influence on locus outcomes. Results are discussed within the context of articulatory accounts of locus equations and the effects of speaking rate change.
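In the conventional derivation (F2 is the standard choice of formant in this literature), the regression has the form:

```latex
% Standard locus-equation form:
\[
  F2_{\text{onset}} \;=\; k \cdot F2_{\text{vowel}} + c
\]
% The slope k is read as a coarticulation index: k near 1 means the
% consonant onset tracks the vowel (strong coarticulation); k near 0
% means a fixed, vowel-independent locus.
```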
 
Article
The coordination of velum and oral gestures for English [n] is studied using real-time MRI movies to reconstruct vocal tract aperture functions. This technique allows for the examination of parts of the vocal tract otherwise inaccessible to dynamic imaging or movement tracking. The present experiment considers syllable onset, coda, and juncture geminate nasals and also addresses the effects of a variety of word stress patterns on segment-internal coordination. We find a bimodal timing pattern in which near-synchrony of velum lowering and tongue tip raising characterizes the timing for onsets, while a temporal lag between the gestures is characteristic of codas, supporting and extending the findings of Krakow (1989, 1993) for [m]. Intervocalic word-internal nasals are found to have timing patterns that are sensitive to the local stress context, which suggests the presence of an underlying timing specification that can yield flexibly. We consider these findings in light of the gestural coupling structures described by Goldstein and colleagues (Goldstein, Byrd, & Saltzman, 2006; Nam, Goldstein, & Saltzman, in press; Goldstein, Nam, Saltzman, & Chitoran, 2008).
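A rough sketch of the timing measure, assuming gesture onsets are located by threshold crossings in the reconstructed aperture functions (the study's actual onset criterion may differ):

```python
# Hedged illustration of intergestural lag from two aperture signals.
import numpy as np

def gesture_onset(signal, t, threshold):
    """Time of the first threshold crossing (hypothetical criterion;
    the crossing direction depends on whether the aperture is opening,
    as for velum lowering, or closing, as for tongue tip raising)."""
    idx = int(np.argmax(signal >= threshold))
    return t[idx]

def velum_tip_lag(velum_aperture, tip_constriction, t, thr_velum, thr_tip):
    # Near-zero lag suggests the onset-like (synchronous) pattern;
    # a positive lag (tip after velum) is the coda-like pattern.
    return (gesture_onset(tip_constriction, t, thr_tip)
            - gesture_onset(velum_aperture, t, thr_velum))
```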
 
Article
Words can be pronounced in multiple ways in casual speech. Corpus analyses of the frequency with which these pronunciation variants occur (e.g., Patterson & Connine, 2001) show that one pronunciation variant typically predominates; this raises the question of whether variant recognition is aligned with exposure frequency. We explored this issue in words containing one of four phonological contexts, each of which favors one of four surface realizations of word-medial /t/: [t], [ʔ], [ɾ], or a deleted variant. The frequencies of the four realizations in all four contexts were estimated for a set of words in a production experiment. Recognition of all pronunciation variants was then measured in a lexical decision experiment. Overall, the data suggest that listeners are sensitive to variant frequency: word classification rates closely paralleled production frequency. The exceptions were [t] realizations (i.e., canonical pronunciations of the words), a finding which confirms other results in the literature and indicates that factors other than exposure frequency affect word recognition.
 
Article
Cross-language differences in phonetic settings for phonological contrasts of stop voicing have posed a challenge for attempts to relate specific phonological features to specific phonetic details. We probe the phonetic-phonological relationship for voicing contrasts more broadly, analyzing in particular their relevance to nonnative speech perception, from two theoretical perspectives: feature geometry and articulatory phonology. Because these perspectives differ in assumptions about temporal/phasing relationships among features/gestures within syllable onsets, we undertook a cross-language investigation on perception of obstruent (stop, fricative) voicing contrasts in three nonnative onsets that use a common set of features/gestures but with differing time-coupling. Listeners of English and French, which differ in their phonetic settings for word-initial stop voicing distinctions, were tested on perception of three onset types, all nonnative to both English and French, that differ in how initial obstruent voicing is coordinated with a lateral feature/gesture and additional obstruent features/gestures. The targets, listed from least complex to most complex onsets, were: a lateral fricative voicing distinction (Zulu /ɬ/-/ɮ/), a laterally-released affricate voicing distinction (Tlingit /tɬ/-/dɮ/), and a coronal stop voicing distinction in stop+/l/ clusters (Hebrew /tl/-/dl/). English and French listeners' performance reflected the differences in their native languages' stop voicing distinctions, compatible with prior perceptual studies on singleton consonant onsets. However, both groups' abilities to perceive voicing as a separable parameter also varied systematically with the structure of the target onsets, supporting the notion that the gestural organization of syllable onsets systematically affects perception of initial voicing distinctions.
 
Article
Little is known about how listeners judge phonemic versus allophonic (or freely varying) versus post-lexical variations in voice quality, or about which acoustic attributes serve as perceptual cues in specific contexts. To address this issue, native speakers of Gujarati, Thai, and English discriminated among pairs of voices that differed only in the relative amplitudes of the first versus second harmonics (H1-H2). Results indicate that speakers of Gujarati (which contrasts H1-H2 phonemically) were more sensitive to changes than speakers of Thai or English. Further, sensitivity was not affected by the overall source spectral slope for Gujarati speakers, unlike Thai and English speakers, who were most sensitive when the spectrum fell away steeply. In combination with previous findings from Mandarin speakers, these results suggest a continuum of sensitivity to H1-H2. In Gujarati, the independence of sensitivity and spectral context is consistent with the use of H1-H2 as a cue to the language's phonemic phonation contrast. Speakers of Mandarin, in which creaky phonation occurs in conjunction with the low dipping Tone 3, apparently also learn to hear these contrasts, but their sensitivity is conditioned by spectral context. Finally, for Thai and English speakers, who vary phonation only post-lexically, sensitivity is both lower and contextually determined, reflecting the smaller role of H1-H2 in these languages.
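A minimal sketch of the H1-H2 measure, assuming harmonic amplitudes are read off an FFT at f0 and 2·f0 (no formant correction, which careful voice-quality work would typically apply):

```python
# Illustrative H1-H2 computation: dB difference between the amplitudes
# of the first two harmonics of a voiced frame.
import numpy as np

def h1_h2(frame, sr, f0):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    h1 = spec[np.argmin(np.abs(freqs - f0))]
    h2 = spec[np.argmin(np.abs(freqs - 2 * f0))]
    return 20 * np.log10(h1 / h2)   # positive when H1 dominates
```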
 
Article
Voice onset time (VOT) is known to vary with place of articulation, and for any given place of articulation there are differences from one language to another. Using data from multiple speakers of 18 languages, all recorded and analyzed in the same way, we show that most, but not all, of the within-language place-of-articulation variation can be described by universally applicable phonetic rules (although the physiological bases for these rules are not entirely clear). The between-language variation is also largely (but not entirely) predictable by assuming that languages choose one of three possibilities for the degree of aspiration of voiceless stops. Some languages, however, have VOTs that are markedly different from the generally observed values. The phonetic output of a grammar therefore has to contain language-specific components to account for these results.
 
Article
Using monosyllabic words that can be continued to quadrisyllabic words (for example, sei, Seiko, Seikola, Seikolasta), all spoken with two degrees of prominence (unaccented and strongly accented), this study examined the temporal and tonal domains of accent in Finnish. Large accentual lengthening was observed to extend from word onset to the end of the third syllable, with minor lengthening appearing on the first segment of the fourth syllable. The tonal domain of accentuation in turn was observed to extend from word onset to the middle of the third syllable, and in shorter words, to a corresponding temporal location in the next word. Finally, it was observed that polysyllabic shortening does not operate in Finnish: word length (number of constituent syllables) has no overall effect on segment durations. The results, together with previous ones, show that in Finnish, a full-fledged quantity language, segment durations are adjusted to achieve a temporally and tonally uniform realization of accent. This contrasts with the situation in many non-quantity languages, in which the temporal realization of accent varies as a function of the segmental structure of the accented syllable.
 
Figure: Mean foreign accent ratings for subgroups of native Korean (NK) children (n = 6 each) grouped on age of arrival in North America and native English (NE) children (n = 6 each) grouped on age; the NE children's values are plotted alongside those of the NK subgroup closest in chronological age; error bars enclose ±1 SE.
Table: Summary of ANOVAs examining mean listener- and talker-based foreign accent ratings (see text).
Table: Pearson correlations between ratings of English sentence production and five variables, as well as inter-variable correlations, for the native Korean adults and children.
Article
The purpose of this longitudinal study was to evaluate the influence of age (adult vs. child) and length of residence (LOR) in an L2-speaking country (3 vs. 5 years) on degree of foreign accent in a second language (L2). Korean adults and children living in North America, and age-matched groups of native English (NE) adults and children, recorded English sentences in sessions held 1.2 years apart (T1 vs. T2). NE-speaking listeners rated the sentences for overall degree of perceived foreign accent using a 9-point scale. The native Korean (NK) children received significantly higher ratings than the NK adults did, but lower ratings than the NE children. The NK children—even those who had arrived as young children and been enrolled in English-medium schools for an average of 4 years—spoke English with detectable foreign accents. The effects of LOR and the T1–T2 differences were non-significant for both the NK adults and the NK children. The findings were inconsistent with the hypothesis that adult–child differences in L2 speech learning are due to the passing of a critical period. The suggestion is made that the milder foreign accents observed for children than adults are due, at least in part, to the greater L2 input typically received by immigrant children than adults.
 
Top-cited authors
James Emil Flege
  • University of Alabama at Birmingham
Jean-Luc Schwartz
Louis-Jean Boë
  • University of Grenoble
Nathalie Vallée
  • Grenoble Institute of Technology
Laura C. Dilley
  • Michigan State University