Chapter

The origin of coarticulation

Authors: Barbara Kühnert and Francis Nolan

Abstract

The variation that a speech sound undergoes under the influence of neighbouring sounds has acquired the well-established label coarticulation. The phenomenon of coarticulation has become a central problem in the theory of speech production. Much experimental work has been directed towards discovering its characteristics, its extent and its occurrence across different languages. This book is a major study of coarticulation by a team of international researchers. It provides a definitive account of the experimental findings to date, together with discussions of their implications for modelling the process of speech production. Different components of the speech production system (larynx, tongue, jaw, etc.) require different techniques for investigation and a whole section of this book is devoted to a description of the experimental techniques currently used. Other chapters offer a theoretically sophisticated discussion of the implications of coarticulation for the phonology-phonetics interface.

... It is found that before the acoustic closure of the intervocalic C (e.g., the gap in F2 and F3 in Figure 1b), formant transitions toward the contrasting V2 can already be observed (e.g., the upward movement of F2 before the gap in Figure 1b). This has made the notion of the articulatory syllable appear too restricted (Kühnert & Nolan, 1999), because coarticulation seems to go beyond syllable boundaries. The alternative hypothesis is then formulated that the vowel properties occurring before the landmark-based consonant onset are due to anticipatory coarticulation (Daniloff & Hammarberg, 1973). ...
... This is not a fully novel idea, as it is consistent with the articulatory syllable hypothesis (Kozhevnikov & Chistovich, 1965). But the findings of the present study have resolved many of the uncertainties that followed the initial proposal of the articulatory syllable hypothesis (Kühnert & Nolan, 1999). Also, the new evidence for dimension-specific sequential articulation addressed the conundrum of coarticulation resistance (Bladon & Al-Bamerni, 1976; Recasens, 1984, 1987, 1989). ...
Article
Full-text available
This study tested the hypothesis that consonant and vowel are synchronised at the syllable onset, and that such synchronised co-onset is the essence of coarticulation. Articulatory data were collected for Mandarin Chinese, using Electromagnetic Articulography (EMA), and acoustic data were collected simultaneously. As a departure from conventional approaches, a minimal triplet paradigm was applied, in which divergence points between movement trajectories in contrastive pairs were used to determine segmental onsets. Triplets of disyllabic words consisting of two matching contrastive pairs in a C1V1#C2V2 structure were used, whereby the consonant pair differed only in C2 and the vowel pair differed only in V2 (the numerical indices indicate syllable position). Both articulatory and acoustical results showed that the articulation of vowels and consonants started at about the same time, thus supporting the CV synchrony hypothesis. The realisation of CV synchronisation was dimension specific, however. For any particular articulator, only the dimensions free of consonantal requirement started their movements toward the vowel from the syllable onset, while the rest of the dimensions moved toward successive consonantal and vocalic targets. The finding of CV co-onset increases the amount of temporal overlap between C and V relative to the widely assumed CV asynchrony. The evidence of dimension-specific sequential articulation sheds further light on coarticulation by offering a timing-based explanation for the well-known phenomenon of coarticulation resistance.
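The divergence-point logic of this minimal-triplet paradigm can be made concrete with a small sketch: given two averaged trajectories from a contrastive pair, find the first sample where they pull apart beyond baseline noise. This is a simplified stand-in for the study's statistically derived divergence points; the function name, threshold rule, and toy data are assumptions for illustration only.

```python
import numpy as np

def divergence_onset(traj_a, traj_b, t, baseline_frames=10, k=3.0):
    """Estimate when two contrastive trajectories diverge.

    traj_a, traj_b : mean trajectories (e.g., F2 in Hz) for a minimal pair
    t              : shared time axis (s)
    Returns the first time at which the pairwise difference exceeds the
    baseline mean plus k standard deviations -- a crude surrogate for the
    statistically derived divergence points used in the study.
    """
    diff = np.abs(np.asarray(traj_a) - np.asarray(traj_b))
    thresh = diff[:baseline_frames].mean() + k * diff[:baseline_frames].std()
    above = np.nonzero(diff > thresh)[0]
    return t[above[0]] if above.size else None

# Toy pair sharing C1V1, diverging during V2 (true onset at 0.25 s).
t = np.linspace(0, 0.5, 100)
a = 1500 + 400 * np.clip((t - 0.25) / 0.1, 0, 1)  # F2 rising toward /i/-like V2
b = 1500 - 300 * np.clip((t - 0.25) / 0.1, 0, 1)  # F2 falling toward /u/-like V2
print(divergence_onset(a, b, t))                  # ~0.25
```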
... Menzerath and de Lacerda (1933) observed in German, using the articulatory observation methods available at the time, that the lip movement for the vowel /u/ in /pu/ starts at about the same time as the articulation of the initial consonant. They proposed that the phenomenon was due to a general organizational principle of "Koartikulation" in articulatory control, a term that later became popularized as the English word "coarticulation" (Kühnert and Nolan, 1999). A phenomenon of syllable-based coarticulation similar to the original one was again observed in Russian by Kozhevnikov and Chistovich (1965). ...
Article
Full-text available
Recent research has shown evidence based on a minimal contrast paradigm that consonants and vowels are articulatorily synchronized at the onset of the syllable. What remains less clear is the laryngeal dimension of the syllable, for which evidence of tone synchrony with the consonant-vowel syllable has been circumstantial. The present study assesses the precise tone-vowel alignment in Mandarin Chinese by applying the minimal contrast paradigm. The vowel onset is determined by detecting divergence points of F2 trajectories between a pair of disyllabic sequences with two contrasting vowels, and the onsets of tones are determined by detecting divergence points of f0 trajectories in contrasting disyllabic tone pairs, using generalized additive mixed models (GAMMs). The alignment of the divergence-determined vowel and tone onsets is then evaluated with linear mixed effect models (LMEMs) and their synchrony is validated with Bayes factors. The results indicate that tone and vowel onsets are fully synchronized. There is therefore evidence for strict alignment of consonant, vowel and tone as hypothesized in the synchronization model of the syllable. Also, with the newly established tone onset, the previously reported 'anticipatory raising' effect of tone now appears to occur within rather than before the articulatory syllable. Implications of these findings will be discussed.
... Coarticulation has been most commonly explored in the context of linguistics, where the term refers to how the production of a phoneme is influenced by those that precede and follow it [26]. ...
Preprint
Full-text available
In music performance contexts, vocalists tend to gesture in ways that show both similarities and idiosyncrasies across performers. We present a quantitative analysis and visualisation pipeline that characterises the multidimensional codependencies of spontaneous body movements and vocalisations in vocal performers. We apply this pipeline to a dataset of performances within the Karnatak music tradition of South India, including audio and motion tracking data, openly published with this report. Our results show that time-varying features of head and hand gestures tend to be more similar when the concurrent vocal time-varying features are also more similar. While for each performer we find clear co-structuring of sound and movement, they each show their own characteristic salient dimensions (e.g., hand position, head acceleration) on which movement is coarticulated with singing. Our analyses thereby provide a computational characterisation of each performer’s unique multimodal coarticulations with singing. The results support our conceptual contribution of widening the conception of coarticulation, from within a ‘modality’ (e.g., speech articulator positions, joint angles in reaching), to a multimodal coarticulation constrained by both physiological and aesthetic ‘control parameters’ that reduce degrees of freedom of the multimodal performance such that motifs that sound alike tend to co-structure with gestures that move alike. See our multimodal data dashboard: https://tsg-131-174-75-200.hosting.ru.nl/karnatak/
... In fact, selecting a specific measurement point inevitably influences the characterization of the vowel to some extent by phenomena such as co-articulation (e.g. Kühnert and Nolan, 1999; Farnetani and Recasens, 2010; Embarki and Dodane, 2011). ...
Article
Full-text available
This study investigated the performance of several metrics used to evaluate spectral stability in vowels. Four metrics suggested in the literature and a newly developed one were tested and compared to the traditional method of associating the spectrally stable portion with the middle of the vowel. First, synthetic stimuli whose spectrally stable portion had been defined in advance were used to evaluate the potential of the different metrics to capture spectral stability. Second, the output of the different metrics on the acoustic measurements obtained in the vowel portions identified as spectrally stable was compared on both synthesized and natural speech. It is clear that higher-dimensional features are needed to capture spectral stability and that the best-performing metrics yield acoustic measurements that are similar to those obtained in the middle of the vowel. This study empirically validates long-standing intuitions about the validity of selecting the middle section of vowels as the preferred method to identify the spectrally stable region in vowels.
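As a rough illustration of what such metrics compute, the sketch below scores every fixed-length window of a formant track by its summed variance and returns the most stable one; the window criterion and toy vowel are assumptions, and the metrics actually evaluated in the study are more elaborate.

```python
import numpy as np

def most_stable_window(formants, win=5):
    """Start index of the spectrally most stable window of a vowel.

    formants : (n_frames, n_formants) array of formant tracks (Hz)
    Scores each window of `win` frames by the sum of per-formant
    variances and returns the start of the minimum-variance window.
    """
    n = formants.shape[0]
    scores = [formants[i:i + win].var(axis=0).sum() for i in range(n - win + 1)]
    return int(np.argmin(scores))

# Toy vowel with transitions at both edges and a steady centre.
t = np.linspace(0, 1, 50)
f1 = 600 + 80 * np.exp(-(t / 0.15) ** 2) + 40 * np.exp(-((t - 1) / 0.15) ** 2)
f2 = 1200 - 150 * np.exp(-(t / 0.15) ** 2)
print(most_stable_window(np.stack([f1, f2], axis=1)))  # lands near mid-vowel
```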
... Being the basic blocks of our mental model of speech, phonemes seem to be at the adequate level of abstraction. However, the continuous nature of the speech phenomenon, conflicting with the discrete nature of that model, causes coarticulation phenomena [72]; those make it difficult to synthesize speech from text based on a naïve phoneme-based representation. ...
Thesis
The study of Voks is presented here. Voks is a family of vocal instruments that allows for control, using one’s hands, of intonation, rhythmic sequencing, and vocal quality parameters of an articulated voice. This dissertation focuses more specifically on the question of rhythmic control, which is here based on the frame/content theory of speech production. It relies on matching impulses produced by a biphasic tapping gesture with control points placed on a prerecorded voice sample. This mode of operation relies on a complex loop that involves both gestural production and auditory perception of rhythm, which can be modeled by the notion of the P-center, or perceptual center. Direct manual control of melody and vocal effort, also known as chironomy, is performed using continuous controllers such as the graphic tablet and the theremin, enabling new expressive gestures. For synthesis, two distinct vocoders, World and SuperVP, are integrated into Voks and compared. They make it possible to control melody, rhythm and vocal timbre. Voks takes as input voice samples equipped with syllabic labels; those data can be either directly fed as input or generated automatically on the fly. The Voks instrument family has been applied to music and to foreign language acquisition.
... The term coarticulation refers to the process by which neighboring sounds influence the production of one another. This is not a marginal process, as it is known that sounds are not uttered in isolation as they are all produced by a single vocal tract that has to adjust to the elaboration of continuous sounds (Kühnert & Nolan, 1999). ...
Thesis
Full-text available
This dissertation investigates language-specific acoustic and aerodynamic phenomena in language contact situations. Whereas most work on second language and bilingual phonology has focused on individual consonants and vowels, this project examines patterns of coarticulation in the two languages of Spanish-English and French-English bilingual speakers. These include speakers whose first language is either Spanish, French or English and who are late second-language acquirers, and heritage speakers of Spanish, who are early second-language acquirers. I focus on subtly different coarticulation patterns between English and Spanish, including the extent to which vowels are nasalized in contact with nasal consonants (Chapter 2), are lengthened before voiced consonants (Chapter 3), and undergo changes in quality before voiced consonants (Chapter 4). Whereas the existence of such effects can be taken as universal, the degree to which they are implemented varies from language to language, presumably contributing to what defines a ‘native accent.’ My work thus presents a novel method to investigate coarticulatory patterns. The theoretical question that I address in my dissertation is whether bilingual speakers can establish distinct coarticulatory patterns in their two languages in ways that are similar to those of monolinguals of the two languages. A related question is to what extent learning both languages in childhood (as in the case of heritage speakers) facilitates separating the two phonetic systems. In Chapter 2, I study coarticulatory vowel nasalization in Spanish and English using pressure transducers and Generalized Additive Mixed Models to observe how nasal airflow changes over time. In Chapter 3, I focus on vowel length as a cue for voicing of the following consonant in two Romance languages (Spanish, French) and English, which show opposite patterns. Chapter 4 is about vowel formant displacement patterns across time and the effect of vocalic length in Spanish and English. In Chapter 5, I present a new phonological model, “The Bilingual Coarticulatory Model”, which describes coarticulation as malleable and adjustable cross-linguistically in bilingual speakers who possess a higher level of linguistic proficiency. Results show that properties pertaining to vowel quality are easier to acquire than durational properties, which would go against some of the L2 literature on the acquisition of vowels. Native speakers of Spanish show native-like nasalization values in L2 English, yet only when the syllabic structure of sequences is shared. Heritage speakers show native-like results in both languages with regard to nasalization, and L1En speakers show an adjustment of onset of nasalization but not of degree of nasalization. Regarding duration, heritage speakers were the only group to completely separate the two coarticulatory systems, as the other groups showed cross-linguistic influence. Finally, regarding the dynamics of vowel formants, speakers transfer L1 patterns to the L2. Linguistic proficiency in the L2 was a significant factor in acquiring coarticulatory patterns. In the case of heritage speakers, findings differed depending on the variable under study.
... Indeed, if a speech unit is anticipated into a preceding one, this implies that both units have been concomitantly planned and sent for execution. In other words, their coproduction is a reflection of the fact that the two units are encoded and coordinated in the same speech plan (see Whalen 1990; Kühnert and Nolan 1999; Ma et al. 2015; Recasens 2018). ...
Article
Full-text available
In this study, we test whether anticipatory Vowel-to-Vowel coarticulation varies with age in the speech of 246 adult French speakers aged between 20 and 93. The relationship between coarticulation and the known age-related change in speech rate is also investigated. The results show a gradual decrease in the amount of coarticulation for speakers from 20 to mid-50s, followed by a more abrupt decrease for speakers older than 70. For speakers in between, diverse coarticulation profiles emerge. Speech rate is also found to evolve from early to late adulthood and not only for older speakers; it shows a gradual decrease for speakers up to mid-50s and a more abrupt deceleration afterwards. Yet, the relationship between rate and coarticulation is not linear; it appears stronger for the younger speakers, with faster speakers coarticulating more, than for the adults over 70 y.o.a. Results are discussed in relation to possible changes in the parametrization and coordination of speech units at different ages.
... [k̠] when followed by a back vowel) (Kühnert & Nolan, 1999), both assimilatory and dissimilatory tonal contextual effects have been reported, and the term 'tonal coarticulation' has been used in the literature to refer to general contextual effects, encompassing both types. Dissimilatory effects have been observed in both progressive and regressive tone coarticulation in Cantonese and Mandarin. ...
Article
https://www.sciencedirect.com/science/article/pii/S0167639318303017?dgcid=author
The task of forensic voice comparison (FVC) often involves the comparison of a voice in an offender recording with that in a suspect recording, with the aim of assisting the investigating authority or the court in determining the identity of the speaker. One of the main goals in FVC research is to identify speech variables that are useful for differentiating speakers. While French and Stevens (2013) stated that connected speech processes (CSPs) vary across speakers and thus CSPs may be included in the ‘toolbox’ for forensic voice comparison casework, little empirical research has been done to test how effective various CSPs are in speaker discrimination. This paper reports an exploratory study comparing the speaker-discriminatory power of lexical tones in their citation forms and coarticulated tones. Twenty Cantonese and twenty Mandarin speakers were instructed to produce tones under different speech rates and tonal contexts. Results based on discriminant analysis show that the combination of normal speech rate and a compatible tonal context appears to have yielded the best speaker discrimination. On the other hand, the combination of fast speech and a conflicting tonal context, which in principle led to the greatest tonal coarticulatory effects, yielded the worst speaker discrimination. The addition of duration on top of tonal f0 significantly improved the classification rates in both languages. Furthermore, for the same tone categories, the Mandarin ones generally discriminate speakers better than the Cantonese counterparts, suggesting that tone inventory density affects the speaker-discriminatory power of tones. Implications of the findings for forensic speaker comparison are discussed.
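A hedged sketch of the discriminant-analysis setup: simulated f0-contour-plus-duration features for 20 speakers, classified with linear discriminant analysis, comparing classification rates with and without duration as the paper does. All values here are simulated; the real features come from measured tone productions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical tokens: a 10-point f0 contour plus duration per token.
n_speakers, tokens = 20, 12
X, y = [], []
for s in range(n_speakers):
    base = rng.uniform(120, 260)                        # speaker's f0 register
    for _ in range(tokens):
        contour = base + rng.normal(0, 8, 10)           # 10-point f0 contour (Hz)
        duration = rng.normal(0.25 + 0.002 * s, 0.02)   # token duration (s)
        X.append(np.append(contour, duration))
        y.append(s)
X, y = np.array(X), np.array(y)

# Classification rate with vs without duration (illustrative numbers only).
for cols, name in [(slice(0, 10), "f0 only"), (slice(0, 11), "f0 + duration")]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, cols], y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```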
... As such, many of the articulatory mechanisms and aerodynamic consequences described are still speculative and in need of experimental verification. Nevertheless, this study shows that appropriate interpretation of the acoustics alone can provide invaluable insights into details of articulatory overlap, which are often considered to be lost on the acoustic 'surface' (Browman & Goldstein 1986, Kühnert & Nolan 1999). Being able to identify differences in articulatory synchronization from the acoustic record can be crucial in understanding the complex details of organizational structure in forms of spoken language, such as conversation, that are not readily amenable to the intrusion of articulatory and aerodynamic instrumentation. ...
Article
Different types of non-pulmonic sound production found in read and spontaneous German are described and exemplified. Stop releases driven by a glottalic airstream are found in sequences of final plosive plus glottalized vowel onset. Both ingressive and egressive velaric stop releases are considered to arise from double occlusions involving partial articulatory overlap in sequences of dorsal stop followed by an apical or bilabial stop. Certain aspects of the articulatory and aerodynamic mechanisms involved remain unclear. However, it is argued that an identification of these patterns in the acoustic record alone provides an invaluable insight into aspects of articulatory synchronization in types of spoken language, such as conversation, which are not generally amenable to the more intrusive methods of articulatory analysis.
Article
Chapter 9 explores prosodic structure as an integral component of linguistic structure. Prosodic structure specifies how phonological constituents are to be grouped to form larger units within a given utterance; this is known as its delimitative function. Prosodic structure also helps determine which phonological constituents are produced with prominence relative to the other constituents; this is known as its culminative function. These functions entail strengthening of segmental realization (prosodic strengthening), often leading to linguistic enhancement of syntagmatic and paradigmatic contrast. Theories of the phonetics-prosody interface assume that the phonetic realization of a spoken utterance is fine-tuned according to prosodic structure. In turn, crucial aspects of phonetic realization signal higher-order prosodic structure for listeners.
Article
This acoustic study explores how Korean learners produce coarticulatory vowel nasalization in English that varies with the prosodic structural factors of focus-induced prominence and boundary. N-duration and A1-P0 (degree of V-nasalization) are measured in consonant-vowel-nasal (CVN) and nasal-vowel-consonant (NVC) words in various prosodic structural conditions (phrase-final vs. phrase-medial; focused vs. unfocused). Korean learners show a systematic fine-tuning of the non-contrastive V-nasalization in second language (L2) English in relation to prosodic structure, although it does not pertain to learning new L2 sound categories (i.e., L2 English nasal consonants are directly mapped onto Korean nasal consonants). The prosodic structurally conditioned phonetic detail in English appears to be accessible for the most part to Korean learners and was therefore reflected in their production of L2 English. Their L2 production, however, is also found to be constrained by their first language (L1-Korean) to some extent, resulting in some phonetic effects that deviate from both L1 and L2. The results suggest that the seemingly low-level coarticulatory process is indeed under the speaker’s control in L2, which reflects interactions of the specificities of the phonetics-prosody interface in L1 and L2. The results are also discussed in terms of their implications for theories of L2 phonetics.
Article
Full-text available
Previous studies suggest that listeners may use segmental coarticulation cues to facilitate spoken word recognition. Based on existing production studies which showed a pre-low raising effect in Cantonese tonal coarticulation, this study used a word identification task to investigate whether the tonal coarticulatory cue, carried by high-level and rising tones, was used when native listeners recognized pre-low and pre-high disyllabic words. The findings indicated that the listeners may rely on the F0 of the rising tone to resolve lexical competition when hearing pre-high words. However, they did not provide evidence supporting the use of the pre-low raising cue in spoken word recognition.
Article
When speech sounds are produced, articulatory movements for one sound overlap with those of the surrounding sounds, generating articulatory and acoustic signals that at any point in time are informative about two or more sounds, not just one. This process of intermingling of information about several speech sounds in the articulatory and acoustic signals is called coarticulation. This chapter synthesises theories and experimental findings of the last century on the nature of coarticulation, and shows how our modern understanding of this complex process is deeply rooted in theories that have evolved over decades due to novel experimental findings as well as critique from competing theories. After discussion of our current understanding of coarticulation, some suggestions for initiating students into the surprising effects of coarticulation are introduced.
Chapter
Full-text available
This chapter deals with two categories of “complex segments,” viz. consonants with secondary articulation, such as /kʷ/ or /lˠ/ (= /ɫ/ ‘dark l’), and consonants that involve two articulations of equal status, such as / /. The main difference between these two types of sounds is that in the former there is a major (“consonantal”) articulatory stricture on which a vowel-like minor articulation is superimposed, while in the latter the two articulations have a stricture type of equal status (typically, stop or nasal). While consonants with secondary articulation are very common in the languages of the world, consonants with double articulation are much rarer.
Article
Full-text available
Musical performance is a multimodal experience, for performers and listeners alike. This paper reports on a pilot study which constitutes the first step toward a comprehensive approach to the experience of music as performed. We aim at bridging the gap between qualitative and quantitative approaches, by combining methods for data collection. The purpose is to build a data corpus containing multimodal measures linked to high-level subjective observations. This will allow for a systematic inclusion of the knowledge of music professionals in an analytic framework, which synthesizes methods across established research disciplines. We outline the methods we are currently developing for the creation of a multimodal data corpus dedicated to the analysis and exploration of instrumental music performance from the perspective of embodied music cognition. This will enable the study of the multiple facets of instrumental music performance in great detail, as well as lead to the development of music creation techniques that take advantage of the cross-modal relationships and higher-level qualities emerging from the analysis of this multi-layered, multimodal corpus. The results of the pilot project suggest that qualitative analysis through stimulated recall is an efficient method for generating higher-level understandings of musical performance. Furthermore, the results indicate several directions for further development, regarding observational movement analysis, and computational analysis of coarticulation, chunking, and movement qualities in musical performance. We argue that the development of methods for combining qualitative and quantitative data are required to fully understand expressive musical performance, especially in a broader scenario in which arts, humanities, and science are increasingly entangled. The future work in the project will therefore entail an increasingly multimodal analysis, aiming to become as holistic as is music in performance.
Article
Full-text available
This study compares prosodic structural effects on nasal (N) duration and coarticulatory vowel (V) nasalization in NV (Nasal-Vowel) and CVN (Consonant-Vowel-Nasal) sequences in Mandarin Chinese with those found in English and Korean. Focus-induced prominence effects show cross-linguistically applicable coarticulatory resistance that enhances the vowel's phonological features. Boundary effects on the initial NV reduced N's nasality without having a robust effect on V-nasalization, whose direction is comparable to that in English and Korean. Boundary effects on the final CVN showed language specificity of V-nasalization, which could be partly attributable to the ongoing sound change of coda nasal lenition in Mandarin.
Article
Full-text available
This study investigates focus and boundary effects on Korean nasal consonants and vowel nasalization. Under focus, nasal consonants lengthen in CVN# but shorten in #NVC, enhancing [nasal] vs [oral]. Vowels resist nasalization under focus, enhancing [oral]. Domain-initial nasal consonants denasalize, exercising no coarticulatory influence. Domain-final nasal consonants shorten counter to expectation, although vowel nasalization increases. Comparison with English data reveals similarities (focus-induced coarticulatory resistance) despite cross-linguistic differences in marking prominence, but it also suggests that prosodic-structural conditioning of non-contrastive vowel nasalization, albeit based on phonetic underpinnings of coarticulatory process, is fine-tuned in language-specific ways, resulting in cross-linguistic variation.
Article
Full-text available
The acquisition of Spanish liquids is studied in Mexican children between the ages of two and six. Liquids are analyzed both segmentally and in their interaction with the context in which they occur. Individual differences in the acquisition of liquids are identified, and their motivations discussed. The reanalysis of morphological boundaries and the analogical treatment of phonological sequences are shown to play a role in the distribution of liquids in the child’s input, and to be a source of output differences between children and adults. These findings contribute to a better understanding of atypical language acquisition and to speech therapy (logopedia).
Conference Paper
Recently, RNN-based acoustic models have shown promising performance. However, their ability to generalize to multiple scenarios is limited for two reasons. Firstly, they encode inter-word dependency, which conflicts with the principle that an acoustic model should model only the pronunciation of words. Secondly, an RNN-based acoustic model that depicts the intra-word acoustic trajectory frame by frame is too precise to tolerate small distortions. In this work, we propose two variants to address the aforementioned problems. One is word-level permutation, i.e. the order of input features and corresponding labels is shuffled with a suitable probability according to word boundaries. It aims to eliminate inter-word dependencies. The other is the improved LFR (iLFR) model, which equidistantly splits the original sentence into N utterances to avoid the data discarded by the LFR model. Results based on an LSTM RNN demonstrate a 7% relative performance improvement from combining word-level permutation and iLFR.
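A minimal sketch of the word-level permutation idea, assuming frame-level features with known word boundaries; the sampling scheme and data layout are illustrative, not taken from the paper.

```python
import random

def word_level_permute(frames, labels, boundaries, p=0.1, seed=0):
    """With probability p, shuffle the order of whole words.

    boundaries : list of (start, end) frame-index pairs, one per word
    Frames and labels inside each word stay intact, so intra-word
    pronunciation is preserved while inter-word dependencies are broken.
    """
    rng = random.Random(seed)
    if rng.random() >= p:
        return frames, labels
    order = list(range(len(boundaries)))
    rng.shuffle(order)
    new_frames, new_labels = [], []
    for w in order:
        s, e = boundaries[w]
        new_frames.extend(frames[s:e])
        new_labels.extend(labels[s:e])
    return new_frames, new_labels

# Toy utterance with three words of 2, 3, and 2 frames.
frames = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"]
labels = ["a", "a", "b", "b", "b", "c", "c"]
print(word_level_permute(frames, labels, [(0, 2), (2, 5), (5, 7)], p=1.0))
```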
Chapter
Francis Nolan (1952– ) is professor of phonetics in the Department of Linguistics at the University of Cambridge.
Article
Full-text available
This study explores the relationship between prosodic strengthening and linguistic contrasts in English by examining temporal realization of nasals (N-duration) in CVN# and #NVC, and their coarticulatory influence on vowels (V-nasalization). Results show that different sources of prosodic strengthening bring about different types of linguistic contrasts. Prominence enhances the consonant's [nasality] as reflected in an elongation of N-duration, but it enhances the vowel's [orality] (rather than [nasality]) showing coarticulatory resistance to the nasal influence even when the nasal is phonologically focused (e.g., mob-bob; bomb-bob). Boundary strength induces different types of enhancement patterns as a function of prosodic position (initial vs. final). In the domain-initial position, boundary strength reduces the consonant's [nasality] as evident in a shortening of N-duration and a reduction of V-nasalization, thus enhancing CV contrast. The opposite is true with the domain-final nasal in which N-duration is lengthened accompanied by greater V-nasalization, showing coarticulatory vulnerability. The systematic coarticulatory variation as a function of prosodic factors indicates that V-nasalization as a coarticulatory process is indeed under speaker control, fine-tuned in a linguistically significant way. In dynamical terms, these results may be seen as coming from differential intergestural coupling relationships that may underlie the difference in V-nasalization in CVN# vs. #NVC. It is proposed that the timing initially determined by such coupling relationships must be fine-tuned by prosodic strengthening in a way that reflects the relationship between dynamical underpinnings of speech timing and linguistic contrasts.
Article
This article presents an analysis of small‐scale melodic movement in South Indian rāga performance employing the concept of coarticulation, defined here as the tendency for the performance of a unit to be influenced by that which precedes or follows it. Coarticulation has been much studied in phonetics and also explored to some extent in sign language and the kinematics of instrumental performance. Here I seek to account for variation in the performance of Karnatak musical units known as svaras (the scale degrees of a rāga) and gamakas (ornaments) through the phenomenon of coarticulation, thus providing an analysis of small‐scale melodic movement that focuses on the dynamic processes which form the style rather than on the categorisation of discrete elements. The material investigated is a video recording of ālāpana (improvisation) in rāga Toḍi performed by the Karnatak violinist T. V. Ramanujacharlu in Tamil Nadu, South India. A section of the recording is transcribed into staff notation and visualised through pitch‐contour graphs created in Praat sound‐analysis software. The hand movements required to produce the musical phrases are described from observation of the video alongside figures showing motion‐tracking data. Interviews with musicians, participant observation and the author's experience as a student of Karnatak violin provide the foundation for interpretation of the material. Results show that coarticulation can be seen between svaras through the oscillatory gamakas with which they are performed. Atomistic and gestural conceptions of South Indian music are discussed, following which suggestions are made for the implications of this research in modelling the Karnatak style, as well as for potential applications in musical information retrieval (MIR).
Article
This project replicates and extends previous work on coarticulation in velar-vowel sequences in English. Coarticulatory data for 46 young adult speakers, 23 who stutter and 23 who do not stutter, show coarticulatory patterns in young adults who stutter that are no different from those of typical young adults. Additionally, the stability of velar-vowel production is analysed via token-to-token variability in multiple repetitions of the same velar-vowel sequence. Across participants, identical patterns of coarticulation were found between people who do and do not stutter, but decreased stability was found in velar closure production in a significant subset of people who stutter. Other people who stutter appeared no different than typical speakers. Outcomes of this study suggest that articulatory maturation in young adults who stutter is, on average, no different from typical young adults, but that some young adults who stutter could be viewed as having less stably activated articulatory sub-systems.
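Token-to-token variability of repeated trajectories is commonly quantified with a spatiotemporal index: each token is time- and amplitude-normalized, and the pointwise standard deviations are summed. The sketch below implements that common measure as an illustration; the paper's exact variability metric may differ.

```python
import numpy as np

def spatiotemporal_index(trajectories, n_points=50):
    """Summed pointwise SD of time/amplitude-normalized trajectories.

    trajectories : list of 1-D arrays (e.g., tongue-dorsum height over
    time), one per repetition of the same velar-vowel sequence.
    Higher values indicate less stable production across tokens.
    """
    norm = []
    for tr in trajectories:
        x_old = np.linspace(0, 1, len(tr))
        resampled = np.interp(np.linspace(0, 1, n_points), x_old, tr)
        norm.append((resampled - resampled.mean()) / resampled.std())
    return np.vstack(norm).std(axis=0).sum()

# Toy repetitions: one shared gesture plus per-token jitter.
base = np.sin(np.linspace(0, np.pi, 60))
tokens = [base + np.random.default_rng(i).normal(0, 0.05, 60) for i in range(10)]
print(round(spatiotemporal_index(tokens), 2))
```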
Article
Previous analyses of the behaviour of siSwati mid vowels show conflicting results. According to Ziervogel and Mabuza, and Taljaard and Snyman, siSwati mid vowels are raised to close-mid [e, o] when preceding the high vowels [i, u] (e.g. [likheʃi] ‘lift’ (n.), [inɮovu] ‘elephant’) but remain open-mid [ɛ, ɔ] before [-high] vowels (e.g. [liʦemba] ‘hope’ (n.), [ibola] ‘football’), suggesting that siSwati has vowel height assimilation and/or ATR assimilation. However, Kockaert, in his acoustic analysis of the same vowels in the same environments, disputes this description. He concludes that there is no significant difference in the F1, F2, and F3 frequency values of these vowels. Results of an experiment that I have conducted show that the phonological environment does influence the quality of siSwati mid vowels. The change, though, is evidence not of harmony but of coarticulation.
Article
Full-text available
A machine that can read printed material to the blind became a priority at the end of World War II with the appointment of a U.S. Government committee to instigate research on sensory aids to improve the lot of blinded veterans. The committee chose Haskins Laboratories to lead a multisite research program. Initially, Haskins researchers overestimated the capacities of users to learn an acoustic code based on the letters of a text, resulting in unsuitable designs. Progress was slow because the researchers clung to a mistaken view that speech is a sound alphabet and because of persisting gaps in man-machine technology. The tortuous route to a practical reading machine transformed the scientific understanding of speech perception and reading at Haskins Labs and elsewhere, leading to novel lines of basic research and new technologies. Research at Haskins Laboratories made valuable contributions in clarifying the physical basis of speech. Researchers recognized that coarticulatory overlap eliminated the possibility of alphabet-like discrete acoustic segments in speech. This work advanced the study of speech perception and contributed to our understanding of the relation of speech perception to production. Basic findings on speech enabled the development of speech synthesis, part science and part technology, essential for development of a reading machine, which has found many applications. Findings on the nature of speech further stimulated a new understanding of word recognition in reading across languages and scripts and contributed to our understanding of reading development and reading disabilities. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
Article
Full-text available
On Defining Assimilation and Coarticulation
The paper is concerned with the processes, mechanisms, and causes of the mutual interaction of segments in connected speech. The factual aspects of these interactional processes and their manifestation are covered by such terms as assimilation, coarticulation, accommodation and transients. There appear to be no objective differences between these terms, i.e. the postulated disparity between linguistic and biomechanical phenomena has not been objectively justified. There are two main mechanisms (models) that attempt to explain the functioning of these phenomena - feature spreading and coproduction. We believe that feature spreading is a more suitable explanatory model, since it covers and describes a wider range of assimilatory cases than coproduction. As far as the causes of these processes are concerned, it seems that the individual types of assimilations may be governed by different principles, and most of the proposed explanations are still hypothetical in nature.
Article
Full-text available
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
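The interpolation scheme reduces to a compact computation: barycentric weights locate the context vowel within the /a/-/i/-/u/ triangle of a vowel subspace, and the same weights average the three reference consonant shapes. The sketch below assumes a 2-D subspace with toy coordinates and eight-parameter "shapes"; the paper derives its subspace and reference shapes from measured, acoustically optimized vocal tracts.

```python
import numpy as np

def barycentric_weights(v, a, i, u):
    """Weights of vowel v in the triangle spanned by corner vowels a, i, u
    (all given as 2-D coordinates in some vowel subspace)."""
    T = np.column_stack([a - u, i - u])          # 2x2 basis of the triangle
    w_ai = np.linalg.solve(T, v - u)             # weights for /a/ and /i/
    return np.array([w_ai[0], w_ai[1], 1.0 - w_ai.sum()])

def consonant_target(v_coords, shapes_aiu, corners):
    """Context-sensitive consonant target as the weighted average of the
    three reference shapes measured in /a/, /i/, /u/ context."""
    w = barycentric_weights(v_coords, *corners)
    return np.tensordot(w, shapes_aiu, axes=1)

# Toy setup: 2-D vowel subspace, 8-parameter "tract shapes".
corners = (np.array([0.0, 0.0]),   # /a/
           np.array([1.0, 0.0]),   # /i/
           np.array([0.0, 1.0]))   # /u/
shapes = np.stack([np.full(8, c) for c in (1.0, 2.0, 3.0)])  # refs in a/i/u
v_e = np.array([0.5, 0.1])         # an /e/-like context vowel
print(consonant_target(v_e, shapes, corners))  # between the /a/ and /i/ refs
```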
Article
In Standard Japanese, the phoneme /z/ is realized variably either as an affricate or a fricative. This variation was analyzed using a phonetically annotated part of the Corpus of Spontaneous Japanese. Contrary to the traditional linguistic account, the variation is not a positionally conditioned, hence categorical, allophonic variation. Although the effect of linguistic position exists to some extent, its influence is secondary compared to the influence of local temporal characteristics of speech called time allotted for consonant articulation (TACA). The overall prediction rate of the manner of /z/ articulation by means of TACA was 74%, and, when coupled with information on linguistic position, the prediction rate was as high as 80%.
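A hedged sketch of how such a prediction rate might be obtained, assuming TACA and a crude positional factor as predictors of the fricative-vs-affricate choice; the data are simulated, not drawn from the corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Simulated /z/ tokens: longer TACA favours the fricative realization.
taca = rng.uniform(0.03, 0.15, 400)                  # seconds
p_fricative = 1 / (1 + np.exp(-(taca - 0.08) * 80))  # assumed relation
is_fricative = rng.random(400) < p_fricative
initial = rng.random(400) < 0.3                      # crude positional factor

X = np.column_stack([taca, initial])
model = LogisticRegression().fit(X, is_fricative)
print(f"prediction rate: {model.score(X, is_fricative):.2f}")
```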
Article
The goal of this study is to examine how the degree of vowel-to-vowel coarticulation varies as a function of prosodic factors such as nuclear-pitch accent (accented vs. unaccented), level of prosodic boundary (Prosodic Word vs. Intermediate Phrase vs. Intonational Phrase), and position-in-prosodic-domain (initial vs. final). It is hypothesized that vowels in prosodically stronger locations (e.g., in accented syllables and at a higher prosodic boundary) are not only coarticulated less with their neighboring vowels, but they also exert a stronger influence on their neighbors. Measurements of tongue position for English /a i/ over time were obtained with Carstens electromagnetic articulography. Results showed that vowels in prosodically stronger locations are coarticulated less with neighboring vowels, but do not exert a stronger influence on the articulation of neighboring vowels. An examination of the relationship between coarticulation and duration revealed that (a) accent-induced coarticulatory variation cannot be attributed to a duration factor and (b) some of the data with respect to boundary effects may be accounted for by the duration factor. This suggests that to the extent that prosodically conditioned coarticulatory variation is duration-independent, there is no absolute causal relationship from duration to coarticulation. It is proposed that prosodically conditioned V-to-V coarticulatory reduction is another type of strengthening that occurs in prosodically strong locations. The prosodically driven coarticulatory patterning is taken to be part of the phonetic signatures of the hierarchically nested structure of prosody.
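Operationally, the degree of V-to-V coarticulation comes down to comparing the target vowel's articulatory position across flanking-vowel contexts: the smaller the context-induced shift, the more the target resists coarticulation. A toy sketch with hypothetical tongue-position values, arranged so that the accented condition shows the smaller context effect reported here:

```python
import numpy as np

def coarticulation_degree(positions_by_context):
    """Absolute difference between context means of tongue position
    measured in the target vowel (mm); smaller = more resistant."""
    m = [np.mean(v) for v in positions_by_context.values()]
    return abs(m[0] - m[1])

# Hypothetical tongue-backness in a target vowel, flanked by /a/ vs /i/.
accented = {"a_ctx": [12.1, 12.3, 12.0], "i_ctx": [12.6, 12.4, 12.7]}
unaccented = {"a_ctx": [11.2, 11.5, 11.0], "i_ctx": [13.1, 13.3, 12.9]}
print(coarticulation_degree(accented))    # ~0.43: accented vowel resists more
print(coarticulation_degree(unaccented))  # ~1.87: unaccented coarticulates more
```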
Article
This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher’s Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio–visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset.
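The supervised pipeline can be sketched roughly as: cluster orofacial configurations to obtain class labels, learn a Fisher/LDA projection of the acoustic features, then regress motion with k-nearest neighbours in the projected space. Data, cluster count, and dimensionalities below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# Simulated training pairs: MFCC-like acoustics and orofacial markers.
X_acoustic = rng.normal(size=(500, 13))
Y_face = X_acoustic[:, :3] @ rng.normal(size=(3, 6)) + rng.normal(0, 0.1, (500, 6))

# Discrete orofacial configurations supply the class labels that
# Fisher's discriminant needs for a supervised projection.
config = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(Y_face)
lda = LinearDiscriminantAnalysis(n_components=5).fit(X_acoustic, config)
Z = lda.transform(X_acoustic)

# Non-parametric k-NN mapping from projected acoustics to motion.
knn = KNeighborsRegressor(n_neighbors=5).fit(Z, Y_face)
print(knn.predict(lda.transform(X_acoustic[:2])).shape)  # (2, 6)
```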
Article
This study explores the phenomenon of coarticulation in spoken and signed language, focusing in particular on long-distance effects, defined here as the articulatory influence of one phonetic element on another across at least one intervening segment. While a great deal of variability has been found among language users in the production and perception of such effects, the fact that long-distance coarticulation occurs at all has important theoretical implications. Recent work on sign language, together with relevant spoken-language results, offers new insights and raises interesting questions concerning the human language capacity in general.
Article
Background: Apraxia of speech (AOS) is considered a disorder of speech planning or programming. Evidence for this stems from perceptual, acoustic, and electropalatographic investigations of articulation in AOS that revealed a delayed onset of anticipatory vowel gestures. Articulatory prolongation and syllable segregation have been attributed to a disturbance in anticipatory coarticulation. Aims: The aim of the current study was to investigate anticipatory lingual movement for consonantal gestures in AOS, and its impact on absolute and relative speech timing. Methods & Procedures: Tongue-tip movement and tongue-to-palate contact patterns were recorded for three speakers with AOS and a concomitant aphasia (age range = 35-63 years; M = 50.67 years; SD = 14.29) and five healthy talkers (age range = 29-65 years; M = 52.6 years; SD = 14.5) during the phrases "a scarlet" and "a sergeant", using electromagnetic articulography (EMA) (AG-200 system) and electropalatography (EPG) (Reading Electropalatograph system). Anticipatory lingual movement and speech timing were analysed during the final C1VC/C2 syllable in each of these phrases, where C represented an alveolar or postalveolar consonant. Specifically, tongue-tip displacement was calculated from the onset of release to the end of release of C1 to provide an indication of anticipatory lingual movement. With respect to speech timing, absolute (i.e., duration from time of maximum contact for C1 to time of maximum contact for C2) and relative (i.e., absolute duration expressed as a function of total syllable duration) durational measures were recorded, as was the stability of each. The results recorded for each of the participants with AOS were individually compared to those obtained by the control group. Outcomes & Results: The EMA results indicated that two participants with AOS exhibited reduced anticipatory lingual movement (i.e., greater tongue-tip displacement) during repetitions of "sergeant"; however, all speakers produced a comparable tongue-tip displacement to that produced by the control group during the release of /l/ in "scarlet". The EPG results indicated that absolute duration was significantly prolonged during the final syllables of both stimuli for each of the apraxic speakers. Equivocal results were reported for relative timing and temporal stability. Conclusions: The results provide some preliminary evidence of reduced anticipatory lingual movement in AOS, and have demonstrated that this can have a significant impact on absolute speech timing. However, measures of relative timing were suggestive of either unimpaired or more extensive coarticulation. Additional research is required to resolve this issue.
Article
In two artificial language learning experiments, we investigated the impact of attention load on segmenting speech through two sublexical cues: transitional probabilities (TPs) and coarticulation. In Experiment 1, we observed that coarticulation processing was resilient to high attention load, whereas TP computation was penalized in a graded manner. In Experiment 2, we showed that encouraging participants to actively search for "word" candidates enhanced overall performance but was not sufficient to preclude the impairment of statistically driven segmentation by attention load. As long as attentional resources were depleted, independently of their intention to find these "words," participants segmented only TP words with the highest TPs, not TP words with lower TPs. Attention load thus has a graded and differential impact on the relative weighting of the cues in speech segmentation, even when only sublexical cues are available in the signal.
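The TP cue itself is easy to make concrete: estimate forward transitional probabilities over the syllable stream and posit word boundaries at local TP minima. A sketch under those assumptions (the experiments used artificial languages with fixed TP values, not this estimator):

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current), estimated from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {p: c / first_counts[p[0]] for p, c in pair_counts.items()}

def segment_at_tp_dips(syllables, tps):
    """Posit a word boundary at every local minimum of the TP sequence,
    the standard statistical-learning segmentation heuristic."""
    seq = [tps[pair] for pair in zip(syllables, syllables[1:])]
    words, start = [], 0
    for i in range(1, len(seq) - 1):
        if seq[i] < seq[i - 1] and seq[i] < seq[i + 1]:
            words.append(syllables[start:i + 1])
            start = i + 1
    words.append(syllables[start:])
    return words

# Toy language: two trisyllabic "words" concatenated at random, so TPs
# are 1.0 within words and about 0.5 across word boundaries.
lexicon = [["tu", "pi", "ro"], ["go", "la", "bu"]]
rng = random.Random(0)
stream = [s for _ in range(40) for s in rng.choice(lexicon)]
tps = transitional_probabilities(stream)
print(segment_at_tp_dips(stream[:12], tps))
```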
Article
This paper shows that in the 17th century various attempts were made to build fully automatic speaking devices resembling those exhibited in late 18th-century France and Germany. Through the analysis of writings by well-known 17th-century scientists, and a document hitherto unknown in the history of phonetics and speech synthesis, an excerpt from la Science universelle (1667[1641]) by the French writer Charles Sorel (1599-1674), it is argued that engineers and scientists of the Baroque period must be credited with the first model of multilingual text-to-speech synthesis engines using unlimited vocabulary.