Functional Load modulates speech production, but not speech perception: Evidence from Thai vowel length



The functional load (FL) of a phonological contrast has been shown to correlate with its resistance to merger on evolutionary timescales. The effects of FL on day-to-day speech, however, remain uncharted territory. In this paper, we studied the effects of FL on the production and perception of vowel length contrasts in Bangkok Thai. We found that, in production, FL had a positive correlation with long/short vowel duration ratios, as well as with the discriminability between the short and long vowel distributions in duration space. For perception, we found no correlations between FL and any of the perceptual measurements. We hypothesize that the different units and mechanisms involved in production and perception, as well as their relationship to phonological contrast, are responsible for the presence or absence of FL effects. We also discuss the implications of our findings for theories of sound change that privilege perception over production. Finally, we show how the real-time effects of FL on the production of vowel length contrasts may be accommodated in a non-linear dynamical model of phonological contrast.
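The two production measures named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how discriminability was computed, so the pooled-standard-deviation separation below (a d'-style index) is an assumption, as are the function names and the duration values, which are invented for illustration.

```python
from statistics import mean, stdev


def duration_ratio(long_ms, short_ms):
    """Ratio of mean long-vowel duration to mean short-vowel duration."""
    return mean(long_ms) / mean(short_ms)


def separation(long_ms, short_ms):
    """Distance between the two duration means in pooled-SD units.

    This d'-style index is an assumed stand-in for the paper's
    discriminability measure, which the abstract does not define.
    """
    pooled_sd = ((stdev(long_ms) ** 2 + stdev(short_ms) ** 2) / 2) ** 0.5
    return (mean(long_ms) - mean(short_ms)) / pooled_sd


# Hypothetical token durations in milliseconds (not data from the study).
short_tokens = [90.0, 100.0, 110.0]
long_tokens = [190.0, 200.0, 210.0]

ratio = duration_ratio(long_tokens, short_tokens)
sep = separation(long_tokens, short_tokens)
```

Under this reading, a contrast with higher FL would be expected to show both a larger ratio and a larger separation in production.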

The process of spoken word-recognition breaks down into three basic functions, of access, selection and integration. Access concerns the mapping of the speech input onto the representations of lexical form, selection concerns the discrimination of the best-fitting match to this input, and integration covers the mapping of syntactic and semantic information at the lexical level onto higher levels of processing. This paper describes two versions of a “cohort”-based model of these processes, showing how it evolves from a partially interactive model, where access is strictly autonomous but selection is subject to top-down control, to a fully bottom-up model, where context plays no role in the processes of form-based access and selection. Context operates instead at the interface between higher-level representations and information generated on-line about the syntactic and semantic properties of members of the cohort. The new model retains intact the fundamental characteristics of a cohort-based word-recognition process. It embodies the concepts of multiple access and multiple assessment, allowing a maximally efficient recognition process, based on the principle of the contingency of perceptual choice.
The goal of this paper is to show how dynamical theories of phonetics and phonology bridge the dualistic gap between discrete phonological descriptions and continuous phonetic descriptions. By delving into the first principles of dynamics, it is shown that dynamical theories do not assume separate sets of principles to describe discrete and continuous aspects of a system. Rather, the discrete description is shown to predict the continuous one, using the concept of a differential equation, which is thoroughly explained. Linear and nonlinear differential equations are introduced using a discrete approximation, and then used to show how phonological contrast has been accounted for using dynamical systems analysis. A dynamical recurrent neural network model of word formation is then discussed to show how linguistic plans for words are serialized and coordinated into motoric word plans for different articulatory systems in the vocal tract. Furthermore, it is shown that many aspects of the discrete, time-invariant phonological description can be predicted from observed variable continuous phonetic functions, using the principle of least squares and recurrent neural networks.
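The idea of approximating a nonlinear differential equation in discrete steps, and of using its attractors to represent a two-way contrast, can be sketched as follows. The equation dx/dt = x - x^3 is a textbook double-well system chosen for illustration; it is an assumption and not necessarily the system analysed in the cited work. The step size and step count are likewise arbitrary.

```python
def simulate(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x**3 with forward Euler steps.

    The system has stable fixed points at x = -1 and x = +1 and an
    unstable one at x = 0, so a continuous state settles into one of
    two discrete categories depending on its starting value.
    """
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)  # discrete approximation of the derivative
    return x


# Two nearby starting states on either side of the unstable point
# end up in different attractors.
settled_high = simulate(0.2)
settled_low = simulate(-0.2)
```

The continuous variable x evolves smoothly, yet its long-run value is effectively binary, which is the sense in which a single dynamical description covers both the discrete and the continuous level.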
The idea that functional load offers a tool of potentially great explanatory power in diachronic linguistics is shared by a number of contemporary linguists, particularly those influenced at first or second hand by Prague. It is the purpose of the present paper to investigate the hypothesis that functional load plays a significant role in sound change. I will attempt to demonstrate that functional load, if it is a factor in sound change at all, is one of the least important of those we know anything about, and that it is best disregarded in discussions centering on the cause and direction of phonological change.
For nearly a century, linguists have suggested that diachronic merger is less likely between phonemes with a high functional load - that is, phonemes that distinguish many words in the language in question. However, limitations in data and computational power have made assessing this hypothesis difficult. Here we present the first larger-scale study of the functional load hypothesis, using data from sound changes in a diverse set of languages. Our results support the functional load hypothesis: phoneme pairs undergoing merger distinguish significantly fewer minimal pairs in the lexicon than unmerged phoneme pairs. Furthermore, we show that higher phoneme probability is positively correlated with merger, but that this effect is stronger for phonemes that distinguish no minimal pairs. Finally, within our dataset we find that minimal pair count and phoneme probability better predict merger than change in system entropy at the lexical or phoneme level.
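Counting the minimal pairs that a phoneme pair distinguishes can be sketched in a few lines. The representation of words as tuples of segment symbols, the function name, and the toy lexicon are all assumptions made for illustration; the toy forms are invented and are not data from the cited study.

```python
def minimal_pair_count(lexicon, x, y):
    """Count word pairs that differ only by x versus y in one position."""
    words = set(lexicon)
    count = 0
    for word in words:
        for i, seg in enumerate(word):
            if seg == x:
                # Swap this one occurrence of x for y and look it up.
                swapped = word[:i] + (y,) + word[i + 1:]
                if swapped in words:
                    count += 1
    return count


# Invented toy lexicon; "a" and "aa" stand for a short and a long vowel.
toy_lexicon = [
    ("k", "a", "n"), ("k", "aa", "n"),
    ("t", "a", "m"), ("t", "aa", "m"),
    ("p", "a", "t"), ("m", "i", "t"),
]

pairs_a = minimal_pair_count(toy_lexicon, "a", "aa")
pairs_i = minimal_pair_count(toy_lexicon, "i", "ii")
```

Each pair is counted once, from the word containing x, so the result does not double when the lexicon lists both members.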
We have argued that dynamically defined articulatory gestures are the appropriate units to serve as the atoms of phonological representation. Gestures are a natural unit, not only because they involve task-oriented movements of the articulators, but because they arguably emerge as prelinguistic discrete units of action in infants. The use of gestures, rather than constellations of gestures as in Root nodes, as basic units of description makes it possible to characterise a variety of language patterns in which gestural organisation varies. Such patterns range from the misorderings of disordered speech through phonological rules involving gestural overlap and deletion to historical changes in which the overlap of gestures provides a crucial explanatory element. Gestures can participate in language patterns involving overlap because they are spatiotemporal in nature and therefore have internal duration. In addition, gestures differ from current theories of feature geometry by including the constriction degree as an inherent part of the gesture. Since the gestural constrictions occur in the vocal tract, which can be characterised in terms of tube geometry, all the levels of the vocal tract will be constricted, leading to a constriction degree hierarchy. The values of the constriction degree at each higher level node in the hierarchy can be predicted on the basis of the percolation principles and tube geometry. In this way, the use of gestures as atoms can be reconciled with the use of constriction degree at various levels in the vocal tract (or feature geometry) hierarchy. The phonological notation developed for the gestural approach might usefully be incorporated, in whole or in part, into other phonologies. Five components of the notation were discussed, all derived from the basic premise that gestures are the primitive phonological unit, organised into gestural scores.
These components include (1) constriction degree as a subordinate of the articulator node and (2) stiffness (duration) as a subordinate of the articulator node. That is, both CD and duration are inherent to the gesture. The gestures are arranged in gestural scores using (3) articulatory tiers, with (4) the relevant geometry (articulatory, tube or feature) indicated to the left of the score and (5) structural information above the score, if desired. Association lines can also be used to indicate how the gestures are combined into phonological units. Thus, gestures can serve both as characterisations of articulatory movement data and as the atoms of phonological representation.
We describe a model called the TRACE model of speech perception. The model is based on the principles of interactive activation. Information processing takes place through the excitatory and inhibitory interactions of a large number of simple processing units, each working continuously to update its own activation on the basis of the activations of other units to which it is connected. The model is called the TRACE model because the network of units forms a dynamic processing structure called “the Trace,” which serves at once as the perceptual processing mechanism and as the system's working memory. The model is instantiated in two simulation programs. TRACE I, described in detail elsewhere, deals with short segments of real speech, and suggests a mechanism for coping with the fact that the cues to the identity of phonemes vary as a function of context. TRACE II, the focus of this article, simulates a large number of empirical findings on the perception of phonemes and words and on the interactions of phoneme and word perception. At the phoneme level, TRACE II simulates the influence of lexical information on the identification of phonemes and accounts for the fact that lexical effects are found under certain conditions but not others. The model also shows how knowledge of phonological constraints can be embodied in particular lexical items but can still be used to influence processing of novel, nonword utterances. The model also exhibits categorical perception and the ability to trade cues off against each other in phoneme identification. At the word level, the model captures the major positive feature of Marslen-Wilson's COHORT model of speech perception, in that it shows immediate sensitivity to information favoring one word or set of words over others. At the same time, it overcomes a difficulty with the COHORT model: it can recover from underspecification or mispronunciation of a word's beginning. 
TRACE II also uses lexical information to segment a stream of speech into a sequence of words and to find word beginnings and endings, and it simulates a number of recent findings related to these points. The TRACE model has some limitations, but we believe it is a step toward a psychologically and computationally adequate model of the process of speech perception.
A fundamental problem in spoken language is the duality between the continuous aspects of phonetic performance and the discrete aspects of phonological competence. We study two instances of this problem, drawn from the phenomena of voicing neutralization and vowel harmony. In each case, we present a model where the experimentally observed continuous distinctions are linked to the discreteness of phonological form using the mathematics of nonlinear dynamics.
The pattern of durations of individual phonetic segments and pauses conveys information about the linguistic content of an utterance. Acoustic measures of segmental timing have been used by many investigators to determine the variables that influence the durational structure of a sentence. The literature on segmental duration is reviewed and related to perceptual data on the discrimination of duration and to psychophysical data on the ability of listeners to make linguistic decisions on the basis of durational cues alone. We conclude that, in English, duration often serves as a primary perceptual cue in the distinctions between (1) inherently long versus short vowels, (2) voiced versus voiceless fricatives, (3) phrase-final versus non-final syllables, (4) voiced versus voiceless postvocalic consonants, as indicated by changes to the duration of the preceding vowel in phrase-final positions, (5) stressed versus unstressed or reduced vowels, and (6) the presence or absence of emphasis. Subject Classification: [43]70.40, [43]70.70, [43]70.20.
Frequency counts are a measure of how much use a language makes of a linguistic unit, such as a phoneme or word. However, what is often important is not the units themselves, but the contrasts between them. A measure is therefore needed for how much use a language makes of a contrast, i.e. the functional load (FL) of the contrast. We generalize previous work in linguistics and speech recognition and propose a family of measures for the FL of several phonological contrasts, including phonemic oppositions, distinctive features, suprasegmentals, and phonological rules. We then test it for robustness to changes of corpora. Finally, we provide examples in Cantonese, Dutch, English, German and Mandarin, in the context of historical linguistics, language acquisition and speech recognition. More information can be found at
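An entropy-based reading of the FL of a contrast can be sketched as the relative loss of lexical entropy when the two members of the contrast are merged. This is a simplified word-level sketch in the spirit of the measures described above; the exact definitions in the cited work may differ. The function names, the tuple representation of words, and the toy counts are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def functional_load(word_counts, x, y):
    """Relative entropy loss when segment y is merged into segment x."""
    merged = Counter()
    for word, c in word_counts.items():
        merged[tuple(x if seg == y else seg for seg in word)] += c
    h = entropy(word_counts)
    if h == 0:
        return 0.0  # a single-word lexicon carries no information to lose
    return (h - entropy(merged)) / h


# Invented toy frequencies; "a" and "aa" stand for a short and a long vowel.
toy_counts = Counter({
    ("k", "a", "n"): 1, ("k", "aa", "n"): 1,
    ("t", "a", "m"): 1, ("p", "i", "t"): 1,
})

fl_length = functional_load(toy_counts, "a", "aa")
fl_unused = functional_load(toy_counts, "i", "ii")
```

In the toy table, merging the length contrast collapses one word pair and removes a quarter of the lexical entropy, while a contrast that distinguishes no words has an FL of zero.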