Project

Evolution of Language

Updates: 0
Recommendations: 0
Followers: 94
Reads: 258

Project log

William Garrett Mitchener
added 2 research items
Universal grammar (UG) is a list of innate constraints that specify the set of grammars that can be learned by the child during primary language acquisition. UG of the human brain has been shaped by evolution. Evolution requires variation. Hence, we have to postulate and study variation of UG. We investigate evolutionary dynamics and language acquisition in the context of multiple UGs. We provide examples for competitive exclusion and stable coexistence of different UGs. More specific UGs admit fewer candidate grammars, and less specific UGs admit more candidate grammars. We will analyze conditions for more specific UGs to outcompete less specific UGs and vice versa. An interesting finding is that less specific UGs can resist invasion by more specific UGs if learning is more accurate. In other words, accurate learning stabilizes UGs that admit large numbers of candidate grammars.
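The abstract's closing claim can be illustrated with a toy calculation (the functional form and parameter values here are assumptions for illustration, not the paper's actual model): let q be the probability that a child acquires the community's target grammar, with the remaining probability spread uniformly over the UG's other candidate grammars; communicative coherence is then the probability that two learners end up with the same grammar.

```python
def coherence(q, n):
    """Probability that two learners under the same UG acquire the same grammar.

    q: probability of acquiring the community's target grammar (assumed parameter)
    n: number of candidate grammars the UG admits
    """
    if n == 1:
        return 1.0
    other = (1.0 - q) / (n - 1)  # mass on each non-target grammar
    return q * q + (n - 1) * other * other

# Sloppy learning: a specific UG (2 candidates) coheres far better
# than a permissive one (10 candidates).
low_q = (coherence(0.6, 2), coherence(0.6, 10))

# Accurate learning: the gap nearly vanishes.
high_q = (coherence(0.99, 2), coherence(0.99, 10))
```

With q = 0.6 the narrow UG scores about 0.52 against roughly 0.38 for the broad one, while at q = 0.99 both are near 0.98, matching the intuition that accurate learning stabilizes UGs admitting many candidate grammars.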
William Garrett Mitchener
added 3 research items
Although discrete formalisms have been successful at describing the sets of grammatical sentences in human languages, new tools are needed to model language variation. An individual's speech pattern can be modeled more realistically by a stochastic grammar consisting of a set of idealized grammars together with a set of usage rates. A population can then be represented as a probability measure over a space of usage rates and physical or social locations. In this article, I investigate a measure-valued differential equation for a spatially distributed population in which individuals use stochastic grammars. Under appropriate hypotheses, and assuming that children learn based on an average feature of the nearby population's speech, the asymptotic behavior of the measure dynamics is controlled by the feature's dynamics, which can significantly reduce the dimension of the model. I discuss the example of a single usage rate for choosing one of two grammatical options. If space is unstructured, then all populations tend to a stable equilibrium dominated by one option or the other. If space consists of two well-mixed compartments, then each compartment may choose a different dominant idealized grammar, but increased migration causes a bifurcation in which one idealized grammar goes extinct. If space is continuous, numerical experiments show that the measure and feature dynamics can exhibit traveling waves.
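The unstructured-space case can be sketched numerically. Assuming an S-shaped learning response f to the mean usage rate (a logistic curve is used below as a stand-in; the paper's actual response function may differ), the mean rate follows dx/dt = f(x) − x, which has stable equilibria dominated by one option or the other:

```python
import math

def f(x, k=6.0):
    """S-shaped learning response to the mean usage rate x
    (logistic with assumed steepness k; an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-k * (x - 0.5)))

def equilibrate(x0, dt=0.01, steps=20000):
    """Euler-integrate dx/dt = f(x) - x from the initial mean usage rate x0."""
    x = x0
    for _ in range(steps):
        x += dt * (f(x) - x)
    return x
```

Populations starting below the unstable midpoint settle near 0 and those starting above it settle near 1, so one grammatical option ends up dominant either way.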
We investigate a model of language evolution, based on population game dynamics with learning. First, we examine the case of two genetic variants of universal grammar (UG), the heart of the human language faculty, assuming each admits two possible grammars. The dynamics are driven by a communication game. We prove using dynamical systems techniques that if the payoff matrix obeys certain constraints, then the two UGs are stable against invasion by each other, that is, they are evolutionarily stable. Then, we prove a similar theorem for an arbitrary number of disjoint UGs. In both theorems, the constraints are independent of the learning process. Intuitively, if a mutation in UG results in grammars that are incompatible with the established languages, then the mutation will die out because mutants will be unable to communicate and therefore unable to realize any potential benefit of the mutation. An example for which these theorems do not apply shows that compatible mutations may or may not be able to invade, depending on the population's history and the learning process. These results suggest that the genetic history of language is constrained by the need for compatibility and that mutations in the language faculty may have died out or taken over due more to historical accident than to any straightforward notion of relative fitness.
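The incompatibility argument can be made concrete with a small numeric sketch (the payoff matrix and grammar mixes below are invented for illustration, not taken from the paper): if the mutant UG's grammars earn zero communicative payoff against the resident's established grammars, a rare mutant, speaking almost entirely to residents, realizes no payoff at all.

```python
import numpy as np

# Hypothetical compatibility payoffs among four grammars:
# grammars 0 and 1 belong to the resident UG, 2 and 3 to a mutant UG.
B = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])

resident = np.array([0.7, 0.3, 0.0, 0.0])    # established grammar mix
mutant_mix = np.array([0.0, 0.0, 0.5, 0.5])  # grammars mutant children acquire

resident_payoff = resident @ B @ resident    # residents talking to residents
mutant_payoff = mutant_mix @ B @ resident    # rare mutants talk mostly to residents
```

Here the resident payoff is 0.79 while the mutant payoff is exactly 0, so the mutation dies out regardless of any benefit its own grammars might confer among themselves.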
William Garrett Mitchener
added 2 research items
We consider the task of learning three verb classes: raising (e.g., seem), control (e.g., try) and ambiguous verbs that can be used either way (e.g., begin). These verbs occur in sentences with similar surface forms, but have distinct syntactic and semantic properties. They present a conundrum because it would seem that their meaning must be known to infer their syntax, and that their syntax must be known to infer their meaning. Previous research with human speakers pointed to the usefulness of two cues found in sentences containing these verbs: animacy of the sentence subject and eventivity of the predicate embedded under the main verb. We apply a variety of algorithms to this classification problem to determine whether the primary linguistic data is sufficiently rich in this kind of information to enable children to resolve the conundrum, and whether this information can be extracted in a way that reflects distinctive features of child language acquisition. The input consists of counts of how often various verbs occur with animate subjects and eventive predicates in two corpora of naturalistic speech, one adult-directed and the other child-directed. Proportions of the semantic frames are insufficient. A Bayesian attachment model designed for a related language learning task does not work well at all. A hierarchical Bayesian model (HBM) gives significantly better results. We also develop and test a saturating accumulator that can successfully distinguish the three classes of verbs. Since the HBM and saturating accumulator are successful at the classification task using biologically realistic calculations, we conclude that there is sufficient information given subject animacy and predicate eventivity to bootstrap the process of learning the syntax and semantics of these verbs. Keywords: Bayesian inference, child language acquisition, clustering, control, raising, syntax, unsupervised learning
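To make the input representation concrete, here is a minimal sketch of the kind of data involved and a crude threshold rule on the animacy proportion (the counts are invented, and this simple rule is only an illustration; it is neither the paper's HBM nor its saturating accumulator):

```python
# Hypothetical corpus counts: how often each verb appears with an
# animate subject, out of its total occurrences.
counts = {
    "seem":  {"animate": 10, "total": 100},  # raising verb
    "try":   {"animate": 95, "total": 100},  # control verb
    "begin": {"animate": 50, "total": 100},  # ambiguous verb
}

def classify(c, lo=0.25, hi=0.75):
    """Crude threshold rule on the animate-subject proportion.

    Raising verbs tolerate inanimate subjects (low proportion),
    control verbs require animate subjects (high proportion),
    and ambiguous verbs fall in between. Thresholds are assumed.
    """
    p = c["animate"] / c["total"]
    if p >= hi:
        return "control"
    if p <= lo:
        return "raising"
    return "ambiguous"
```

On realistic corpus data such raw proportions are, per the abstract, insufficient on their own, which is what motivates the hierarchical Bayesian model and the saturating accumulator.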
In a typical human population, some features of the language are bound to be in flux. Variation in each individual's usage rates of optional features reflects language change in progress. Sociolinguistic surveys have determined that some individuals use new features to a greater degree than the population average, that is, they seem to be leading the change. This article describes a mathematical model of the spread of language change inspired by a model from population genetics. It incorporates the premise that some individuals are linguistic leaders and exert more influence on the speech of learning children than others. Using historical data from the rise of do-support in English, a maximum likelihood calculation yields an estimate for the influence ratio used in the model. The influence ratio so inferred indicates that 19 of the 200 simulated individuals account for 95% of the total influence, confirming that language change may be driven by a relatively small group of leaders. The model can be improved in any number of ways, but additional features must be selected carefully so as not to produce a computationally intractable inference problem. This project demonstrates how data and techniques from different subfields of linguistics can be combined within a mathematical model to reveal otherwise inaccessible information about language variation and change.
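The quoted figure is easy to sanity-check arithmetically. If each leader carries `ratio` times the influence weight of a non-leader, the leaders' share of total influence follows directly (the ratio of 181 below is back-solved to reproduce the quoted 19-of-200/95% figure; it is not the paper's estimated parameter):

```python
def influence_share(n_leaders, n_total, ratio):
    """Fraction of total influence held by leaders when each leader
    is `ratio` times as influential as a non-leader."""
    followers = n_total - n_leaders
    return n_leaders * ratio / (n_leaders * ratio + followers)

share = influence_share(19, 200, 181.0)  # 19 leaders among 200 individuals
```

With each leader 181 times as influential as a follower, 19 leaders hold 19 × 181 / (19 × 181 + 181) = 3439 / 3620 = 95% of the total influence.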
William Garrett Mitchener
added a research item
I discuss a stochastic model of language learning and change. During a syntactic change, each speaker makes use of constructions from two different idealized grammars at variable rates. The model incorporates regularization in that speakers have a slight preference for using the dominant idealized grammar. It also includes incrementation: The population is divided into two interacting generations. Children can detect correlations between age and speech. They then predict where the population’s language is moving and speak according to that prediction, which represents a social force encouraging children not to sound outdated. Both regularization and incrementation turn out to be necessary for spontaneous language change to occur on a reasonable time scale and run to completion monotonically. Chance correlation between age and speech may be amplified by these social forces, eventually leading to a syntactic change through prediction-driven instability.
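The interplay of incrementation and regularization can be sketched with a deterministic two-generation caricature (the linear extrapolation and the regularization strength below are assumptions for illustration, not the paper's stochastic model): children extrapolate the trend between the two living generations, with a slight extra push toward whichever idealized grammar is currently dominant.

```python
def step(x_old, x_young, reg=0.02):
    """One generation turnover. x_old and x_young are the usage rates of
    the new grammar in the two living generations; reg is an assumed
    regularization strength."""
    predicted = x_young + (x_young - x_old)  # incrementation: extrapolate trend
    bias = reg if x_young > 0.5 else -reg    # regularization: favor the majority
    x_new = min(1.0, max(0.0, predicted + bias))
    return x_young, x_new

xo, xy = 0.50, 0.52  # a chance fluctuation starts the change
for _ in range(40):
    xo, xy = step(xo, xy)
```

Once a chance fluctuation nudges the young generation past the midpoint, prediction and regularization reinforce each other and the change runs monotonically to completion within a handful of generations, which is the prediction-driven instability in miniature.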