Article

Learning the unlearnable: The role of missing evidence

Authors:
Terry Regier and Susanne Gahl

Abstract

Syntactic knowledge is widely held to be partially innate, rather than learned. In a classic example, it is sometimes argued that children know the proper use of anaphoric one, although that knowledge could not have been learned from experience. Lidz et al. [Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: Experimental evidence for syntactic structure at 18 months. Cognition, 89, B65-B73.] pursue this argument, and present corpus and experimental evidence that appears to support it; they conclude that specific aspects of this knowledge must be innate. We demonstrate, contra Lidz et al., that this knowledge may in fact be acquired from the input, through a simple Bayesian learning procedure. The learning procedure succeeds because it is sensitive to the absence of particular input patterns--an aspect of learning that is apparently overlooked by Lidz et al. More generally, we suggest that a prominent form of the "argument from poverty of the stimulus" suffers from the same oversight, and is as a result logically unsound.
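To make the learning procedure described in the abstract concrete, here is a minimal Python sketch of Bayesian updating under the "size principle". The hypotheses, scene, and numbers are invented for illustration; this is not Regier and Gahl's actual model, only the shape of the argument.

```python
# Minimal illustrative sketch (not Regier & Gahl's actual code) of Bayesian
# learning from missing evidence via the "size principle": a hypothesis with
# a larger extension spreads its probability over more possible observations,
# so consistently failing to see the extra possibilities counts against it.
#
# Toy setup: does anaphoric "one" pick out "ball" (any ball) or "red ball"?
# Assume a scene with 4 balls, 2 of them red, and uniform sampling of the
# referent from whatever the hypothesis allows.

extension_size = {"any ball": 4, "red ball": 2}

def update(prior, n_observations):
    """Posterior after n referents of 'one', every one of them a red ball
    (so each observation is compatible with BOTH hypotheses)."""
    post = dict(prior)
    for _ in range(n_observations):
        for h in post:
            post[h] *= 1.0 / extension_size[h]   # size-principle likelihood
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}
    return post

print(update({"any ball": 0.5, "red ball": 0.5}, 5))
# {'any ball': ~0.03, 'red ball': ~0.97}: the narrower hypothesis wins
# purely because non-red referents never occurred, i.e. missing evidence.
```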


... Recent papers on the acquisition of highly infrequent structures have reported conflicting results. Regier and Gahl (2004) found that constructions that are largely absent from the input can be learned from minimal evidence. In particular, a Bayesian model was able to correctly predict that anaphoric one referred to a particular NP (red ball) after very few exposures to the target form. ...
... anaphoric 'one') nor tested the model over more exemplars, both of which limit generalizability. In contrast with Regier and Gahl (2004), an effect for frequency was found in Wolter and Gyllstad's (2013) study of adult L2 learners. Learners were asked to identify collocations in the target language, and the authors found that highly infrequent collocations were less reliably identified. ...
... Wolter and Gyllstad, 2013) or does not (e.g. Regier and Gahl, 2004) interfere with acquisition is a pattern repeated often enough for Gass and Mackey (2002) to observe that the role of frequency in L2 acquisition is 'complex', though they agree with many cognitive approaches (e.g. Ellis, 2002) that it does play a large role in acquisition. ...
Article
This article provides a Poverty of Stimulus argument for the participation of a dedicated linguistic module in second language acquisition. We study the second language (L2) acquisition of a subset of English infinitive complements that exhibit the following properties: (a) they present an intricate web of grammatical constraints while (b) they are highly infrequent in corpora, (c) they lack visible features that would make them salient, and (d) they are communicatively superfluous. We report on an experiment testing the knowledge of some infinitival constructions by near-native adult first language (L1) Spanish / L2 English speakers. Learners demonstrated a linguistic system that includes contrasts based on subtle restrictions in the L2, including aspect restrictions in Raising to Object. These results provide evidence that frequency and other cognitive or environmental factors are insufficient to account for the acquisition of the full spectrum of English infinitivals. This leads us to the conclusion that a domain-specific linguistic faculty is required.
... ''Dalmatian") decreases the likelihood that children consider that form extendable to similar meanings (in this case, other dog breeds). As we discussed already, research on the acquisition of syntax has likewise proposed that repeatedly encountering a verb with the same argument structure leads the learner to infer that it cannot be used in any other way (the entrenchment hypothesis, Braine & Brooks, 1995; see also Regier & Gahl, 2004;Stefanowitsch, 2008). This appears to be the opposite of what would be expected from a tendency to extend frequent forms to new uses. ...
... In Experiment II, the comprehension test precedes production, affording an opportunity to observe transfer in the opposite direction, from comprehension to production. As discussed in the introduction, comprehension-to-production transfer of form-meaning mappings is thought to be crucial to eliminating accessibility-driven overextension in child language (Braine & Brooks, 1995;Regier & Gahl, 2004). If this kind of comprehension-to-production transfer occurs, we expect that extension of frequent forms in production observed in Experiment I may be replaced by entrenchment in Experiment II, where production follows comprehension. ...
... Another purpose of reordering the test tasks was to test the hypothesis that entrenchment in comprehension results in the development of strong form-meaning mappings that can then guide production (Braine & Brooks, 1995;Regier & Gahl, 2004). This hypothesis was confirmed by the finding that the entrenchment effect was observed in both production and form choice tasks when they followed the comprehension task. ...
... In section five I show why I think this PSA fails. Here I discuss Regier and Gahl's (2004) response, arguing that although their model of how the syntax of anaphoric 'one' can be learnt is strictly false, their objection that LWF don't consider all the evidence children might draw on still stands. Then I present three further problems with the study, concerning (1) the nature of the syntactic rule LWF claim to be innate, (2) a dubitable claim about the relevance of grammatical errors in the input, and (3) an unjustified assumption about the nature of the PLD. ...
... Critical responses to LWF dispute both the inaccessibility and indispensability of the evidence, and the infants' acquisition of the acquirendum (Regier & Gahl 2004, Tomasello 2004). I cannot present and assess all these critiques, so I will canvass just one, that of Regier and Gahl (2004), before discussing three criticisms of my own. Regier and Gahl (2004, henceforth R&G) dispute the indispensability of the evidence LWF cite for the acquisition of anaphoric 'one'. ...
... (i) Knowing syntactic rules are structure-dependent (Chomsky 1980a, Anderson & Lightfoot 2000, Fodor & Crowther 2002, Berwick et al. 2011, Anderson 2013); (ii) Knowing certain dependencies are limited to spanning no more than a single specific, abstract linguistic structure (Chomsky 1973, Huang 1982, Lasnik & Saito 1984); (iii) Knowing certain syntactic category assignments are illicit for certain words in a language (Baker 1978). However, recent investigations have suggested that learning strategies involving less specific knowledge may be sufficient to learn the target syntactic generalizations in several cases (e.g. Regier & Gahl 2004, Foraker et al. 2009, Pearl & Lidz 2009, Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Pearl & Sprouse 2013a). Interestingly, a common successful approach in some of the most recent work (Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Pearl & Sprouse 2013a) involves expanding the set of informative data to include indirect positive evidence (discussed in more detail below in §2). ...
... Here, we apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that form their generalizations based on realistic input data (Sakas & Fodor 2001, Sakas & Nishimoto 2002, Yang 2002, Sakas 2003, Regier & Gahl 2004, Yang 2004, Legate & Yang 2007, Foraker et al. 2009, Pearl & Lidz 2009, Pearl 2011, Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Sakas & Fodor 2012, Yang 2012, Legate & Yang 2013, Pearl & Sprouse 2013a). We demonstrate that a learner that assumes one's antecedent is the same syntactic category as one and is biased to include indirect positive evidence coming from other pronouns in English can generate the looking-preference behavior observed in eighteen-month-olds (Lidz et al. 2003). ...
... Indirect negative evidence would correspond to the learner noticing that items like 1b are absent from the input, and so inferring that these items are absent because they are ungrammatical. Indirect negative evidence has been argued to be available, particularly to statistical learners that form expectations about how frequently items should appear in the input (e.g. via some form of entrenchment: Rohde & Plaut 1999, Regier & Gahl 2004, Clark & Lappin 2009, Foraker et al. 2009, Perfors et al. 2010, Perfors, Tenenbaum, & Regier 2011, Ambridge et al. 2013, Ramscar et al. 2013) and learners that use statistical preemption to recognize when an alternative semantically and pragmatically related item is used instead of the item in question (e.g. Boyd & Goldberg 2011, Goldberg 2011, Ambridge et al. 2013). ...
Article
Full-text available
Language learners are often faced with a scenario where the data allow multiple generalizations, even though only one is actually correct. One promising solution to this problem is that children are equipped with helpful learning strategies that guide the types of generalizations made from the data. Two successful approaches in recent work for identifying these strategies have involved (i) expanding the set of informative data to include indirect positive evidence, and (ii) using observable behavior as a target state for learning. We apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that assume one’s antecedent is the same syntactic category as one and form their generalizations based on realistic data. We demonstrate that a learner that is biased to include indirect positive evidence coming from other pronouns in English can generate eighteen-month-old looking-preference behavior. Interestingly, we find that the knowledge state responsible for this target behavior is a context-dependent representation for anaphoric one, rather than the adult representation, but this immature representation can suffice in many communicative contexts involving anaphoric one. More generally, these results suggest that children may be leveraging broader sets of data to make the syntactic generalizations leading to their observed behavior, rather than selectively restricting their input. We additionally discuss the components of the learning strategies capable of producing the observed behavior, including their possible origin and whether they may be useful for making other linguistic generalizations.
... The idea was first discussed in the early 1980s (e.g., Chomsky 1981) and has recently been attracting growing attention within research on probabilistic learning models (e.g., Elman 1993; Lewis and Elman 2001; Seidenberg 1997; Tenenbaum and Griffiths 2001; Rohde and Plaut 1999; Regier and Gahl 2004). Roughly speaking, INE is the absence of input evidence that a certain hypothesis predicts to be possible in the language, and the learning mechanism uses the absence of expected data as evidence against the hypothesis. ...
... Roughly speaking, INE is the absence of input evidence that a certain hypothesis predicts to be possible in the language, and the learning mechanism uses the absence of expected data as evidence against the hypothesis. An important characteristic of recent probabilistic learning models that shapes learning around INE is that they have an ability to discriminate subset-superset hypotheses on the basis of positive evidence alone (e.g., Regier and Gahl 2004; but see Pearl and Lidz 2006). For the acquisition of possible scope interpretations, a probabilistic learner who can reliably detect the absence of a certain scope interpretation in the input data would be able to use the absence as evidence against the hypothesis that generates the scope interpretation. ...
... The possible scope interpretations that Hypothesis 3 generates (wide or narrow scope of ka) form a superset of the interpretations that Hypothesis 2 generates (wide scope of ka). In this case, a probabilistic learner (with, for example, a Bayesian learning algorithm: e.g., Regier and Gahl 2004; Tenenbaum and Griffiths 2001) might be able to learn to dismiss the superset hypothesis. The idea is as follows: if the learner observes that the superset hypothesis generates not only the interpretation that can be seen in the input (i.e., the wide scope of ka) but also an interpretation that is never encountered (i.e., the narrow scope of ka), then the absence of the predicted interpretation can serve as evidence against the hypothesis. ...
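A toy numerical sketch of this subset/superset reasoning follows. The uniform-sampling assumption and the numbers are my construction, not from the cited study; they only illustrate how the absence of narrow-scope data accumulates.

```python
# Toy illustration (assumed uniform sampling of interpretations) of how
# never encountering narrow scope becomes evidence for the subset hypothesis.
import math

# Assumed likelihood of a wide-scope observation under each hypothesis:
p_wide_given_h2 = 1.0   # Hypothesis 2 generates only wide scope
p_wide_given_h3 = 0.5   # Hypothesis 3 generates wide and narrow scope

def log_odds_h2_over_h3(n_observations: int) -> float:
    """Log-odds (nats) favoring Hypothesis 2 after n wide-scope-only data."""
    return n_observations * (math.log(p_wide_given_h2) - math.log(p_wide_given_h3))

for n in (1, 5, 20):
    print(n, round(log_odds_h2_over_h3(n), 2))
# 1 0.69, 5 3.47, 20 13.86: twenty wide-scope observations with no narrow
# scope make the superset hypothesis roughly a million times less likely.
```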
... (i) a sensitivity to the distributional data in the available input (Sakas & Fodor 2001; Yang 2002; Regier & Gahl 2004; Yang 2004; Legate & Yang 2007; Pearl & Weinberg 2007; Foraker et al. 2009; McMurray & Hollich 2009; Pearl & Lidz 2009; Mitchener & Becker 2011; Pearl 2011; Pearl & Mis 2011; Perfors, Tenenbaum & Regier 2011); (ii) a preference for simpler/smaller/narrower hypotheses (Regier & Gahl 2004; Foraker et al. 2009; Pearl & Lidz 2009; Mitchener & Becker 2011; Pearl & Mis 2011; Perfors, Tenenbaum & Regier 2011); (iii) a preference for highly informative data (Fodor 1998b; Pearl & Weinberg 2007; Pearl 2008); (iv) a preference for learning in cases of local uncertainty (Pearl & Lidz 2009); (v) a preference for data with multiple correlated cues (Soderstrom et al. 2009). The size and diversity of this hypothesis space of learning biases suggest that a finer-grained framework may be more informative than the traditional binary framework (UG versus non-UG). ...
... Oh look, another one."). Regier & Gahl (2004) demonstrated that a learner using online Bayesian inference can learn the correct syntactic representation and semantic interpretation of one from child-directed speech, provided that the child expands the range of informative data beyond the traditional data set of unambiguous data. Their model highlights the utility of a bias to use statistical distribution information in the data and a bias to prefer simpler/smaller/narrower hypotheses when encountering ambiguous data. ...
Article
Full-text available
The induction problems facing language learners have played a central role in debates about the types of learning biases that exist in the human brain. Many linguists have argued that some of the learning biases necessary to solve these language induction problems must be both innate and language-specific (i.e., the Universal Grammar (UG) hypothesis). Though there have been several recent high-profile investigations of the necessary learning bias types for different linguistic phenomena, the UG hypothesis is still the dominant assumption for a large segment of linguists due to the lack of studies addressing central phenomena in generative linguistics. To address this, we focus on how to learn constraints on long-distance dependencies, also known as syntactic island constraints. We use formal acceptability judgment data to identify the target state of learning for syntactic island constraints and conduct a corpus analysis of child-directed data to affirm that there does appear to be an induction problem when learning these constraints. We then create a computational learning model that implements a learning strategy capable of successfully learning the pattern of acceptability judgments observed in formal experiments, based on realistic input. Importantly, this model does not explicitly encode syntactic constraints. We discuss learning biases required by this model in detail as they highlight the potential problems posed by syntactic island effects for any theory of syntactic acquisition. We find that, although the proposed learning strategy requires fewer complex and domain-specific components than previous theories of syntactic island learning, it still raises difficult questions about how the specific biases required by syntactic islands arise in the learner. We discuss the consequences of these results for theories of acquisition and theories of syntax.
... The potential induction problem presented by English anaphoric one (1) has received considerable recent attention (e.g., Foraker, Regier, Khetarpal, Perfors, and Tenenbaum (2009); Lidz, Waxman, and Freedman (2003); Pearl and Lidz (2009); Regier and Gahl (2004)). ...
... Interpretation of one: syntactic antecedent of one = "red bottle"; semantic referent of one = RED BOTTLE. The original proposal for learning anaphoric one required children to have innate domain-specific knowledge about the structure of language, as part of the child's Universal Grammar (Baker, 1978). However, more recent studies have suggested alternative solutions involving innate domain-general learning abilities coupled with input restrictions that arise from domain-specific learning constraints (Foraker et al., 2009; Pearl & Lidz, 2009; Regier & Gahl, 2004). ...
... Domain-general learning abilities + domain-specific input restrictions Regier and Gahl (2004) (henceforth R&G) noted that Sem-Syn data like (7) could be leveraged to learn the correct representation for anaphoric one. Specifically, a learner with domain-general statistical learning abilities could track how often a property that was mentioned was important for the referent to have (e.g., when "red" was mentioned, was the referent just a BOTTLE or specifically a RED BOTTLE?). ...
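A rough sketch of this property-tracking idea appears below. The counts and the observe helper are hypothetical and invented for illustration; they are not R&G's implementation, only the statistic it describes.

```python
# Rough sketch (assumed setup, not R&G's actual code) of tracking how often
# a mentioned property matters for reference. Each datum records whether a
# property was mentioned ("a red bottle") and whether the referent actually
# had to have that property (was specifically RED).

from collections import Counter

counts = Counter()

def observe(property_mentioned: bool, referent_has_property: bool) -> None:
    if property_mentioned:
        counts["mentioned"] += 1
        if referent_has_property:
            counts["mentioned_and_held"] += 1

# In child-directed speech, speakers rarely mention a property the referent
# lacks, so the ratio climbs toward 1 (hypothetical data below).
for _ in range(18):
    observe(True, True)   # "look, a red bottle" ... referent is red
observe(True, False)      # a rare exception

# Laplace-smoothed estimate of P(property matters | property mentioned):
p_incl = (counts["mentioned_and_held"] + 1) / (counts["mentioned"] + 2)
print(round(p_incl, 2))   # ~0.90: mentioned properties are usually criterial
```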
Article
Full-text available
A controversial claim in linguistics is that children face an induction problem, which is often used to motivate the need for Universal Grammar. English anaphoric one has been argued to present this kind of induction problem. While the original solution was that children have innate domain-specific knowledge about the structure of language, more recent studies have suggested alternative solutions involving domain-specific input restrictions coupled with domain-general learning abilities. We consider whether indirect evidence coming from a broader input set could obviate the need for such input restrictions. We present an online Bayesian learner that uses this broader input set, and discover it can indeed reproduce the correct learning behavior for anaphoric one, given child-directed speech. We discuss what is required for acquisition success, and how this impacts the larger debate about Universal Grammar.
... In the study of language acquisition, Regier and Gahl (2004) argue that indirect negative evidence might play an important role in human acquisition of grammar, but do not link this idea to distributional semantics. ...
... Our hypothesis for this is that the grammar that generates language implicitly creates negative cooccurrence and so -PMI encodes this syntactic information. Interestingly, this idea creates a bridge between distributional semantics and the argument by Regier and Gahl (2004) that indirect negative evidence might play an important role in human language acquisition of grammar. ...
Preprint
In distributional semantics, the pointwise mutual information (PMI) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences, as PMI goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora, which lead to a large number of such pairs. A common practice is to clip negative PMI (-PMI) at 0, also known as Positive PMI (PPMI). In this paper, we investigate alternative ways of dealing with -PMI and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different PMI matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive PMI (or both), we find that most of the encoded semantics and syntax come from positive PMI, in contrast to -PMI, which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel PMI variants and grounding the popular PPMI measure.
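The standard PMI and PPMI quantities the preprint builds on can be computed directly. The toy count matrix below is invented for illustration; the definitions themselves are the usual ones.

```python
# Illustrative computation of PMI and PPMI on a toy word-by-context matrix.
# PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ); an unobserved pair gives
# log 0 = -inf, which the common PPMI practice clips to 0.

import numpy as np

# Toy cooccurrence counts (hypothetical): 2 words x 2 contexts.
counts = np.array([[4.0, 0.0],
                   [1.0, 3.0]])

total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)   # marginal over contexts
p_c = p_wc.sum(axis=0, keepdims=True)   # marginal over words

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))    # unseen pair -> -inf

ppmi = np.maximum(pmi, 0.0)             # clip negative PMI at zero

print(pmi)    # note the -inf for the unobserved (word 0, context 1) pair
print(ppmi)   # PPMI keeps only the positive associations
```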
... We examine the DirFiltered learner first. Previous studies (Regier & Gahl, 2004; Pearl & Lidz, 2009) found that this filtered learner has a very high probability of learning one is N when it is smaller than NP (p_N ≈ 1) and a very high probability of including a mentioned property in the antecedent (p_incl ≈ 1), even with s values as low as 2. We find this is true when s = 7 or above; however, when s = 5, the learner is much less certain that the mentioned property should be included in the antecedent (p_incl = 0.683); when s = 2, the learner is inclined to believe one is N0 (p_N = 0.340) and is nearly certain that the mentioned property should NOT be included in the antecedent (p_incl = 0.020). Similarly, when s = 7 or above, the learner reliably reproduces the observed infant behavior (p_beh = 0.557–0.585) ...
... In particular, recall that there is a tight connection between syntactic and referential information in the model (Figure 3), as both are used to determine the linguistic antecedent. Specifically, each ALWAYS impacts the selection of the antecedent when a property is mentioned, which was not true in the previous probabilistic learning models used by Regier and Gahl (2004) and Pearl and Lidz (2009). This is reflected in the update equations for the DirRefSynAmb data, where both φ_N and φ_incl involve the current values of p_N and p_incl, as do all the equations corresponding to the probabilities of the different antecedent representations (recall equation (3)). ...
Article
Language learners are often faced with a scenario where the data allow multiple generalizations, even though only one is actually correct. One promising solution to this problem is that children are equipped with helpful learning strategies that guide the types of generalizations made from the data. Two successful approaches in recent work for identifying these strategies have involved (i) expanding the set of informative data to include indirect positive evidence, and (ii) using observable behavior as a target state for learning. We apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that assume one’s antecedent is the same syntactic category as one and form their generalizations based on realistic data. We demonstrate that a learner that is biased to include indirect positive evidence coming from other pronouns in English can generate eighteen-month-old looking-preference behavior. Interestingly, we find that the knowledge state responsible for this target behavior is a context-dependent representation for anaphoric one, rather than the adult representation, but this immature representation can suffice in many communicative contexts involving anaphoric one. More generally, these results suggest that children may be leveraging broader sets of data to make the syntactic generalizations leading to their observed behavior, rather than selectively restricting their input. We additionally discuss the components of the learning strategies capable of producing the observed behavior, including their possible origin and whether they may be useful for making other linguistic generalizations.
... How do humans acquire such exceptions? The entrenchment hypothesis suggests that speakers track and use the distributional properties of their input as indirect negative evidence for the existence of an exception (Braine and Brooks, 1995; Regier and Gahl, 2004; Theakston, 2004). For instance, if an English learner never encounters the verb last in the passive voice, despite having seen last used productively in the active voice, they may conclude that last cannot occur in the passive voice. ...
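A toy illustration of how such distributional evidence could be quantified: the binomial test, the base passive rate, and the counts below are my assumptions, not the authors' method, but they show why 500 actives with zero passives feels different from 5 actives with zero passives.

```python
# Toy entrenchment calculation (hypothetical counts and base rate): if a verb
# is frequent overall but never attested in the passive, the gap looks
# statistically significant rather than accidental.

import math

def passive_surprise(active_count: int, passive_count: int,
                     base_passive_rate: float = 0.1) -> float:
    """Binomial log-likelihood of the observed passive count if the verb
    passivized at the ordinary base rate; very negative values mean strong
    indirect negative evidence that the verb resists the passive."""
    n = active_count + passive_count
    return (math.lgamma(n + 1) - math.lgamma(passive_count + 1)
            - math.lgamma(n - passive_count + 1)
            + passive_count * math.log(base_passive_rate)
            + (n - passive_count) * math.log(1 - base_passive_rate))

print(passive_surprise(active_count=500, passive_count=0))  # ~ -52.7: "last"-like
print(passive_surprise(active_count=5, passive_count=0))    # ~ -0.5: could be chance
```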
Preprint
Artificial neural networks can generalize productively to novel contexts. Can they also learn exceptions to those productive rules? We explore this question using the case of restrictions on English passivization (e.g., the fact that "The vacation lasted five days" is grammatical, but "*Five days was lasted by the vacation" is not). We collect human acceptability judgments for passive sentences with a range of verbs, and show that the probability distribution defined by GPT-2, a language model, matches the human judgments with high correlation. We also show that the relative acceptability of a verb in the active vs. passive voice is positively correlated with the relative frequency of its occurrence in those voices. These results provide preliminary support for the entrenchment hypothesis, according to which learners track and use the distributional properties of their input to learn negative exceptions to rules. At the same time, this hypothesis fails to explain the magnitude of unpassivizability demonstrated by certain individual verbs, suggesting that other cues to exceptionality are available in the linguistic input.
... 4. An alternate explanation is that the a-adjective constraint is driven by entrenchment effects (Regier & Gahl, 2004; Stefanowitsch, 2008; Theakston, 2004). Unlike the statistical preemption account, which posits that only frequency between functionally related forms (i.e. ...
Article
Full-text available
What we say generally follows distributional regularities, such as learning to avoid "the asleep dog" because we hear "the dog that's asleep" in its place. However, not everyone follows such regularities. We report data on English monolinguals and Spanish-English bilinguals to examine how working memory mediates variation in a-adjective usage (asleep, afraid), which, unlike typical adjectives (sleepy, frightened), tend to resist attributive use. We replicate previous work documenting this tendency in a sentence production task. Critically, for all speakers, the tendency to use a-adjectives attributively or non-attributively was modulated by individual differences in working memory. But for bilinguals, a-adjective use was additionally modulated by an interaction between working memory and category fluency in the dominant language (English), revealing an interactive role of domain-general and language-related mechanisms that enable regulation of competing (i.e. attributive and non-attributive) alternatives. These results show how bilingualism reveals fundamental variation in language use, memory, and attention.
... To answer these questions, previous works have applied computational modeling to the language evolution field (e.g., [5,108,22,94,13,103]). Among these studies, some suggest that language regularities can derive from pragmatic properties of communicative interactions [101,6]; others, from cultural transmission across generations of learners [12,14]. ...
Thesis
The ability to acquire and produce a language is a key component of intelligence. While communication is widespread among animals, human language is unique in its productivity and complexity. By better understanding the source of natural language, one can use this knowledge to build better interactive AI models that can acquire human languages as rapidly and efficiently as children. In this manuscript, we build on the emergent communication field to investigate the long-standing question of the source of natural language. In particular, we use communicating neural networks that can develop a language to solve a collaborative task. Comparing the emergent language properties with human cross-linguistic regularities can provide answers to the crucial questions of the origin and evolution of natural language. Indeed, if neural networks develop a cross-linguistic regularity spontaneously, then the latter would not depend on specific biological constraints. From the cognitive perspective, looking at neural networks as another expressive species can shed light on the source of cross-linguistic regularities – a fundamental research interest in cognitive science and linguistics. From the machine learning perspective, endowing artificial models with human constraints necessary to evolve communicative protocols as productive and robust as natural language would encourage the development of better interactive AI models. In this manuscript, we focus on studying four cross-linguistic regularities related to word length, word order, semantic categorization, and compositionality. Across the different studies, we find that some of these regularities arise spontaneously while others are missing in neural networks’ languages. We connect the former case to the presence of shared communicative constraints such as the discrete nature of the communication channel. For the latter, we relate the absence of human-like regularities to the lack of constraints either on the learners’ side (e.g., the least-effort constraints) or on language functionality (e.g., transmission of information). In sum, this manuscript provides several case studies demonstrating how we can use successful neural network models to tackle crucial questions about the origin and evolution of our language. It also stresses the importance of mimicking the way humans learn their language when training artificial agents, to induce better learning procedures for neural networks, so that they can evolve an efficient and open-ended communication protocol.
... Indirect negative evidence is another option: if Norwegian learners of English expect to encounter English wh-island violations at a rate comparable to Norwegian, then the absence of the structures could over time lead the learner towards the restrictive hypothesis. Prior research has argued that indirect evidence may play a role in L1 (Foraker et al., 2009; Perfors et al., 2011; Ramscar et al., 2013; Regier and Gahl, 2004; Rohde and Plaut, 1999) and L2 acquisition (Dahl, 2004; Plough, 1992). However, it is unclear whether the frequency of the island-violations is high enough in L1 to form the basis for strong predictions in L2. ...
Article
Full-text available
Norwegian allows filler-gap dependencies into embedded questions, which are islands for filler-gap dependency formation in English. We ask whether there is evidence that Norwegian learners of English transfer the functional structure that permits island violations from their first language (L1) to their second language (L2). In two acceptability judgment studies, we find that Norwegians are more likely to accept ‘island-violating’ filler-gap dependencies in L2 English if the corresponding filler-gap dependency is acceptable in Norwegian: Norwegian learners variably accept English sentences with dependencies into embedded questions, but not into subject phrases. These results are consistent with models that permit transfer of abstract functional structure. Norwegians are still less likely to accept filler-gap dependencies into English embedded questions than Norwegian embedded questions. We interpret the latter finding as evidence that, despite transfer, Norwegian speakers may partially restructure their L2 English analysis. We discuss how indirect positive evidence may play a role in helping learners restructure.
... Learners pay little or no attention to direct negative evidence about what is ungrammatical in their language (Brown & Hanlon 1970; Marcus 1993 and references therein) and thus must construct a grammar hypothesis given only positive exemplars (i.e., direct positive evidence). There is a body of research that suggests that learners can use indirect negative evidence, i.e., the lack of a linguistic pattern in the child's linguistic environment from which the child can induce what is disallowed by the grammar being acquired (see Foraker et al. 2009; Regier & Gahl 2004). However, a number of concerns have been raised about the ability of a child to identify and utilize indirect negative evidence during language acquisition (e.g., Hornstein 2016; Pinker 1989; Yang 2017). ...
Article
Full-text available
In this article, we propose a reconceptualization of the principles and parameters (P&P) framework. We argue that in lieu of discrete parameter values, a parameter value exists on a gradient plane that encodes a learner's confidence that a particular parametric structure licenses the utterances in the learner's linguistic input. Crucially, this gradient parameter hypothesis obviates the need for default parameter values. Default parameter values can be put to use effectively from the perspective of linguistic learnability but are lacking in terms of empirical and theoretical consistency. We present findings from a computational implementation of a gradient P&P learner. The findings suggest that the gradient parameter hypothesis provides the basis for a viable alternative to existing computational models of language acquisition in the classic P&P paradigm. We close with a brief discussion of how a gradient parameter space offers a path to address shortcomings that have been attributed to the P&P framework.
... Some theorists (e.g., Pinker, 1989) have asserted that such contrastive learning is impossible without the aid of innate language knowledge. Research shows, however, that probabilistic models are capable of dealing successfully with this difficulty (e.g., Regier & Gahl, 2004; Szagun, 2011). As a natural case in point, Slobin (1978) reported a study of his daughter Heida's struggle to master regular and irregular forms of the English past tense. ...
Article
Full-text available
Native fluent speakers do not appear to have conscious knowledge of the linguistic categories and declarative rules that structural linguists use to describe grammar and that most psycholinguists have adopted for explaining language functioning. The implication derived in this paper is that these categories and rules are devoid of psychological reality. It is proposed that a psychologically real morphosyntax is concerned with sentence surface. The pragmatic framework and the semantic relational matrix at the onset of sentence production are converted directly into syntagmatic patterns, flexibly distributed along the sentence line, according to the distributional constraints of the language. These constraints are reflected in probabilistic associations between words and sequences of words. Natural morphosyntax is learned incidentally through implicit procedural learning. Children extract frequent syntagmatic patterns from adapted adult input. The resulting knowledge is stored in procedural memory. The cortico-striatal-cerebellar system of the brain has the computational power necessary to deal with sentence sequential patterning and associative regularities. Keywords: morphosyntax, language learnability, syntagmatic processing, probabilistic associations, pattern extraction, implicit procedural learning and memory.
... Thus, indirect negative evidence can be inferred from positive evidence. Children can differentiate between an accidental non-occurrence and a statistically significant non-occurrence, and deduce that the latter is ungrammatical (Regier and Gahl, 2004; Stefanowitsch, 2008). ...
Article
A relevant issue in the philosophy of science is the demarcation problem: how to distinguish science from nonscience, and, more specifically, science from pseudoscience. Sometimes, the demarcation problem is debated from a very general perspective, proposing demarcation criteria to separate science from pseudoscience, but without discussing any specific field in detail. This article aims to focus the demarcation problem on particular disciplines or theories. After considering a set of demarcation criteria, four pseudosciences are examined: psychoanalysis, speculative evolutionary psychology, universal grammar, and string theory. It is concluded that these theoretical frameworks do not meet the requirements to be considered genuinely scientific.
... As we will see shortly, the role of missing evidence is crucial for the usage-based perspective (cf. Regier & Gahl, 2004). ...
... Nonetheless, it is clear that inference is not all there is to learning. For example, Bayesian inference necessarily predicts that, unless one can explain away the repetitions, repeatedly encountering a form being used in a certain way would lead one to infer that the form cannot be used in any other way (Regier and Gahl 2004; Xu and Tenenbaum 2007). Harmon and Kapatsinski's (forthcoming) data discussed above show that high frequency of a form-meaning mapping does lead to entrenchment in comprehension but it leads the speaker to extend the form to new uses in production. ...
Chapter
Full-text available
The great variability of morphological structure across languages makes it uncontroversial that morphology is learned. Yet, morphology presents formidable learning challenges, on par with those of syntax. This article takes a constructionist perspective in assuming that morphological constructions are a major outcome of the learning process. However, the existence of morphological paradigms in many languages suggests that they are often not the only outcome. The article reviews domain-general approaches to achieving this outcome. The primary focus is on mechanisms proposed within the associative/connectionist tradition, which are compared with Bayesian approaches. The issues discussed include the role of prediction and prediction error in learning, generative vs. discriminative learning models, directionality of associations, the roles of (unexpectedly) present vs. absent stimuli, general-to-specific vs. specific-to-general learning, and the roles of type and token frequency. In the process, the notion of a construction itself is shown to be more complicated than it first appears.
... See Regier and Gahl (2004) for the general argument of learning from the absence of expected data. ...
Article
Full-text available
The extent to which gaps continue to be synchronically motivated by grammar competition after first appearing in a language remains an open question. In this paper I investigate the structure of genitive plural paradigmatic gaps in Modern Greek nouns. These gaps are interesting because they seem at first glance to be synchronically motivated by competing morphological stress patterns, based on the distribution of defective lexemes in the lexicon. However, as we will see, the results of a production and rating experiment indicate that despite the availability of this synchronic motivation, speakers treat genitive plural gaps as examples of lexicalized defectiveness, divorced from issues related to stress. Ultimately, the point will be that the distribution of gaps in Modern Greek is misleading regarding their synchronic structure.
... A learner who uses the size principle to make inferences about the possible antecedents of PRO is predicted to favor a stricter restriction on the antecedent (i.e. a smaller hypothesis) over one which allows a wider range of interpretations (a larger hypothesis), provided that the observed data is compatible with both hypotheses (Pearl & Lidz, 2009; Regier & Gahl, 2004). Of the non-adultlike grammars that have been proposed for adjunct control, some allow a free interpretation of PRO (Nominalization and high attachment), some allow an internal, but not an external interpretation of PRO (optional low attachment), and some allow only an object control interpretation (obligatory low attachment). ...
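A schematic sketch of this size-principle preference over the nested grammars just listed, with made-up hypothesis sizes and a uniform prior (my own toy numbers, not from the dissertation):

```python
# Schematic size-principle illustration over nested hypotheses about PRO's
# antecedent: obligatory object control (1 licensed interpretation) is a
# subset of internal-only readings (2), which is a subset of a free
# interpretation (3). Data compatible with all three increasingly favor
# the smallest hypothesis.

sizes = {"object control": 1, "internal only": 2, "free interpretation": 3}

def normalized_posterior(n_observations: int):
    # Uniform prior; each observation is an object-control interpretation,
    # licensed by every hypothesis, with likelihood 1/size per observation.
    scores = {h: (1.0 / s) ** n_observations for h, s in sizes.items()}
    z = sum(scores.values())
    return {h: round(p / z, 3) for h, p in scores.items()}

print(normalized_posterior(1))    # {'object control': 0.545, ...}
print(normalized_posterior(10))
# After 10 compatible observations, the strictest hypothesis takes nearly
# all of the posterior mass, because the larger hypotheses keep "leaking"
# probability onto interpretations that never occur in the input.
```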
Thesis
Full-text available
This dissertation uses children’s acquisition of adjunct control as a case study to investigate grammatical and performance accounts of language acquisition. In previous research, children have consistently exhibited non-adultlike behavior for sentences with adjunct control. To explain children’s behavior, several different grammatical accounts have been proposed, but evidence for these accounts has been inconclusive. In this dissertation, I take two approaches to account for children’s errors. First, I spell out the predictions of previous grammatical accounts, and test these predictions after accounting for some methodological concerns that might have influenced children’s behavior in previous studies. While I reproduce the non-adultlike behavior observed in previous studies, the predictions of previous grammatical accounts are not borne out, suggesting that extragrammatical factors are needed to explain children’s behavior. Next, I consider the role of two different types of extragrammatical factors in predicting children’s non-adultlike behavior. With a new task designed to address the task demands in previous studies, children exhibit significantly higher accuracy than with previous tasks. This suggests that children’s behavior has been influenced by task- specific processing factors. In addition to the task, I also test the predictions of a similarity-based interference account, which links children’s errors to the same memory mechanisms involved in sentence processing difficulties observed in adults. These predictions are borne out, supporting a more continuous developmental trajectory as children’s processing mechanisms become more resistant to interference. Finally, I consider how children’s errors might influence their acquisition of adjunct control, given the distribution in the linguistic input. I discuss the results of a corpus analysis, including the possibility that adjunct control could be learned from the input. The kinds of information that could be useful to a learner become much more limited, however, after considering the processing limitations that would interfere with the representations available to the learner.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker, 1978; Carnie, 2012; Cowper, 1992; Radford, 1981) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker, 1978; Hornstein & Lightfoot, 1981; Radford, 1988; Lidz, Waxman, & Freedman, 2003; cf. Akhtar, Callanan, Pullum, & Scholz, 2004; Foraker, Regier, Khetarpal, Perfors, & Tenenbaum, 2009; Regier & Gahl, 2004; Tomasello, 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): ...
Article
One anaphora (e.g., this is a good one) has been used as a key diagnostic in syntactic analyses of the English noun phrase, and “one-replacement” has also figured prominently in debates about the learnability of language. However, much of this work has been based on faulty premises, as a few perceptive researchers, including Ray Jackendoff, have made clear. Abandoning the view of anaphoric one (a-one) as a form of syntactic replacement allows us to take a fresh look at various uses of the word one. In the present work, we investigate its use as a cardinal number (1-one) in order to better understand its anaphoric use. Like all cardinal numbers, 1-one can only quantify an individuated entity and provides an indefinite reading by default. Owing to unique combinatoric properties, cardinal numbers defy consistent classification as determiners, quantifiers, adjectives, or nouns. Once the semantics and distribution of cardinal numbers, including 1-one, are appreciated, many properties of a-one follow with minimal stipulation. We claim that 1-one and a-one are distinct but very closely related lexemes. When 1-one appears without a noun (e.g., Take one), it is nearly indistinguishable from a-one (e.g., take one)—the only differences being interpretive (1-one foregrounds its cardinality while a-one does not) and prosodic (presence versus absence of primary accent). While we ultimately argue that a family of constructions is required to describe the full range of syntactic contexts in which one appears, the proposed network accounts for properties of a-one by allowing it to inherit most of its syntactic and interpretive constraints from its historical predecessor, 1-one.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker 1978; Radford 1981; Cowper 1992; Carnie 2012) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker 1978; Hornstein and Lightfoot 1981; Radford 1988; Lidz et al. 2003; cf. Akhtar et al. 2004; Regier & Gahl 2004; Foraker et al. 2009; Tomasello 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): 1.a. ...
Chapter
Full-text available
This paper provides a compositional, lexically based analysis of the infinitival, verb-headed idiom exemplified by the sentences What does this have to do with me? and It may have had something to do with money. Using conventions of Sign-Based Construction Grammar (SBCG, Sag 2012, Kay and Sag 2012, Michaelis 2012), we show that this multiword expression is revealingly represented as an intransitive verb word do, whose subject cannot be locally instantiated, which is necessarily in base form, and which invokes or is invoked by other idiomatically construed lexemes, including a special subject-raising lexeme have, which contributes a (potentially null instantiated) degree argument. We argue that idiomatic do, despite its restricted combinatorial potential, is compositionally interpreted, denoting an association between two entities, the first of which is expressed by the non-locally-instantiated subject and the second of which is expressed by the with-headed PP. We draw several lessons from this study. First, as is perhaps self-evident, multi-word expressions that are composed mostly of idiomatic words, such as have, to, and do in this idiom, may also require the presence of non-idiomatic words. An example is the presence in the to-do- ...
... His investigations were limited to a small number of rare constructions, and typically focused on nouns, but are nonetheless encouraging in their success in the semantic domain. Regier and Gahl [2004] contributed to the lexicon learning problem in the form of a mathematical proof-of-concept providing counter-evidence against the poverty of stimulus argument. Using Bayes' rule and the "size principle," only a small number of positive examples were necessary for handling a set of sentences using the anaphoric one, because the learner must favour the hypothesis that is most likely to generate the positive examples, and not the incorrect ones (it knows they are incorrect because they are unseen). ...
Thesis
State-of-the-art parsers suffer from incomplete lexicons, as evidenced by the fact that they all contain built-in methods for dealing with out-of-lexicon items at parse time. Since new labelled data is expensive to produce and no amount of it will conquer the long tail, we attempt to address this problem by leveraging the enormous amount of raw text available for free, and expanding the lexicon offline, with a semi-supervised word learner. We accomplish this with a method similar to self-training, where a fully trained parser is used to generate new parses with which the next generation of parser is trained. This thesis introduces Chart Inference (CI), a two-phase word-learning method with Combinatory Categorial Grammar (CCG), operating on the level of the partial parse as produced by a trained parser. CI uses the parsing model and lexicon to identify the CCG category type for one unknown word in a context of known words by inferring the type of the sentence using a model of end punctuation, then traversing the chart from the top down, filling in each empty cell as a function of its mother and its sister. We first specify the CI algorithm, and then compare it to two baseline word-learning systems over a battery of learning tasks. CI is shown to outperform the baselines in every task, and to function in a number of applications, including grammar acquisition and domain adaptation. This method performs consistently better than self-training, and improves upon the standard POS-backoff strategy employed by the baseline StatCCG parser by adding new entries to the lexicon. The first learning task establishes lexical convergence over a toy corpus, showing that CI’s ability to accurately model a target lexicon is more robust to initial conditions than either of the baseline methods. We then introduce a novel natural language corpus based on children’s educational materials, which is fully annotated with CCG derivations. We use this corpus as a testbed to establish that CI is capable in principle of recovering the whole range of category types necessary for a wide-coverage lexicon. The complexity of the learning task is then increased, using the CCGbank corpus, a version of the Penn Treebank, and showing that CI improves as its initial seed corpus is increased. The next experiment uses CCGbank as the seed and attempts to recover missing question-type categories in the TREC question answering corpus. The final task extends the coverage of the CCGbank-trained parser by running CI over the raw text of the Gigaword corpus. Where appropriate, a fine-grained error analysis is also undertaken to supplement the quantitative evaluation of the parser performance with deeper reasoning as to the linguistic points of the lexicon and parsing model.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker 1978; Radford 1981; Cowper 1992; Carnie 2012) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker 1978; Hornstein and Lightfoot 1981; Radford 1988; Lidz et al. 2003; cf. Akhtar et al. 2004; Regier & Gahl 2004; Foraker et al. 2009; Tomasello 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): ...
Research
Full-text available
One anaphora (e.g., this is a good one) has been used as a key diagnostic in syntactic analyses of the English noun phrase, and ‘one-replacement’ has also figured prominently in debates about the learnability of language. However, much of this work has been based on faulty premises, as a few perceptive researchers, including Ray Jackendoff, have made clear. Abandoning the view of anaphoric one (A-ONE) as a form of syntactic replacement allows us to take a fresh look at various uses of the word one. In the present work, we investigate its use as a cardinal number (1-ONE) in order to better understand its anaphoric use. Like all cardinal numbers, 1-ONE can only quantify an individuated entity and provides an indefinite reading by default. Owing to unique combinatoric properties, cardinal numbers defy consistent classification as determiners, quantifiers, adjectives or nouns. Once the semantics and distribution of cardinal numbers including 1-ONE are appreciated, many properties of A-ONE follow with minimal stipulation. We claim that 1-ONE and A-ONE are distinct but very closely related lexemes. When 1-ONE appears without a noun (e.g., Take ONE), it is nearly indistinguishable from A-ONE (e.g., TAKE one)—the only differences being interpretive (1-ONE foregrounds its cardinality while A-ONE does not) and prosodic (presence versus absence of primary accent). While we ultimately argue that a family of constructions is required to describe the full range of syntactic contexts in which one appears, the proposed network accounts for properties of A-ONE by allowing it to share (inherit) most of its syntactic and interpretive constraints from its historical predecessor, 1-ONE.
... The question then is how they could have purged the impossible interpretation. One possibility is that children utilize a Bayesian learning mechanism (Regier & Gahl 2004, Tenenbaum & Griffiths 2001) and take both the presence of the WSC and absence of the NSC into consideration to determine the correct interpretive option of Neg…he. This first requires that ample data points regarding Neg…he exist in children's input, which is presumably true, considering our preliminary adult corpus search results. ...
... . "Ideal" learning, Bayesian learning and computational resources [1 lecture] Regier and Gahl (2004) in response to Lidz et al. (2003) present a Bayesian learning model that learns the correct structural referent for anaphoric "one" in English from a corpus of child-directed speech. Similarly to Reali & Christiansen (op. ...
Conference Paper
Full-text available
Computational modeling of human language processes is a small but growing subfield of computational linguistics. This paper describes a course that makes use of recent research in psychocomputational modeling as a framework to introduce a number of mainstream computational linguistics concepts to an audience of linguistics, cognitive science and computer science doctoral students. The emphasis on what I take to be the largely interdisciplinary nature of computational linguistics is particularly germane for the computer science students. Since 2002 the course has been taught three times under the auspices of the MA/PhD program in Linguistics at The City University of New York's Graduate Center. A brief description of some of the students' experiences after having taken the course is also provided.
... The inference engine is based on the idea that every possible grammar defined by UG makes predictions about what the learner should expect to find in the environment (Gleitman 1990, Lightfoot 1991, Fodor 1998, Tenenbaum & Griffiths 2001, Yang 2002, Regier & Gahl 2003, Pearl & Lidz 2009). The acquisitional intake compares those predicted features against the perceptual intake representation, driving an inference about the grammatical features responsible for the sentence under consideration. ...
Article
Full-text available
Evidence of children’s sensitivity to statistical features of their input in language acquisition is often used to argue against learning mechanisms driven by innate knowledge. At the same time, evidence of children acquiring knowledge that is richer than the input supports arguments in favor of such mechanisms. This tension can be resolved by separating the inferential and deductive components of the language learning mechanism. Universal Grammar provides representations that support deductions about sentences that fall outside of experience. In addition, these representations define the evidence that learners use to infer a particular grammar. The input is compared with the expected evidence to drive statistical inference. In support of this model, we review evidence of (a) children’s sensitivity to the environment, (b) mismatches between input and intake, (c) the need for learning mechanisms beyond innate representations, and (d) the deductive consequences of children’s acquired syntactic representations.
... On the cognitive side, context has been studied in everyday learning. Recent studies [9,10,11,12,13,14,15,16,17,18] include knowledge transfer, mental reasoning about causal relations, probabilistic reasoning by children, language processing, and attribution. ...
... Much of this progress is a product of the current revolution in cognitive science concerning the rise of "probabilistic modeling" accounts of reasoning and learning (Chater, Tenenbaum & Yuille, 2006; Pearl, 2001; Glymour, 2003). Many of the ideas about probability that underpin these models were first formulated by the philosopher and mathematician, Reverend Thomas Bayes, in the 18th century, and are now being successfully applied to a very broad set of problems in developmental psychology, including induction and inference in learning (e.g., Glymour, 2003; Tenenbaum, Griffiths, & Kemp, 2006), language acquisition in infancy (e.g., Chater & Manning, 2006; Tenenbaum & Xu, 2000; Xu & Tenenbaum, 2007; Niyogi, 2002; Dowman, 2002; Regier & Gahl, 2004), and the development of social cognition (e.g., Goodman, Baker, Bonawitz, Mansinghka, Gopnik, Wellman, Schulz, & Tenenbaum, 2006; Baker, Saxe, & Tenenbaum, 2006), among others. The application of probabilistic models has successfully described and predicted patterns of behavioral data across a variety of experimental paradigms. ...
Article
Full-text available
This review describes the relation between the imagination and causal cognition, particularly with relevance to recent developments in computational theories of human learning. According to the Bayesian model of human learning, our ability to imagine possible worlds and engage in counterfactual reasoning is closely tied to our ability to think causally. Indeed, the purpose and distinguishing feature of causal knowledge is that it allows one to generate counterfactual inferences. We begin with a brief description of the "probabilistic models" framework of causality, and review empirical work in that framework which shows that adults and children use causal knowledge to generate counterfactuals. We also outline a theoretical argument that suggests that the imagination is central to the process of causal understanding. We will then offer evidence that Bayesian learning implicates the imaginative process, and conclude with a discussion of how this computational method may be applied to the study of the imagination, more classically construed.
... - Language modeling (Regier & Gahl 2004, Tenenbaum & Griffiths 2001). • Our starting assumption: Speakers are sensitive to the probability of a given combination of lexeme + inflectional property set (i.e., usage probability calculated over content paradigm cell). This includes sensitivity to the absence of an expected structure. ...
... There is not space here to go into the details, but see Sokolov and Snow (1994) for an overview of arguments and evidence from the child learning literature. See Regier and Gahl (2004) for arguments from a computational perspective. There are at least two studies on paradigmatic gaps that assume gaps are learned primarily from implicit negative evidence (Johansson 1999, Orgun and Sprouse 1999). ...
... A simple reason for these results is that English grammar is not the optimal description of the input, which contains about 27% exceptional data that does not fit English grammar. However, the performance of probabilistic learning mechanisms improves if they are equipped with a way to evaluate and discard the most ambiguous pieces of data (Pearl, 2007; Pearl & Lidz, 2009; Regier & Gahl, 2004). The evaluation of the ambiguity of a word stress pattern relies on language-specific knowledge. ...
Article
A word often expresses many different morphological functions. Which part of a word contributes to which part of the overall meaning is not always clear, which raises the question as to how such functions are learned. While linguistic studies tacitly assume the co-occurrence of cues and outcomes to suffice in learning these functions (Baer-Henney, Kügler, & van de Vijver, 2015; Baer-Henney & van de Vijver, 2012), error-driven learning suggests that contingency rather than contiguity is crucial (Nixon, 2020; Ramscar, Yarlett, Dye, Denny, & Thorpe, 2010). In error-driven learning, cues gain association strength if they predict a certain outcome, and they lose strength if the outcome is absent. This reduction of association strength is called unlearning. So far, it is unclear if such unlearning has consequences for cue-outcome associations beyond the ones that get reduced. To test for such consequences of unlearning, we taught participants morphophonological patterns in an artificial language learning experiment. In one block, the cues to two morphological outcomes (plural and diminutive) co-occurred within the same word forms. In another block, a single cue to only one of these two outcomes was presented in a different set of word forms. We wanted to find out if participants unlearn this cue's association with the outcome that is not predicted by the cue alone, and if this allows the absent cue to be associated with the absent outcome. Our results show that if unlearning was possible, participants learned that the absent cue predicts the absent outcome better than if no unlearning was possible. This effect was stronger if the unlearned cue was more salient. This shows that unlearning takes place even if no alternative cues to an absent outcome are provided, which highlights that learners take both positive and negative evidence into account, as predicted by domain-general error-driven learning.
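As a deliberately simplified illustration of the error-driven mechanism this abstract describes, the sketch below uses the classic Rescorla-Wagner update rule as a stand-in; the block structure and parameter values are assumptions for illustration, not the authors' design.

```python
# A minimal Rescorla-Wagner sketch: a cue loses association strength
# ("unlearning") on trials where it is present but the outcome is absent.

def rw_update(v, cues, outcome_present, alpha=0.1, lam=1.0):
    """One Rescorla-Wagner trial: v maps cue -> associative strength."""
    total = sum(v[c] for c in cues)            # summed prediction
    error = (lam if outcome_present else 0.0) - total
    for c in cues:
        v[c] += alpha * error                  # shared error term
    return v

v = {"cueA": 0.0, "cueB": 0.0}
# Block 1: cueA and cueB together predict the outcome.
for _ in range(50):
    rw_update(v, ["cueA", "cueB"], outcome_present=True)
# Block 2: cueA alone, outcome absent -> cueA is unlearned.
for _ in range(50):
    rw_update(v, ["cueA"], outcome_present=False)
print(v)  # cueA is driven back toward 0; cueB keeps its strength
```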
Article
Many domains of inquiry in psychology are concerned with rich and complex phenomena. At the same time, the field of psychology is grappling with how to improve research practices to address concerns with the scientific enterprise. In this Perspective, we argue that both of these challenges can be addressed by adopting a principle of methodological variety. According to this principle, developing a variety of methodological tools should be regarded as a scientific goal in itself, one that is critical for advancing scientific theory. To illustrate, we show how the study of language and communication requires varied methodologies, and that theory development proceeds, in part, by integrating disparate tools and designs. We argue that the importance of methodological variation and innovation runs deep, travelling alongside theory development to the core of the scientific enterprise. Finally, we highlight ongoing research agendas that might help to specify, quantify and model methodological variety and its implications. Philosophers of science have identified epistemological criteria for evaluating the promise of a scientific theory. In this Perspective, Dale et al. propose that a principle of methodological variety should be one of these criteria, and argue that psychologists should actively cultivate methodological variety to advance theory.
Article
Morphological structures interact dynamically with lexical processing and storage, with the parameters of morphological typology being partly dependent on cognitive pathways for processing, storage and generalization of word structure, and vice versa. Bringing together a team of well-known scholars, this book examines the relationship between linguistic cognition and the morphological diversity found in the world's languages. It includes research from across linguistic and cognitive science sub-disciplines that looks at the nature of typological diversity and its relationship to cognition, touching on concepts such as complexity, interconnectedness within systems, and emergent organization. Chapters employ experimental, computational, corpus-based and theoretical methods to examine specific morphological phenomena, and an overview chapter provides a synthesis of major research trends, contextualizing work from different methodological and philosophical perspectives. Offering a novel perspective on how cognition contributes to our understanding of word structure, it is essential reading for psycholinguists, theoreticians, typologists, computational modelers and cognitive scientists.
Article
Despite the recent popularity of contextual word embeddings, static word embeddings still dominate lexical semantic tasks, making their study of continued relevance. A widely adopted family of such static word embeddings is derived by explicitly factorizing the Pointwise Mutual Information (PMI) weighting of the cooccurrence matrix. As unobserved cooccurrences lead PMI to negative infinity, a common workaround is to clip negative PMI at 0. However, it is unclear what information is lost by collapsing negative PMI values to 0. To answer this question, we isolate and study the effects of negative (and positive) PMI on the semantics and geometry of models adopting factorization of different PMI matrices. Word and sentence-level evaluations show that only accounting for positive PMI in the factorization strongly captures both semantics and syntax, whereas using only negative PMI captures little of semantics but a surprising amount of syntactic information. Results also reveal that incorporating negative PMI induces stronger rank invariance of vector norms and direction, as well as improved rare word representations.
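For readers unfamiliar with the pipeline discussed here, the following toy sketch builds a PMI matrix from a small invented cooccurrence table, applies the common clip-at-zero workaround (PPMI), and factorizes it with SVD.

```python
# Toy PMI -> PPMI -> SVD pipeline; counts are invented for illustration.
import numpy as np

counts = np.array([[10., 2., 0.],
                   [2., 8., 1.],
                   [0., 1., 6.]])          # word x context cooccurrences
total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)
p_c = p_wc.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))        # -inf where count == 0
ppmi = np.maximum(pmi, 0.0)                 # the common clipping workaround
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :2] * S[:2]               # 2-d static word vectors
print(embeddings)
```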
Article
Children's linguistic knowledge and the learning mechanisms by which they acquire it grow substantially in infancy and toddlerhood, yet theories of word learning largely fail to incorporate these shifts. Moreover, researchers' often-siloed focus on either familiar word recognition or novel word learning limits the critical consideration of how these two relate. As a step toward a mechanistic theory of language acquisition, we present a framework of "learning through processing" and relate it to the prevailing methods used to assess children's early knowledge of words. Incorporating recent empirical work, we posit a specific, testable timeline of qualitative changes in the learning process in this interval. We conclude with several challenges and avenues for building a comprehensive theory of early word learning: better characterization of the input, reconciling results across approaches, and treating lexical knowledge in the nascent grammar with sufficient sophistication to ensure generalizability across languages and development.
Article
Full-text available
Poverty of the stimulus has been at the heart of ferocious and tear-filled debates at the nexus of psychology, linguistics, and philosophy for decades. This review is intended as a guide for readers without a formal linguistics or philosophy background, focusing on what poverty of the stimulus is and how it’s been interpreted, which is traditionally where the tears have come in. I discuss poverty of the stimulus from the perspective of language development, highlighting how poverty of the stimulus relates to expectations about learning and the data available to learn from. I describe common interpretations of what poverty of the stimulus means when it occurs, and approaches for determining when poverty of the stimulus is in fact occurring. I close with illustrative examples of poverty of the stimulus in the domains of syntax, lexical semantics, and phonology, and discuss the value of identifying instances of poverty of the stimulus when it comes to understanding language development.
Chapter
Full-text available
Subject control in non-finite adjuncts is observed across languages (as in ‘John called Mary after drawing a picture’). Research on the acquisition of adjunct control has generally focused on the relevant grammatical components and when they are acquired. This paper considers these components in the context of the linguistic input to ask how control in adjuncts is acquired. Although adjunct control is available in the input, the instances themselves do not provide evidence for abstract syntactic relations. Implications are considered for linguistic dependencies and the evidence in the input.
Article
Non-adjacent dependencies are ubiquitous in language, but difficult to learn in artificial language experiments in the lab. Previous research suggests that non-adjacent dependencies are more learnable given structural support in the input – for instance, in the presence of high variability between dependent items. However, not all non-adjacent dependencies occur in supportive contexts. How are such regularities learned? One possibility is that learning one set of non-adjacent dependencies can highlight similar structures in subsequent input, facilitating the acquisition of new non-adjacent dependencies that are otherwise difficult to learn. In three experiments, we show that prior exposure to learnable non-adjacent dependencies - i.e., dependencies presented in a learning context that has been shown to facilitate discovery - improves learning of novel non-adjacent regularities that are typically not detected. These findings demonstrate how the discovery of complex linguistic structures can build on past learning in supportive contexts.
Article
Full-text available
A central question in language acquisition is how children master sentence types that they have seldom, if ever, heard. Here we report the findings of a pre-registered, randomised, single-blind intervention study designed to test the prediction that, for one such sentence type, complex questions (e.g., Is the crocodile who’s hot eating? ), children could combine schemas learned, on the basis of the input, for complex noun phrases ( the [THING] who’s [PROPERTY] ) and simple questions ( Is [THING] [ACTION]ing? ) to yield a complex-question schema ( Is [the [THING] who’s [PROPERTY]] ACTIONing? ). Children aged 4;2 to 6;8 ( M = 5;6, SD = 7.7 months) were trained on simple questions (e.g., Is the bird cleaning? ) and either (Experimental group, N = 61) complex noun phrases (e.g., the bird who’s sad ) or (Control group, N = 61) matched simple noun phrases (e.g., the sad bird ). In general, the two groups did not differ on their ability to produce novel complex questions at test. However, the Experimental group did show (a) some evidence of generalising a particular complex NP schema ( the [THING] who’s [PROPERTY] as opposed to the [THING] that’s [PROPERTY] ) from training to test, (b) a lower rate of auxiliary-doubling errors (e.g., *Is the crocodile who’s hot is eating? ), and (c) a greater ability to produce complex questions on the first test trial. We end by suggesting some different methods – specifically artificial language learning and syntactic priming – that could potentially be used to better test the present account.
Chapter
The question of complexity, as in what makes one language more 'complex' than another, is a long-established topic of debate amongst linguists. Recently, this issue has been complemented with the view that languages are complex adaptive systems, in which emergence and self-organization play major roles. However, few students of the phenomenon have gone beyond the basic assessment of the number of units and rules in a language (what has been characterized as 'bit complexity') or shown some familiarity with the science of complexity. This book reveals how much can be learned by overcoming these limitations, especially by adopting developmental and evolutionary perspectives. The contributors include specialists of language acquisition, evolution and ecology, grammaticization, phonology, and modeling, all of whom approach languages as dynamical, emergent, and adaptive complex systems.
Chapter
Full-text available
Throughout most of the 20th century, analytical and reductionist approaches have dominated in biological, social, and humanistic sciences, including linguistics and communication. We generally believed we could account for fundamental phenomena in invoking basic elemental units. Although the amount of knowledge generated was certainly impressive, we have also seen limitations of this approach. Discovering the sound formants of human languages, for example, has allowed us to know vital aspects of the ‘material’ plane of verbal codes, but it tells us little about significant aspects of their social functions. I firmly believe, therefore, that alongside a linguistics that looks ‘inward’ there should also be a linguistics that looks ‘outward’, or one even that is constructed ‘from the outside’, a linguistics that I refer to elsewhere as ‘holistic’ though it could be identified by a different name. My current vision is to promote simultaneously the perspective that goes from the part to the whole and that which goes from the whole to the parts, i.e., both from the top down and from the bottom up. This goal is shared with other disciplines which recognize that many phenomena related to life are interwoven, self-organising, emergent and processual. Thus, we need to re-examine how we have conceived of reality, both the way we have looked at it and the images we have used to talk about it. Several approaches now grouped under the label of complexity have been elaborated towards this objective of finding new concepts and ways of thinking that better fit the complex organisation of facts and events. https://books.google.es/books?id=b1pEDgAAQBAJ&pg=PA218&lpg=PA218&dq=complexity+and+language+a+sociocognitive&source=bl&ots=5GyrP3_8Gs&sig=wKZxQLPJ01MHGubSSl63J82Y3dU&hl=ca&sa=X&ved=2ahUKEwjCwaW12dPbAhUGwxQKHQ1uDJ0Q6AEwB3oECAcQAQ#v=onepage&q=complexity%20and%20language%20a%20sociocognitive&f=false
Article
Full-text available
Fluent speakers do not appear to have conscious knowledge of the linguistic categories and declarative rules that linguists use to describe grammar and that most psycholinguists have adopted for explaining language functioning. The implication derived in this paper is that these categories and rules are deprived of psychological reality. It is proposed that a psychologically real morphosyntax is concerned with sentence surface. The pragmatic framework and the semantic relational matrix at the onset of sentence production are converted directly into syntagmatic patterns, flexibly distributed along the sentence line. These patterns are reflected in probabilistic associations between words and sequences of words. Natural morphosyntax is learned incidentally through implicit procedural learning. Children extract frequent syntagmatic patterns from adapted adult input. The resulting knowledge is stored in procedural memory. The cortico-striatal-cerebellar system of the brain has the computational power necessary to deal with sentence sequential patterning and associative regularities.
Chapter
Full-text available
Ellipsis constructions present many challenges to incremental sentence processing. One challenge is that most partial sentences that are compatible with ellipsis continuations are also compatible with non-ellipsis continuations. Example (1) is a case in point. This partial sentence is compatible with ellipsis of the material following the wh-phrase in an embedded interrogative, as in (1a) (a construction known as sluicing in the syntax literature), and with various non-ellipsis continuations such as those in (1b) and (1c).

(1) John was writing something, but I don't know what …
  a. [ellipsis]
  b. … he was writing.
  c. … motivates him to write so much.

Furthermore, there does not seem to be an obvious cue that can tell the parser whether ellipsis follows or not. In other words, environments where ellipsis is typically found show structural ambiguity. Therefore, there is always a danger that inducing ellipsis may turn out to be an incorrect analysis, and it may require structural reanalysis. Such reanalysis is costly and is avoided by the parser whenever possible (Schneider and Phillips 2001; Sturt et al. 2001). This in turn suggests that it is always safer for the parser to choose a non-ellipsis structure, since it can rely on bottom-up information in non-ellipsis continuations. If this is the case, the parser should choose ellipsis if and only if bottom-up information confirms that ellipsis is there. That is, the parser should not induce or infer ellipsis incrementally.
Article
Full-text available
Beyond the thesis that the grammar of natural languages includes a transformational level, what distinguishes the Chomskyan research program in the Theory of Grammar from other approaches is the thesis that the grammatical knowledge internalized by every human being is partially innate (i.e., partially given a priori by a system of task-specific cognitive biases, Universal Grammar), rather than a by-product of self-organizing mechanisms of 'general intelligence'. This scientific thesis may in principle be right or wrong, and it can only be challenged by taking into account its empirical coverage and the logic of its arguments. At the heart of this issue is the Argument from Poverty of the Stimulus (APS), whose logic has been the target of numerous misunderstandings on the part of anti-nativists, as exemplified by Geurts (2000), who fails to recognize the distinctions between 'knowledge' and 'belief', and between 'conscious' and 'non-conscious' cognition, both of which are crucial for understanding the logic of the APS. The aim of this article is to undo this misunderstanding.
Article
Full-text available
A classic debate in cognitive science revolves around understanding how children learn complex linguistic patterns, such as restrictions on verb alternations and contractions, without negative evidence. Recently, probabilistic models of language learning have been applied to this problem, framing it as a statistical inference from a random sample of sentences. These probabilistic models predict that learners should be sensitive to the way in which sentences are sampled. There are two main types of sampling assumptions that can operate in language learning: strong and weak sampling. Strong sampling, as assumed by probabilistic models, assumes the learning input is drawn from a distribution of grammatical samples from the underlying language and aims to learn this distribution. Thus, under strong sampling, the absence of a sentence construction from the input provides evidence that it has low or zero probability of grammaticality. Weak sampling does not make assumptions about the distribution from which the input is drawn, and thus the absence of a construction from the input is not used as evidence of its ungrammaticality. We demonstrate in a series of artificial language learning experiments that adults can produce behavior consistent with both sets of sampling assumptions, depending on how the learning problem is presented. These results suggest that people use information about the way in which linguistic input is sampled to guide their learning.
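The contrast between the two sampling assumptions can be made concrete with a toy calculation; the hypotheses, sentences, and numbers below are hypothetical.

```python
# Strong vs. weak sampling: hypotheses are sets of grammatical
# sentences; data are n observed sentences.

small = {"s1", "s2"}                # a restrictive grammar
large = {"s1", "s2", "s3", "s4"}    # a permissive grammar
data = ["s1", "s2", "s1"]           # s3/s4 conspicuously absent

def likelihood(h, data, strong=True):
    if any(s not in h for s in data):
        return 0.0
    # Strong sampling: each example drawn uniformly from h -> size principle.
    # Weak sampling: examples generated independently of h -> flat likelihood.
    return (1.0 / len(h)) ** len(data) if strong else 1.0

for strong in (True, False):
    ls = likelihood(small, data, strong), likelihood(large, data, strong)
    print("strong" if strong else "weak", ls)
# Under strong sampling the small grammar wins (the absence of s3/s4
# counts as evidence); under weak sampling the grammars are tied.
```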
Article
The nominal anaphoric element one has figured prominently in discussions of linguistic nativism because of an important argument advanced by C. L. Baker (1978). His argument has been frequently cited within the cognitive and linguistic sciences, and has provided the topic for a chain of experimental and computational psycholinguistics papers. Baker's crucial grammaticality facts, though much repeated in the literature, have not been critically investigated. A corpus investigation shows that his claims are not true: one does not take only phrasal antecedents, but can also take nouns on their own, including semantically relational nouns, and can take various of-PP dependents of its own. We give a semantic analysis of anaphoric one that allows it to exhibit this kind of freedom, and we exhibit frequency evidence that goes a long way toward explaining why linguists have been inclined to regard phrases like the one of physics or three ones as ungrammatical when in fact (as corpus evidence shows) they are merely dispreferred relative to available grammatical alternatives. The main implication for the acquisition literature is that one of the most celebrated arguments from poverty of the stimulus is shown to be without force.
Article
Full-text available
Contends that too much emphasis has been placed on grammar or syntax and too little on the semantics of children's language. A full-scale analysis of children's speech based on the logical tradition of model-theoretic semantics is described. To illustrate this approach, examples of the child's use of the definite article, adjectives, quantifiers, and propositional attitudes are presented, and conceptual and technical tools for studying these aspects of speech are described. The problems of paraphrase, context, processes, and theory verification that arise in the semantical analysis of children's speech are considered.
Article
Full-text available
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena. By inducing global knowledge indirectly from local co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren. LSA uses no prior linguistic or perceptual similarity knowledge; it is based solely on a general mathematical learning method that achieves powerful inductive effects by extracting the right number of dimensions (e.g., 300) to represent objects and contexts. Relations to other theories, phenomena, and problems are sketched.
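A compact sketch of the core LSA computation, truncated SVD of a term-document matrix followed by similarity in the reduced space, is given below; the matrix and the two retained dimensions are illustrative choices rather than the paper's setup.

```python
# Toy LSA: factorize a term-document count matrix and compare terms
# in the reduced space. Real LSA uses large corpora and ~300 dimensions.
import numpy as np

# rows = terms, columns = documents
X = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                  # retained dimensions
term_vecs = U[:, :k] * S[:k]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Terms 0 and 1 occur in the same documents; terms 0 and 2 never do.
print(cos(term_vecs[0], term_vecs[1]))  # high: shared distribution
print(cos(term_vecs[0], term_vecs[2]))  # low: disjoint distribution
```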
Article
Full-text available
It is a striking fact that in humans the greatest learning occurs precisely at that point in time--childhood--when the most dramatic maturational changes also occur. This report describes possible synergistic interactions between maturational change and the ability to learn a complex domain (language), as investigated in connectionist networks. The networks are trained to process complex sentences involving relative clauses, number agreement, and several types of verb argument structure. Training fails in the case of networks which are fully formed and 'adultlike' in their capacity. Training succeeds only when networks begin with limited working memory and gradually 'mature' to the adult state. This result suggests that rather than being a limitation, developmental restrictions on resources may constitute a necessary prerequisite for mastering certain complex domains. Specifically, successful learning may depend on starting small.
Article
Full-text available
In order to acquire a lexicon, young children must segment speech into words, even though most words are unfamiliar to them. This is a non-trivial task because speech lacks any acoustic analog of the blank spaces between printed words. Two sources of information that might be useful for this task are distributional regularity and phonotactic constraints. Informally, distributional regularity refers to the intuition that sound sequences that occur frequently and in a variety of contexts are better candidates for the lexicon than those that occur rarely or in few contexts. We express that intuition formally by a class of functions called DR functions. We then put forth three hypotheses: First, that children segment using DR functions. Second, that they exploit phonotactic constraints on the possible pronunciations of words in their language. Specifically, they exploit both the requirement that every word must have a vowel and the constraints that languages impose on word-initial and word-final consonant clusters. Third, that children learn which word-boundary clusters are permitted in their language by assuming that all permissible word-boundary clusters will eventually occur at utterance boundaries. Using computational simulation, we investigate the effectiveness of these strategies for segmenting broad phonetic transcripts of child-directed English. The results show that DR functions and phonotactic constraints can be used to significantly improve segmentation. Further, the contributions of DR functions and phonotactic constraints are largely independent, so using both yields better segmentation than using either one alone. Finally, learning the permissible word-boundary clusters from utterance boundaries does not degrade segmentation performance.
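The paper's third hypothesis, that permissible word-boundary clusters can be learned from utterance boundaries, lends itself to a short sketch; orthography stands in for phonetic transcripts here, and the utterances are toy examples.

```python
# Assume every consonant cluster permitted at word edges will
# eventually show up at utterance edges, and collect those.
VOWELS = set("aeiou")

def edge_clusters(utterances):
    initial, final = set(), set()
    for u in utterances:
        w = u.replace(" ", "")          # treat the utterance as one string
        i = 0
        while i < len(w) and w[i] not in VOWELS:
            i += 1
        initial.add(w[:i])              # cluster before the first vowel
        j = len(w)
        while j > 0 and w[j - 1] not in VOWELS:
            j -= 1
        final.add(w[j:])                # cluster after the last vowel
    return initial, final

initial, final = edge_clusters(["stop that", "drink milk", "and sit"])
print(initial)  # {'st', 'dr', ''} -> permissible word-initial clusters
print(final)    # {'t', 'lk'}      -> permissible word-final clusters
```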
Article
Full-text available
What kinds of knowledge underlie the use of language and how is this knowledge acquired? Linguists equate knowing a language with knowing a grammar. Classic "poverty of the stimulus" arguments suggest that grammar identification is an intractable inductive problem and that acquisition is possible only because children possess innate knowledge of grammatical structure. An alternative view is emerging from studies of statistical and probabilistic aspects of language, connectionist models, and the learning capacities of infants. This approach emphasizes continuity between how language is acquired and how it is used. It retains the idea that innate capacities constrain language learning, but calls into question whether they include knowledge of grammatical structure.
Article
Full-text available
We apply a computational theory of concept learning based on Bayesian inference (Tenenbaum, 1999) to the problem of learning words from examples. The theory provides a framework for understanding how people can generalize meaningfully from just one or a few positive examples of a novel word, without assuming that words are mutually exclusive or map only onto basic-level categories.
Article
1. My thanks go to Joe Grimes for comments made on a previous draft of this review, to Ron Langacker for some related discussion, and to Kenneth Holmqvist for a complimentary copy of his work. 2. Strictly speaking, Holmqvist did not implement Langacker's model but rather based his implementation on that model (Holmqvist 1993:3). Another related work is that of George Dunbar, who has drawn extensively on Langacker's approach for his work on the cognitive lexicon (Dunbar 1991).
Article
Naturally occurring speech contains only a limited amount of complex recursive structure, and this is reflected in the empirically documented difficulties that people experience when processing such structures. We present a connectionist model of human performance in processing recursive language structures. The model is trained on simple artificial languages. We find that the qualitative performance profile of the model matches human behavior, both on the relative difficulty of center-embedding and cross-dependency, and between the processing of these complex recursive structures and right-branching recursive constructions. We analyze how these differences in performance are reflected in the internal representations of the model by performing discriminant analyses on these representations both before and after training. Furthermore, we show how a network trained to process recursive structures can also generate such structures in a probabilistic fashion. This work suggests a novel explanation of people’s limited recursive performance, without assuming the existence of a mentally represented competence grammar allowing unbounded recursion.
Article
This article examines a type of argument for linguistic nativism that takes the following form: (i) a fact about some natural language is exhibited that allegedly could not be learned from experience without access to a certain kind of (positive) data; (ii) it is claimed that data of the type in question are not found in normal linguistic experience; hence (iii) it is concluded that people cannot be learning the language from mere exposure to language use. We analyze the components of this sort of argument carefully, and examine four exemplars, none of which hold up. We conclude that linguists have some additional work to do if they wish to sustain their claims about having provided support for linguistic nativism, and we offer some reasons for thinking that the relevant kind of future work on this issue is likely to further undermine the linguistic nativist position.
Article
In the Bayesian framework, a language learner should seek a grammar that explains observed data well and is also a priori probable. This paper proposes such a measure of prior probability. Indeed it develops a full statistical framework for lexicalized syntax. The learner’s job is to discover the system of probabilistic transformations (often called lexical redundancy rules) that underlies the patterns of regular and irregular syntactic constructions listed in the lexicon. Specifically, the learner discovers what transformations apply in the language, how often they apply, and in what contexts. It considers simpler systems of transformations to be more probable a priori. Experiments show that the learned transformations are more effective than previous statistical models at predicting the probabilities of lexical entries, especially those for which the learner had no direct evidence.
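The selection criterion described here is the familiar trade-off between fit and prior probability, which can be written compactly (standard notation, not the paper's own):

```latex
% Maximum a posteriori grammar selection: p(G) penalizes complex
% systems of transformations, p(D | G) rewards fit to the observed
% lexicon D.
\[
  G^{*} \;=\; \arg\max_{G}\; p(D \mid G)\, p(G)
\]
```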
Article
A psychological space is established for any set of stimuli by determining metric distances between the stimuli such that the probability that a response learned to any stimulus will generalize to any other is an invariant monotonic function of the distance between them. To a good approximation, this probability of generalization (i) decays exponentially with this distance, and (ii) does so in accordance with one of two metrics, depending on the relation between the dimensions along which the stimuli vary. These empirical regularities are mathematically derivable from universal principles of natural kinds and probabilistic geometry that may, through evolutionary internalization, tend to govern the behaviors of all sentient organisms.
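In symbols, the regularity summarized in this abstract is standardly rendered as follows (a textbook formulation, not a quotation from the paper):

```latex
% Shepard's universal generalization gradient: the probability that a
% response to stimulus x generalizes to stimulus y decays exponentially
% with psychological distance d(x, y).
\[
  g(x, y) \;=\; e^{-k\,d(x, y)}, \qquad k > 0,
\]
% where d is an L1 ("city-block") or L2 (Euclidean) metric, depending
% on whether the stimulus dimensions are separable or integral.
```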
Article
This paper shows how to formally characterize language learning in a finite parameter space, for instance, in the principles-and-parameters approach to language, as a Markov structure. New language learning results follow directly; we can explicitly calculate how many positive examples on average ("sample complexity") it will take for a learner to correctly identify a target language with high probability. We show how sample complexity varies with input distributions and learning regimes. In particular we find that the average time to converge under reasonable language input distributions for a simple three-parameter system first described by Gibson and Wexler (1994) is psychologically plausible, in the range of 100-150 positive examples. We further find that a simple random-step algorithm (that is, simply jumping from one language hypothesis to another rather than changing one parameter at a time) works faster and always converges to the right target language, in contrast to the single-step, local parameter setting method advocated in some recent work.
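A toy simulation conveys the flavor of the sample-complexity question; the error model below, in which a target sentence is parseable by a competing grammar with probability decreasing in parameter mismatch, is my assumption rather than the paper's Markov analysis.

```python
# Toy random-step learner over a 3-parameter (8-grammar) space.
import random

random.seed(0)
TARGET = (1, 0, 1)
GRAMMARS = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def parses(hyp, target):
    # Hypothetical error model: a sentence from the target is parseable
    # by hyp with probability decreasing in parameter mismatch.
    mismatch = sum(h != t for h, t in zip(hyp, target))
    return random.random() < (1.0 - mismatch / 3.0)

def examples_to_converge():
    hyp, n = random.choice(GRAMMARS), 0
    while hyp != TARGET:
        n += 1
        if not parses(hyp, TARGET):
            hyp = random.choice(GRAMMARS)   # random step, not single-step
    return n

runs = [examples_to_converge() for _ in range(1000)]
print(sum(runs) / len(runs))  # average sample complexity, toy scale
```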
Article
It is commonly assumed that innate linguistic constraints are necessary to learn a natural language, based on the apparent lack of explicit negative evidence provided to children and on Gold's proof that, under assumptions of virtually arbitrary positive presentation, most interesting classes of languages are not learnable. However, Gold's results do not apply under the rather common assumption that language presentation may be modeled as a stochastic process. Indeed, Elman (Elman, J.L., 1993. Learning and development in neural networks: the importance of starting small. Cognition 48, 71-99) demonstrated that a simple recurrent connectionist network could learn an artificial grammar with some of the complexities of English, including embedded clauses, based on performing a word prediction task within a stochastic environment. However, the network was successful only when either embedded sentences were initially withheld and only later introduced gradually, or when the network itself was given initially limited memory which only gradually improved. This finding has been taken as support for Newport's 'less is more' proposal, that child language acquisition may be aided rather than hindered by limited cognitive resources. The current article reports on connectionist simulations which indicate, to the contrary, that starting with simplified inputs or limited memory is not necessary in training recurrent networks to learn pseudonatural languages; in fact, such restrictions hinder acquisition as the languages are made more English-like by the introduction of semantic as well as syntactic constraints. We suggest that, under a statistical model of the language environment, Gold's theorem and the possible lack of explicit negative evidence do not implicate innate, linguistic-specific mechanisms. Furthermore, our simulations indicate that special teaching methods or maturational constraints may be unnecessary in learning the structure of natural language.
Article
Many developmental psycholinguists assume that young children have adult syntactic competence, this assumption being operationalized in the use of adult-like grammars to describe young children's language. This "continuity assumption" has never had strong empirical support, but recently a number of new findings have emerged - both from systematic analyses of children's spontaneous speech and from controlled experiments - that contradict it directly. In general, the key finding is that most of children's early linguistic competence is item based, and therefore their language development proceeds in a piecemeal fashion with virtually no evidence of any system-wide syntactic categories, schemas, or parameters. For a variety of reasons, these findings are not easily explained in terms of the development of children's skills of linguistic performance, pragmatics, or other "external" factors. The framework of an alternative, usage-based theory of child language acquisition - relying explicitly on new models from Cognitive-Functional Linguistics - is presented.
Article
Shepard has argued that a universal law should govern generalization across different domains of perception and cognition, as well as across organisms from different species or even different planets. Starting with some basic assumptions about natural kinds, he derived an exponential decay function as the form of the universal generalization gradient, which accords strikingly well with a wide range of empirical data. However, his original formulation applied only to the ideal case of generalization from a single encountered stimulus to a single novel stimulus, and for stimuli that can be represented as points in a continuous metric psychological space. Here we recast Shepard's theory in a more general Bayesian framework and show how this naturally extends his approach to the more realistic situation of generalizing from multiple consequential stimuli with arbitrary representational structure. Our framework also subsumes a version of Tversky's set-theoretic model of similarity, which is conventionally thought of as the primary alternative to Shepard's continuous metric space model of similarity and generalization. This unification allows us not only to draw deep parallels between the set-theoretic and spatial approaches, but also to significantly advance the explanatory power of set-theoretic models.
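The Bayesian reformulation this abstract describes is commonly presented as follows; this is a standard rendering of the framework, with the size-principle likelihood, rather than the paper's exact notation.

```latex
% Bayesian generalization: given n examples X = {x_1, ..., x_n} of a
% consequential region, the probability that a new item y belongs to
% the region sums over consistent hypotheses, each weighted by the
% size principle.
\begin{align*}
  p(y \in C \mid X) &= \sum_{h \,:\, y \in h} p(h \mid X),\\
  p(h \mid X) &\propto p(X \mid h)\,p(h), \qquad
  p(X \mid h) =
  \begin{cases}
    1/|h|^{n} & \text{if } X \subseteq h,\\
    0 & \text{otherwise.}
  \end{cases}
\end{align*}
```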
Article
In learning the meanings of words, children are guided by a set of constraints that give privilege to some potential meanings over others. These word-learning constraints are sometimes viewed as part of a specifically linguistic endowment. However, several recent computational models suggest concretely how word-learning, constraints included, might emerge from more general aspects of cognition, such as associative learning, attention and rational inference. This article reviews these models, highlighting the link between general cognitive forces and the word-learning they subserve. Ultimately, these cognitive forces might leave their mark not just on language learning, but also on language itself: in constraining the space of possible meanings, they place limits on cross-linguistic semantic variation.
Article
Generative linguistic theory stands on the hypothesis that grammar cannot be acquired solely on the basis of an analysis of the input, but depends, in addition, on innate structure within the learner to guide the process of acquisition. This hypothesis derives from a logical argument, however, and its consequences have never been examined experimentally with infant learners. Challenges to this hypothesis, claiming that an analysis of the input is indeed sufficient to explain grammatical acquisition, have recently gained attention. We demonstrate with novel experimentation the insufficiency of this countervailing view. Focusing on the syntactic structures required to determine the antecedent for the pronoun one, we demonstrate that the input to children does not contain sufficient information to support unaided learning. Nonetheless, we show that 18-month-old infants do have command of the syntax of one. Because this syntactic knowledge could not have been gleaned exclusively from the input, infants' mastery of this aspect of syntax constitutes evidence for the contribution of innate structure within the learner in acquiring a grammar.
Article
Given a small number of examples of scene-utterance pairs of a novel verb, language learners can learn its syntactic and semantic features. Syntactic and semantic bootstrapping hypotheses both rely on cross-situational observation to home in on the ambiguity present in a single observation. In this paper, we cast the distributional evidence from scenes and syntax in a unified Bayesian probabilistic framework. Unlike previous approaches to modeling lexical acquisition, our framework uniquely: (1) models learning from only a small number of scene-utterance pairs; (2) utilizes and integrates both syntactic and semantic evidence, thus reconciling the apparent tension between syntactic and semantic bootstrapping approaches; (3) robustly handles noise; (4) makes prior and acquired knowledge distinctions explicit, through specification of the hypothesis space, prior and likelihood probability distributions.
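A minimal cross-situational sketch in the spirit of this framework appears below; the candidate meanings, prior, and per-observation consistency sets are invented for illustration, with the syntactic evidence collapsed into those sets.

```python
# Bayesian updating over candidate verb meanings from scene-utterance
# pairs: meanings inconsistent with any observed pair are ruled out,
# and the (hypothetical) prior breaks ties.

MEANINGS = {"fill": 0.4, "pour": 0.4, "splash": 0.2}   # hypothetical prior

# Each observation: the set of candidate meanings consistent with the
# scene and the syntactic frame of the utterance.
observations = [
    {"fill", "pour"},        # scene 1: liquid moves into a container
    {"fill", "pour"},        # scene 2: similar event, still ambiguous
    {"pour", "splash"},      # scene 3: frame favors manner-of-motion
]

posterior = dict(MEANINGS)
for consistent in observations:
    for m in posterior:
        if m not in consistent:
            posterior[m] = 0.0           # likelihood 0 for this pair
    z = sum(posterior.values())
    posterior = {m: p / z for m, p in posterior.items()}

print(posterior)  # only "pour" survives all three scene-utterance pairs
```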
Introduction. Explanation in linguistics: The logical problem of language acquisition
  • N Hornstein
  • D Lightfoot
Hornstein, N., & Lightfoot, D. (1981). Introduction. In N. Hornstein, & D. Lightfoot (Eds.), Explanation in linguistics: The logical problem of language acquisition. London: Longman.
Philosophical essay on probabilities. Translated by A. Dale (1995) from the fifth French edition
  • P.-S Laplace
Laplace, P.-S. (1825). Philosophical essay on probabilities. Translated by A. Dale (1995) from the fifth French edition. New York: Springer.
Empirical assessment of stimulus poverty arguments
  • G Pullum
  • B Scholz
Pullum, G., & Scholz, B. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19, 9–50.
Language acquisition
  • S Pinker
Pinker, S. (1995). Language acquisition. In L. Gleitman, & M. Liberman (Eds.), Language: An invitation to cognitive science (2nd ed., Vol. 1, pp. 135–182). Cambridge, MA: MIT Press.
Word learning as Bayesian inference
  • J Tenenbaum
  • F Xu
Tenenbaum, J., & Xu, F. (2000). Word learning as Bayesian inference. In L. Gleitman, & A. Joshi (Eds.), Proceedings of the 22nd Annual Conference of the Cognitive Science Society (pp. 517-522). Mahwah, NJ: Lawrence Erlbaum.