Article

Learning the unlearnable: The role of missing evidence

Authors:
Terry Regier and Susanne Gahl

Abstract

Syntactic knowledge is widely held to be partially innate, rather than learned. In a classic example, it is sometimes argued that children know the proper use of anaphoric one, although that knowledge could not have been learned from experience. Lidz et al. [Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: Experimental evidence for syntactic structure at 18 months. Cognition, 89, B65-B73.] pursue this argument, and present corpus and experimental evidence that appears to support it; they conclude that specific aspects of this knowledge must be innate. We demonstrate, contra Lidz et al., that this knowledge may in fact be acquired from the input, through a simple Bayesian learning procedure. The learning procedure succeeds because it is sensitive to the absence of particular input patterns--an aspect of learning that is apparently overlooked by Lidz et al. More generally, we suggest that a prominent form of the "argument from poverty of the stimulus" suffers from the same oversight, and is as a result logically unsound.
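To make the learning procedure described in the abstract concrete, here is a minimal Python sketch of Bayesian updating under the "size principle". The hypotheses, scene, and numbers are invented for illustration; this is not Regier and Gahl's actual model, only the shape of the argument.

```python
# Minimal illustrative sketch (not Regier & Gahl's actual code) of Bayesian
# learning from missing evidence via the "size principle": a hypothesis with
# a larger extension spreads its probability over more possible observations,
# so consistently failing to see the extra possibilities counts against it.
#
# Toy setup: does anaphoric "one" pick out "ball" (any ball) or "red ball"?
# Assume a scene with 4 balls, 2 of them red, and uniform sampling of the
# referent from whatever the hypothesis allows.

extension_size = {"any ball": 4, "red ball": 2}

def update(prior, n_observations):
    """Posterior after n referents of 'one', every one of them a red ball
    (so each observation is compatible with BOTH hypotheses)."""
    post = dict(prior)
    for _ in range(n_observations):
        for h in post:
            post[h] *= 1.0 / extension_size[h]   # size-principle likelihood
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}
    return post

print(update({"any ball": 0.5, "red ball": 0.5}, 5))
# {'any ball': ~0.03, 'red ball': ~0.97}: the narrower hypothesis wins
# purely because non-red referents never occurred, i.e. missing evidence.
```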


... Recent papers on the acquisition of highly infrequent structures have reported conflicting results. Regier and Gahl (2004) found that constructions that are largely absent from the input can be learned from minimal evidence. In particular, a Bayesian model was able to correctly predict that anaphoric one referred to a particular NP (red ball) after very few exposures to the target form. ...
... anaphoric 'one') nor tested the model over more exemplars, both of which limit generalizability. In contrast with Regier and Gahl (2004), an effect for frequency was found in Wolter and Gyllstad's (2013) study of adult L2 learners. Learners were asked to identify collocations in the target language, and the authors found that highly infrequent collocations were less reliably identified. ...
... Wolter and Gyllstad, 2013) or does not (e.g. Regier and Gahl, 2004) interfere with acquisition is a pattern repeated often enough for Gass and Mackey (2002) to observe that the role of frequency in L2 acquisition is 'complex', though they agree with many cognitive approaches (e.g. Ellis, 2002) that it does play a large role in acquisition. ...
Article
This article provides a Poverty of Stimulus argument for the participation of a dedicated linguistic module in second language acquisition. We study the second language (L2) acquisition of a subset of English infinitive complements that exhibit the following properties: (a) they present an intricate web of grammatical constraints while (b) they are highly infrequent in corpora, (c) they lack visible features that would make them salient, and (d) they are communicatively superfluous. We report on an experiment testing the knowledge of some infinitival constructions by near-native adult first language (L1) Spanish / L2 English speakers. Learners demonstrated a linguistic system that includes contrasts based on subtle restrictions in the L2, including aspect restrictions in Raising to Object. These results provide evidence that frequency and other cognitive or environmental factors are insufficient to account for the acquisition of the full spectrum of English infinitivals. This leads us to the conclusion that a domain-specific linguistic faculty is required.
... ''Dalmatian") decreases the likelihood that children consider that form extendable to similar meanings (in this case, other dog breeds). As we discussed already, research on the acquisition of syntax has likewise proposed that repeatedly encountering a verb with the same argument structure leads the learner to infer that it cannot be used in any other way (the entrenchment hypothesis, Braine & Brooks, 1995; see also Regier & Gahl, 2004;Stefanowitsch, 2008). This appears to be the opposite of what would be expected from a tendency to extend frequent forms to new uses. ...
... In Experiment II, the comprehension test precedes production, affording an opportunity to observe transfer in the opposite direction, from comprehension to production. As discussed in the introduction, comprehension-to-production transfer of form-meaning mappings is thought to be crucial to eliminating accessibility-driven overextension in child language (Braine & Brooks, 1995;Regier & Gahl, 2004). If this kind of comprehension-to-production transfer occurs, we expect that extension of frequent forms in production observed in Experiment I may be replaced by entrenchment in Experiment II, where production follows comprehension. ...
... Another purpose of reordering the test tasks was to test the hypothesis that entrenchment in comprehension results in the development of strong form-meaning mappings that can then guide production (Braine & Brooks, 1995;Regier & Gahl, 2004). This hypothesis was confirmed by the finding that the entrenchment effect was observed in both production and form choice tasks when they followed the comprehension task. ...
... In section five I show why I think this PSA fails. Here I discuss Regier and Gahl's (2004) response, arguing that although their model of how the syntax of anaphoric 'one' can be learnt is strictly false, their objection that LWF don't consider all the evidence children might draw on still stands. Then I present three further problems with the study, concerning (1) the nature of the syntactic rule LWF claim to be innate, (2) a dubitable claim about the relevance of grammatical errors in the input, and (3) an unjustified assumption about the nature of the PLD. ...
... Critical responses to LWF dispute both the inaccessibility and indispensability of the evidence, and the infants' acquisition of the acquirendum (Regier & Gahl 2004, Tomasello 2004). I cannot present and assess all these critiques, so I will canvass just one, that of Regier and Gahl (2004), before discussing three criticisms of my own. Regier and Gahl (2004, henceforth R&G) dispute the indispensability of the evidence LWF cite for the acquisition of anaphoric 'one'. ...
... (i) Knowing syntactic rules are structure-dependent (Chomsky 1980a, Anderson & Lightfoot 2000, Fodor & Crowther 2002, Berwick et al. 2011, Anderson 2013); (ii) Knowing certain dependencies are limited to spanning no more than a single specific, abstract linguistic structure (Chomsky 1973, Huang 1982, Lasnik & Saito 1984); (iii) Knowing certain syntactic category assignments are illicit for certain words in a language (Baker 1978). However, recent investigations have suggested that learning strategies involving less specific knowledge may be sufficient to learn the target syntactic generalizations in several cases (e.g. Regier & Gahl 2004, Foraker et al. 2009, Pearl & Lidz 2009, Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Pearl & Sprouse 2013a). Interestingly, a common successful approach in some of the most recent work (Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Pearl & Sprouse 2013a) involves expanding the set of informative data to include indirect positive evidence (discussed in more detail below in §2). ...
... Here, we apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that form their generalizations based on realistic input data (Sakas & Fodor 2001, Sakas & Nishimoto 2002, Yang 2002, Sakas 2003, Regier & Gahl 2004, Yang 2004, Legate & Yang 2007, Foraker et al. 2009, Pearl & Lidz 2009, Pearl 2011, Pearl & Mis 2011, Perfors, Tenenbaum, & Regier 2011, Sakas & Fodor 2012, Yang 2012, Legate & Yang 2013, Pearl & Sprouse 2013a). We demonstrate that a learner that assumes one's antecedent is the same syntactic category as one and is biased to include indirect positive evidence coming from other pronouns in English can generate the looking-preference behavior observed in eighteen-month-olds (Lidz et al. 2003). ...
... Indirect negative evidence would correspond to the learner noticing that items like 1b are absent from the input, and so inferring that these items are absent because they are ungrammatical. Indirect negative evidence has been argued to be available, particularly to statistical learners that form expectations about how frequently items should appear in the input (e.g. via some form of entrenchment: Rohde & Plaut 1999, Regier & Gahl 2004, Clark & Lappin 2009, Foraker et al. 2009, Perfors et al. 2010, Perfors, Tenenbaum, & Regier 2011, Ambridge et al. 2013, Ramscar et al. 2013) and learners that use statistical preemption to recognize when an alternative semantically and pragmatically related item is used instead of the item in question (e.g. Boyd & Goldberg 2011, Goldberg 2011, Ambridge et al. 2013). ...
Article
Full-text available
Language learners are often faced with a scenario where the data allow multiple generalizations, even though only one is actually correct. One promising solution to this problem is that children are equipped with helpful learning strategies that guide the types of generalizations made from the data. Two successful approaches in recent work for identifying these strategies have involved (i) expanding the set of informative data to include indirect positive evidence, and (ii) using observable behavior as a target state for learning. We apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that assume one’s antecedent is the same syntactic category as one and form their generalizations based on realistic data. We demonstrate that a learner that is biased to include indirect positive evidence coming from other pronouns in English can generate eighteen-month-old looking-preference behavior. Interestingly, we find that the knowledge state responsible for this target behavior is a context-dependent representation for anaphoric one, rather than the adult representation, but this immature representation can suffice in many communicative contexts involving anaphoric one. More generally, these results suggest that children may be leveraging broader sets of data to make the syntactic generalizations leading to their observed behavior, rather than selectively restricting their input. We additionally discuss the components of the learning strategies capable of producing the observed behavior, including their possible origin and whether they may be useful for making other linguistic generalizations.
... The idea was first discussed in the early 1980s (e.g., Chomsky 1981) and has recently been attracting growing attention within research on probabilistic learning models (e.g., Elman 1993; Lewis and Elman 2001; Seidenberg 1997; Tenenbaum and Griffiths 2001; Rohde and Plaut 1999; Regier and Gahl 2004). Roughly speaking, INE is the absence of input evidence that a certain hypothesis predicts to be possible in the language, and the learning mechanism uses the absence of expected data as evidence against the hypothesis. ...
... Roughly speaking, INE is the absence of input evidence that a certain hypothesis predicts to be possible in the language, and the learning mechanism uses the absence of expected data as evidence against the hypothesis. An important characteristic of recent probabilistic learning models that shapes learning around INE is that they have an ability to discriminate subset-superset hypotheses on the basis of positive evidence alone (e.g., Regier and Gahl 2004; but see Pearl and Lidz 2006). For the acquisition of possible scope interpretations, a probabilistic learner who can reliably detect the absence of a certain scope interpretation in the input data would be able to use the absence as evidence against the hypothesis that generates the scope interpretation. ...
... The possible scope interpretations that Hypothesis 3 generates (wide or narrow scope of ka) form a superset of the interpretations that Hypothesis 2 generates (wide scope of ka). In this case, a probabilistic learner (with, for example, a Bayesian learning algorithm: e.g., Regier and Gahl 2004; Tenenbaum and Griffiths 2001) might be able to learn to dismiss the superset hypothesis. The idea is as follows: if the learner observes that the superset hypothesis generates not only the interpretation that can be seen in the input (i.e., the wide scope of ka) but also an interpretation that is never encountered (i.e., the narrow scope of ka), then the absence of the predicted interpretation can serve as evidence against the hypothesis. ...
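A toy numerical sketch of this subset/superset reasoning follows. The uniform-sampling assumption and the numbers are my construction, not from the cited study; they only illustrate how the absence of narrow-scope data accumulates.

```python
# Toy illustration (assumed uniform sampling of interpretations) of how
# never encountering narrow scope becomes evidence for the subset hypothesis.
import math

# Assumed likelihood of a wide-scope observation under each hypothesis:
p_wide_given_h2 = 1.0   # Hypothesis 2 generates only wide scope
p_wide_given_h3 = 0.5   # Hypothesis 3 generates wide and narrow scope

def log_odds_h2_over_h3(n_observations: int) -> float:
    """Log-odds (nats) favoring Hypothesis 2 after n wide-scope-only data."""
    return n_observations * (math.log(p_wide_given_h2) - math.log(p_wide_given_h3))

for n in (1, 5, 20):
    print(n, round(log_odds_h2_over_h3(n), 2))
# 1 0.69, 5 3.47, 20 13.86: twenty wide-scope observations with no narrow
# scope make the superset hypothesis roughly a million times less likely.
```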
... (i) a sensitivity to the distributional data in the available input (Sakas & Fodor 2001; Yang 2002; Regier & Gahl 2004; Yang 2004; Legate & Yang 2007; Pearl & Weinberg 2007; Foraker et al. 2009; McMurray & Hollich 2009; Pearl & Lidz 2009; Mitchener & Becker 2011; Pearl 2011; Pearl & Mis 2011; Perfors, Tenenbaum & Regier 2011); (ii) a preference for simpler/smaller/narrower hypotheses (Regier & Gahl 2004; Foraker et al. 2009; Pearl & Lidz 2009; Mitchener & Becker 2011; Pearl & Mis 2011; Perfors, Tenenbaum & Regier 2011); (iii) a preference for highly informative data (Fodor 1998b; Pearl & Weinberg 2007; Pearl 2008); (iv) a preference for learning in cases of local uncertainty (Pearl & Lidz 2009); (v) a preference for data with multiple correlated cues (Soderstrom et al. 2009). The size and diversity of this hypothesis space of learning biases suggest that a finer-grained framework may be more informative than the traditional binary framework (UG versus non-UG). ...
... Oh look, another one."). Regier & Gahl (2004) demonstrated that a learner using online Bayesian inference can learn the correct syntactic representation and semantic interpretation of one from child-directed speech, provided that the child expands the range of informative data beyond the traditional data set of unambiguous data. Their model highlights the utility of a bias to use statistical distribution information in the data and a bias to prefer simpler/smaller/narrower hypotheses when encountering ambiguous data. ...
Article
Full-text available
The induction problems facing language learners have played a central role in debates about the types of learning biases that exist in the human brain. Many linguists have argued that some of the learning biases necessary to solve these language induction problems must be both innate and language-specific (i.e., the Universal Grammar (UG) hypothesis). Though there have been several recent high-profile investigations of the necessary learning bias types for different linguistic phenomena, the UG hypothesis is still the dominant assumption for a large segment of linguists due to the lack of studies addressing central phenomena in generative linguistics. To address this, we focus on how to learn constraints on long-distance dependencies, also known as syntactic island constraints. We use formal acceptability judgment data to identify the target state of learning for syntactic island constraints and conduct a corpus analysis of child-directed data to affirm that there does appear to be an induction problem when learning these constraints. We then create a computational learning model that implements a learning strategy capable of successfully learning the pattern of acceptability judgments observed in formal experiments, based on realistic input. Importantly, this model does not explicitly encode syntactic constraints. We discuss learning biases required by this model in detail as they highlight the potential problems posed by syntactic island effects for any theory of syntactic acquisition. We find that, although the proposed learning strategy requires fewer complex and domain-specific components than previous theories of syntactic island learning, it still raises difficult questions about how the specific biases required by syntactic islands arise in the learner. We discuss the consequences of these results for theories of acquisition and theories of syntax.
... The potential induction problem presented by English anaphoric one (1) has received considerable recent attention (e.g., Foraker, Regier, Khetarpal, Perfors, and Tenenbaum (2009); Lidz, Waxman, and Freedman (2003); Pearl and Lidz (2009); Regier and Gahl (2004)). ...
... Interpretation of one: syntactic antecedent of one = "red bottle"; semantic referent of one = RED BOTTLE. The original proposal for learning anaphoric one required children to have innate domain-specific knowledge about the structure of language, as part of the child's Universal Grammar (Baker, 1978). However, more recent studies have suggested alternative solutions involving innate domain-general learning abilities coupled with input restrictions that arise from domain-specific learning constraints (Foraker et al., 2009; Pearl & Lidz, 2009; Regier & Gahl, 2004). ...
... Domain-general learning abilities + domain-specific input restrictions Regier and Gahl (2004) (henceforth R&G) noted that Sem-Syn data like (7) could be leveraged to learn the correct representation for anaphoric one. Specifically, a learner with domain-general statistical learning abilities could track how often a property that was mentioned was important for the referent to have (e.g., when "red" was mentioned, was the referent just a BOTTLE or specifically a RED BOTTLE?). ...
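A rough sketch of this property-tracking idea appears below. The counts and the observe helper are hypothetical and invented for illustration; they are not R&G's implementation, only the statistic it describes.

```python
# Rough sketch (assumed setup, not R&G's actual code) of tracking how often
# a mentioned property matters for reference. Each datum records whether a
# property was mentioned ("a red bottle") and whether the referent actually
# had to have that property (was specifically RED).

from collections import Counter

counts = Counter()

def observe(property_mentioned: bool, referent_has_property: bool) -> None:
    if property_mentioned:
        counts["mentioned"] += 1
        if referent_has_property:
            counts["mentioned_and_held"] += 1

# In child-directed speech, speakers rarely mention a property the referent
# lacks, so the ratio climbs toward 1 (hypothetical data below).
for _ in range(18):
    observe(True, True)   # "look, a red bottle" ... referent is red
observe(True, False)      # a rare exception

# Laplace-smoothed estimate of P(property matters | property mentioned):
p_incl = (counts["mentioned_and_held"] + 1) / (counts["mentioned"] + 2)
print(round(p_incl, 2))   # ~0.90: mentioned properties are usually criterial
```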
Article
Full-text available
A controversial claim in linguistics is that children face an induction problem, which is often used to motivate the need for Universal Grammar. English anaphoric one has been argued to present this kind of induction problem. While the original solution was that children have innate domain-specific knowledge about the structure of language, more recent studies have suggested alternative solutions involving domain-specific input restrictions coupled with domain-general learning abilities. We consider whether indirect evidence coming from a broader input set could obviate the need for such input restrictions. We present an online Bayesian learner that uses this broader input set, and discover it can indeed reproduce the correct learning behavior for anaphoric one, given child-directed speech. We discuss what is required for acquisition success, and how this impacts the larger debate about Universal Grammar.
... In the study of language acquisition, Regier and Gahl (2004) argue that indirect negative evidence might play an important role in human acquisition of grammar, but do not link this idea to distributional semantics. ...
... Our hypothesis for this is that the grammar that generates language implicitly creates negative cooccurrence and so -PMI encodes this syntactic information. Interestingly, this idea creates a bridge between distributional semantics and the argument by Regier and Gahl (2004) that indirect negative evidence might play an important role in human language acquisition of grammar. ...
Preprint
In distributional semantics, the pointwise mutual information (PMI) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences, as PMI goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora, which lead to a large number of such pairs. A common practice is to clip negative PMI (-PMI) at 0, also known as Positive PMI (PPMI). In this paper, we investigate alternative ways of dealing with -PMI and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different PMI matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive PMI (or both), we find that most of the encoded semantics and syntax come from positive PMI, in contrast to -PMI, which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel PMI variants and grounding the popular PPMI measure.
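The standard PMI and PPMI quantities the preprint builds on can be computed directly. The toy count matrix below is invented for illustration; the definitions themselves are the usual ones.

```python
# Illustrative computation of PMI and PPMI on a toy word-by-context matrix.
# PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ); an unobserved pair gives
# log 0 = -inf, which the common PPMI practice clips to 0.

import numpy as np

# Toy cooccurrence counts (hypothetical): 2 words x 2 contexts.
counts = np.array([[4.0, 0.0],
                   [1.0, 3.0]])

total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)   # marginal over contexts
p_c = p_wc.sum(axis=0, keepdims=True)   # marginal over words

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))    # unseen pair -> -inf

ppmi = np.maximum(pmi, 0.0)             # clip negative PMI at zero

print(pmi)    # note the -inf for the unobserved (word 0, context 1) pair
print(ppmi)   # PPMI keeps only the positive associations
```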
... We examine the DirFiltered learner first. Previous studies (Regier & Gahl, 2004; Pearl & Lidz, 2009) found that this filtered learner has a very high probability of learning one is N when it is smaller than NP (p_N ≈ 1) and a very high probability of including a mentioned property in the antecedent (p_incl ≈ 1), even with s values as low as 2. We find this is true when s = 7 or above; however, when s = 5, the learner is much less certain that the mentioned property should be included in the antecedent (p_incl = 0.683); when s = 2, the learner is inclined to believe one is N0 (p_N = 0.340) and is nearly certain that the mentioned property should NOT be included in the antecedent (p_incl = 0.020). Similarly, when s = 7 or above, the learner reliably reproduces the observed infant behavior (p_beh = 0.557–0.585) ...
... In particular, recall that there is a tight connection between syntactic and referential information in the model (Figure 3), as both are used to determine the linguistic antecedent. Specifically, each ALWAYS impacts the selection of the antecedent when a property is mentioned, which was not true in the previous probabilistic learning models used by Regier and Gahl (2004) and Pearl and Lidz (2009). This is reflected in the update equations for the DirRefSynAmb data, where both φ_N and φ_incl involve the current values of p_N and p_incl, as do all the equations corresponding to the probabilities of the different antecedent representations (recall equation (3)). ...
Article
Language learners are often faced with a scenario where the data allow multiple generalizations, even though only one is actually correct. One promising solution to this problem is that children are equipped with helpful learning strategies that guide the types of generalizations made from the data. Two successful approaches in recent work for identifying these strategies have involved (i) expanding the set of informative data to include indirect positive evidence, and (ii) using observable behavior as a target state for learning. We apply both of these ideas to the case study of English anaphoric one, using computationally modeled learners that assume one’s antecedent is the same syntactic category as one and form their generalizations based on realistic data. We demonstrate that a learner that is biased to include indirect positive evidence coming from other pronouns in English can generate eighteen-month-old looking-preference behavior. Interestingly, we find that the knowledge state responsible for this target behavior is a context-dependent representation for anaphoric one, rather than the adult representation, but this immature representation can suffice in many communicative contexts involving anaphoric one. More generally, these results suggest that children may be leveraging broader sets of data to make the syntactic generalizations leading to their observed behavior, rather than selectively restricting their input. We additionally discuss the components of the learning strategies capable of producing the observed behavior, including their possible origin and whether they may be useful for making other linguistic generalizations.
... How do humans acquire such exceptions? The entrenchment hypothesis suggests that speakers track and use the distributional properties of their input as indirect negative evidence for the existence of an exception (Braine and Brooks, 1995; Regier and Gahl, 2004; Theakston, 2004). For instance, if an English learner never encounters the verb last in the passive voice, despite having seen last used productively in the active voice, they may conclude that last cannot occur in the passive voice. ...
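A toy illustration of how such distributional evidence could be quantified: the binomial test, the base passive rate, and the counts below are my assumptions, not the authors' method, but they show why 500 actives with zero passives feels different from 5 actives with zero passives.

```python
# Toy entrenchment calculation (hypothetical counts and base rate): if a verb
# is frequent overall but never attested in the passive, the gap looks
# statistically significant rather than accidental.

import math

def passive_surprise(active_count: int, passive_count: int,
                     base_passive_rate: float = 0.1) -> float:
    """Binomial log-likelihood of the observed passive count if the verb
    passivized at the ordinary base rate; very negative values mean strong
    indirect negative evidence that the verb resists the passive."""
    n = active_count + passive_count
    return (math.lgamma(n + 1) - math.lgamma(passive_count + 1)
            - math.lgamma(n - passive_count + 1)
            + passive_count * math.log(base_passive_rate)
            + (n - passive_count) * math.log(1 - base_passive_rate))

print(passive_surprise(active_count=500, passive_count=0))  # ~ -52.7: "last"-like
print(passive_surprise(active_count=5, passive_count=0))    # ~ -0.5: could be chance
```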
Preprint
Artificial neural networks can generalize productively to novel contexts. Can they also learn exceptions to those productive rules? We explore this question using the case of restrictions on English passivization (e.g., the fact that "The vacation lasted five days" is grammatical, but "*Five days was lasted by the vacation" is not). We collect human acceptability judgments for passive sentences with a range of verbs, and show that the probability distribution defined by GPT-2, a language model, matches the human judgments with high correlation. We also show that the relative acceptability of a verb in the active vs. passive voice is positively correlated with the relative frequency of its occurrence in those voices. These results provide preliminary support for the entrenchment hypothesis, according to which learners track and use the distributional properties of their input to learn negative exceptions to rules. At the same time, this hypothesis fails to explain the magnitude of unpassivizability demonstrated by certain individual verbs, suggesting that other cues to exceptionality are available in the linguistic input.
... 4. An alternate explanation is that the a-adjective constraint is driven by entrenchment effects (Regier & Gahl, 2004; Stefanowitsch, 2008; Theakston, 2004). Unlike the statistical preemption account, which posits that only frequency between functionally related forms (i.e. ...
Article
Full-text available
What we say generally follows distributional regularities, such as learning to avoid "the asleep dog" because we hear "the dog that's asleep" in its place. However, not everyone follows such regularities. We report data on English monolinguals and Spanish-English bilinguals to examine how working memory mediates variation in a-adjective usage (asleep, afraid), which, unlike typical adjectives (sleepy, frightened), tend to resist attributive use. We replicate previous work documenting this tendency in a sentence production task. Critically, for all speakers, the tendency to use a-adjectives attributively or non-attributively was modulated by individual differences in working memory. But for bilinguals, a-adjective use was additionally modulated by an interaction between working memory and category fluency in the dominant language (English), revealing an interactive role of domain-general and language-related mechanisms that enable regulation of competing (i.e. attributive and non-attributive) alternatives. These results show how bilingualism reveals fundamental variation in language use, memory, and attention.
... To answer these questions, previous works have applied computational modeling to the language evolution field (e.g., [5,108,22,94,13,103]). Among these studies, some suggest that language regularities can derive from pragmatic properties of communicative interactions [101,6]; others, from cultural transmission across generations of learners [12,14]. ...
Thesis
The ability to acquire and produce a language is a key component of intelligence. While communication is widespread among animals, human language is unique in its productivity and complexity. By better understanding the source of natural language, one can use this knowledge to build better interactive AI models that can acquire human languages as rapidly and efficiently as children. In this manuscript, we build on the emergent communication field to investigate the long-standing question of the source of natural language. In particular, we use communicating neural networks that can develop a language to solve a collaborative task. Comparing the emergent language properties with human cross-linguistic regularities can provide answers to the crucial questions of the origin and evolution of natural language. Indeed, if neural networks develop a cross-linguistic regularity spontaneously, then the latter would not depend on specific biological constraints. From the cognitive perspective, looking at neural networks as another expressive species can shed light on the source of cross-linguistic regularities – a fundamental research interest in cognitive science and linguistics. From the machine learning perspective, endowing artificial models with human constraints necessary to evolve communicative protocols as productive and robust as natural language would encourage the development of better interactive AI models. In this manuscript, we focus on studying four cross-linguistic regularities related to word length, word order, semantic categorization, and compositionality. Across the different studies, we find that some of these regularities arise spontaneously while others are missing in neural networks’ languages. We connect the former case to the presence of shared communicative constraints such as the discrete nature of the communication channel. For the latter, we relate the absence of human-like regularities to the lack of constraints either on the learners’ side (e.g., the least-effort constraints) or on language functionality (e.g., transmission of information). In sum, this manuscript provides several case studies demonstrating how we can use successful neural network models to tackle crucial questions about the origin and evolution of our language. It also stresses the importance of mimicking the way humans learn their language when training artificial agents, to induce better learning procedures for neural networks, so that they can evolve an efficient and open-ended communication protocol.
... Indirect negative evidence is another option: if Norwegian learners of English expect to encounter English wh-island violations at a rate comparable to Norwegian, then the absence of the structures could over time lead the learner towards the restrictive hypothesis. Prior research has argued that indirect evidence may play a role in L1 (Foraker et al., 2009; Perfors et al., 2011; Ramscar et al., 2013; Regier and Gahl, 2004; Rohde and Plaut, 1999) and L2 acquisition (Dahl, 2004; Plough, 1992). However, it is unclear whether the frequency of the island-violations is high enough in L1 to form the basis for strong predictions in L2. ...
Article
Full-text available
Norwegian allows filler-gap dependencies into embedded questions, which are islands for filler-gap dependency formation in English. We ask whether there is evidence that Norwegian learners of English transfer the functional structure that permits island violations from their first language (L1) to their second language (L2). In two acceptability judgment studies, we find that Norwegians are more likely to accept ‘island-violating’ filler-gap dependencies in L2 English if the corresponding filler-gap dependency is acceptable in Norwegian: Norwegian learners variably accept English sentences with dependencies into embedded questions, but not into subject phrases. These results are consistent with models that permit transfer of abstract functional structure. Norwegians are still less likely to accept filler-gap dependencies into English embedded questions than Norwegian embedded questions. We interpret the latter finding as evidence that, despite transfer, Norwegian speakers may partially restructure their L2 English analysis. We discuss how indirect positive evidence may play a role in helping learners restructure.
... Learners pay little or no attention to direct negative evidence about what is ungrammatical in their language (Brown & Hanlon 1970; Marcus 1993 and references therein) and thus must construct a grammar hypothesis given only positive exemplars (i.e., direct positive evidence). There is a body of research that suggests that learners can use indirect negative evidence, i.e., the lack of a linguistic pattern in the child's linguistic environment from which the child can induce what is disallowed by the grammar being acquired (see Foraker et al. 2009; Regier & Gahl 2004). However, a number of concerns have been raised about the ability of a child to identify and utilize indirect negative evidence during language acquisition (e.g., Hornstein 2016; Pinker 1989; Yang 2017). ...
Article
Full-text available
In this article, we propose a reconceptualization of the principles and parameters (P&P) framework. We argue that in lieu of discrete parameter values, a parameter value exists on a gradient plane that encodes a learner's confidence that a particular parametric structure licenses the utterances in the learner's linguistic input. Crucially, this gradient parameter hypothesis obviates the need for default parameter values. Default parameter values can be put to use effectively from the perspective of linguistic learnability but are lacking in terms of empirical and theoretical consistency. We present findings from a computational implementation of a gradient P&P learner. The findings suggest that the gradient parameter hypothesis provides the basis for a viable alternative to existing computational models of language acquisition in the classic P&P paradigm. We close with a brief discussion of how a gradient parameter space offers a path to address shortcomings that have been attributed to the P&P framework.
... Some theorists (e.g., Pinker, 1989) have asserted that such contrastive learning is impossible without the aid of innate language knowledge. Research shows, however, that probabilistic models are capable of dealing successfully with this difficulty (e.g., Regier & Gahl, 2004; Szagun, 2011). As a natural case in point, Slobin (1978) reported a study of his daughter Heida's struggle to master regular and irregular forms of the English past tense. ...
Article
Full-text available
Native fluent speakers do not appear to have conscious knowledge of the linguistic categories and declarative rules that structural linguists use to describe grammar and that most psycholinguists have adopted for explaining language functioning. The implication derived in this paper is that these categories and rules are devoid of psychological reality. It is proposed that a psychologically real morphosyntax is concerned with sentence surface. The pragmatic framework and the semantic relational matrix at the onset of sentence production are converted directly into syntagmatic patterns, flexibly distributed along the sentence line, according to the distributional constraints of the language. These constraints are reflected in probabilistic associations between words and sequences of words. Natural morphosyntax is learned incidentally through implicit procedural learning. Children extract frequent syntagmatic patterns from adapted adult input. The resulting knowledge is stored in procedural memory. The cortico-striatal-cerebellar system of the brain has the computational power necessary to deal with sentence sequential patterning and associative regularities. Keywords: morphosyntax, language learnability, syntagmatic processing, probabilistic associations, pattern extraction, implicit procedural learning and memory.
... Thus, indirect negative evidence can be inferred from positive evidence. Children can differentiate between an accidental non-occurrence and a statistically significant non-occurrence, and deduce that the latter is ungrammatical (Regier and Gahl, 2004; Stefanowitsch, 2008). ...
Article
A relevant issue in the philosophy of science is the demarcation problem: how to distinguish science from nonscience, and, more specifically, science from pseudoscience. Sometimes, the demarcation problem is debated from a very general perspective, proposing demarcation criteria to separate science from pseudoscience, but without discussing any specific field in detail. This article aims to focus the demarcation problem on particular disciplines or theories. After considering a set of demarcation criteria, four pseudosciences are examined: psychoanalysis, speculative evolutionary psychology, universal grammar, and string theory. It is concluded that these theoretical frameworks do not meet the requirements to be considered genuinely scientific.
... As we will see shortly, the role of missing evidence is crucial for the usage-based perspective (cf. Regier & Gahl, 2004). ...
... Nonetheless, it is clear that inference is not all there is to learning. For example, Bayesian inference necessarily predicts that, unless one can explain away the repetitions, repeatedly encountering a form being used in a certain way would lead one to infer that the form cannot be used in any other way (Regier and Gahl 2004; Xu and Tenenbaum 2007). Harmon and Kapatsinski's (forthcoming) data discussed above show that high frequency of a form-meaning mapping does lead to entrenchment in comprehension but it leads the speaker to extend the form to new uses in production. ...
Chapter
Full-text available
The great variability of morphological structure across languages makes it uncontroversial that morphology is learned. Yet, morphology presents formidable learning challenges, on par with those of syntax. This article takes a constructionist perspective in assuming that morphological constructions are a major outcome of the learning process. However, the existence of morphological paradigms in many languages suggests that they are often not the only outcome. The article reviews domain-general approaches to achieving this outcome. The primary focus is on mechanisms proposed within the associative/connectionist tradition, which are compared with Bayesian approaches. The issues discussed include the role of prediction and prediction error in learning, generative vs. discriminative learning models, directionality of associations, the roles of (unexpectedly) present vs. absent stimuli, general-to-specific vs. specific-to-general learning, and the roles of type and token frequency. In the process, the notion of a construction itself is shown to be more complicated than it first appears.
... See Regier and Gahl (2004) for the general argument of learning from the absence of expected data. ...
Article
Full-text available
The extent to which gaps continue to be synchronically motivated by grammar competition after first appearing in a language remains an open question. In this paper I investigate the structure of genitive plural paradigmatic gaps in Modern Greek nouns. These gaps are interesting because they seem at first glance to be synchronically motivated by competing morphological stress patterns, based on the distribution of defective lexemes in the lexicon. However, as we will see, the results of a production and rating experiment indicate that despite the availability of this synchronic motivation, speakers treat genitive plural gaps as examples of lexicalized defectiveness, divorced from issues related to stress. Ultimately, the point will be that the distribution of gaps in Modern Greek is misleading regarding their synchronic structure.
... A learner who uses the size principle to make inferences about the possible antecedents of PRO is predicted to favor a stricter restriction on the antecedent (i.e. a smaller hypothesis) over one which allows a wider range of interpretations (a larger hypothesis), provided that the observed data is compatible with both hypotheses (Pearl & Lidz, 2009; Regier & Gahl, 2004). Of the non-adultlike grammars that have been proposed for adjunct control, some allow a free interpretation of PRO (Nominalization and high attachment), some allow an internal, but not an external interpretation of PRO (optional low attachment), and some allow only an object control interpretation (obligatory low attachment). ...
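A schematic sketch of this size-principle preference over the nested grammars just listed, with made-up hypothesis sizes and a uniform prior (my own toy numbers, not from the dissertation):

```python
# Schematic size-principle illustration over nested hypotheses about PRO's
# antecedent: obligatory object control (1 licensed interpretation) is a
# subset of internal-only readings (2), which is a subset of a free
# interpretation (3). Data compatible with all three increasingly favor
# the smallest hypothesis.

sizes = {"object control": 1, "internal only": 2, "free interpretation": 3}

def normalized_posterior(n_observations: int):
    # Uniform prior; each observation is an object-control interpretation,
    # licensed by every hypothesis, with likelihood 1/size per observation.
    scores = {h: (1.0 / s) ** n_observations for h, s in sizes.items()}
    z = sum(scores.values())
    return {h: round(p / z, 3) for h, p in scores.items()}

print(normalized_posterior(1))    # {'object control': 0.545, ...}
print(normalized_posterior(10))
# After 10 compatible observations, the strictest hypothesis takes nearly
# all of the posterior mass, because the larger hypotheses keep "leaking"
# probability onto interpretations that never occur in the input.
```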
Thesis
Full-text available
This dissertation uses children’s acquisition of adjunct control as a case study to investigate grammatical and performance accounts of language acquisition. In previous research, children have consistently exhibited non-adultlike behavior for sentences with adjunct control. To explain children’s behavior, several different grammatical accounts have been proposed, but evidence for these accounts has been inconclusive. In this dissertation, I take two approaches to account for children’s errors. First, I spell out the predictions of previous grammatical accounts, and test these predictions after accounting for some methodological concerns that might have influenced children’s behavior in previous studies. While I reproduce the non-adultlike behavior observed in previous studies, the predictions of previous grammatical accounts are not borne out, suggesting that extragrammatical factors are needed to explain children’s behavior. Next, I consider the role of two different types of extragrammatical factors in predicting children’s non-adultlike behavior. With a new task designed to address the task demands in previous studies, children exhibit significantly higher accuracy than with previous tasks. This suggests that children’s behavior has been influenced by task- specific processing factors. In addition to the task, I also test the predictions of a similarity-based interference account, which links children’s errors to the same memory mechanisms involved in sentence processing difficulties observed in adults. These predictions are borne out, supporting a more continuous developmental trajectory as children’s processing mechanisms become more resistant to interference. Finally, I consider how children’s errors might influence their acquisition of adjunct control, given the distribution in the linguistic input. I discuss the results of a corpus analysis, including the possibility that adjunct control could be learned from the input. The kinds of information that could be useful to a learner become much more limited, however, after considering the processing limitations that would interfere with the representations available to the learner.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker, 1978; Carnie, 2012; Cowper, 1992; Radford, 1981) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker, 1978; Hornstein & Lightfoot, 1981; Radford, 1988; Lidz, Waxman, & Freedman, 2003; cf. Akhtar, Callanan, Pullum, & Scholz, 2004; Foraker, Regier, Khetarpal, Perfors, & Tenenbaum, 2009; Regier & Gahl, 2004; Tomasello, 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): ...
Article
One anaphora (e.g., this is a good one) has been used as a key diagnostic in syntactic analyses of the English noun phrase, and “one-replacement” has also figured prominently in debates about the learnability of language. However, much of this work has been based on faulty premises, as a few perceptive researchers, including Ray Jackendoff, have made clear. Abandoning the view of anaphoric one (a-one) as a form of syntactic replacement allows us to take a fresh look at various uses of the word one. In the present work, we investigate its use as a cardinal number (1-one) in order to better understand its anaphoric use. Like all cardinal numbers, 1-one can only quantify an individuated entity and provides an indefinite reading by default. Owing to unique combinatoric properties, cardinal numbers defy consistent classification as determiners, quantifiers, adjectives, or nouns. Once the semantics and distribution of cardinal numbers, including 1-one, are appreciated, many properties of a-one follow with minimal stipulation. We claim that 1-one and a-one are distinct but very closely related lexemes. When 1-one appears without a noun (e.g., Take one), it is nearly indistinguishable from a-one (e.g., take one)—the only differences being interpretive (1-one foregrounds its cardinality while a-one does not) and prosodic (presence versus absence of primary accent). While we ultimately argue that a family of constructions is required to describe the full range of syntactic contexts in which one appears, the proposed network accounts for properties of a-one by allowing it to inherit most of its syntactic and interpretive constraints from its historical predecessor, 1-one.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker 1978; Radford 1981; Cowper 1992; Carnie 2012) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker 1978; Hornstein and Lightfoot 1981; Radford 1988; Lidz et al. 2003; cf. Akhtar et al. 2004; Regier & Gahl 2004; Foraker et al. 2009; Tomasello 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): 1.a. ...
Chapter
Full-text available
This paper provides a compositional, lexically based analysis of the infinitival, verb-headed idiom exemplified by the sentences What does this have to do with me? and It may have had something to do with money. Using conventions of Sign-Based Construction Grammar (SBCG, Sag 2012, Kay and Sag 2012, Michaelis 2012), we show that this multiword expression is revealingly represented as an intransitive verb word do, whose subject cannot be locally instantiated, which is necessarily in base form, and which invokes or is invoked by other idiomatically construed lexemes, including a special subject-raising lexeme have, which contributes a (potentially null instantiated) degree argument. We argue that idiomatic do, despite its restricted combinatorial potential, is compositionally interpreted, denoting an association between two entities, the first of which is expressed by the non-locally-instantiated subject and the second of which is expressed by the with-headed PP. We draw several lessons from this study. First, as is perhaps self-evident, multi-word expressions that are composed mostly of idiomatic words, such as have, to, and do in this idiom, may also require the presence of non-idiomatic words. An example is the presence in the to-do- ...
... His investigations were limited to a small number of rare constructions, and typically focused on nouns, but are nonetheless encouraging in their success in the semantic domain. Regier and Gahl [2004] contributed to the lexicon learning problem in the form of a mathematical proof-of-concept providing counter-evidence against the poverty of stimulus argument. Using Bayes' rule and the "size principle," only a small number of positive examples were necessary for handling a set of sentences using the anaphoric one, because the learner must favour the hypothesis that is most likely to generate the positive examples, and not the incorrect ones (it knows they are incorrect because they are unseen). ...
Thesis
State-of-the-art parsers suffer from incomplete lexicons, as evidenced by the fact that they all contain built-in methods for dealing with out-of-lexicon items at parse time. Since new labelled data is expensive to produce and no amount of it will conquer the long tail, we attempt to address this problem by leveraging the enormous amount of raw text available for free, and expanding the lexicon offline, with a semi-supervised word learner. We accomplish this with a method similar to self-training, where a fully trained parser is used to generate new parses with which the next generation of parser is trained. This thesis introduces Chart Inference (CI), a two-phase word-learning method with Combinatory Categorial Grammar (CCG), operating on the level of the partial parse as produced by a trained parser. CI uses the parsing model and lexicon to identify the CCG category type for one unknown word in a context of known words by inferring the type of the sentence using a model of end punctuation, then traversing the chart from the top down, filling in each empty cell as a function of its mother and its sister. We first specify the CI algorithm, and then compare it to two baseline word-learning systems over a battery of learning tasks. CI is shown to outperform the baselines in every task, and to function in a number of applications, including grammar acquisition and domain adaptation. This method performs consistently better than self-training, and improves upon the standard POS-backoff strategy employed by the baseline StatCCG parser by adding new entries to the lexicon. The first learning task establishes lexical convergence over a toy corpus, showing that CI’s ability to accurately model a target lexicon is more robust to initial conditions than either of the baseline methods. We then introduce a novel natural language corpus based on children’s educational materials, which is fully annotated with CCG derivations. We use this corpus as a testbed to establish that CI is capable in principle of recovering the whole range of category types necessary for a wide-coverage lexicon. The complexity of the learning task is then increased, using the CCGbank corpus, a version of the Penn Treebank, and showing that CI improves as its initial seed corpus is increased. The next experiment uses CCGbank as the seed and attempts to recover missing question-type categories in the TREC question answering corpus. The final task extends the coverage of the CCGbank-trained parser by running CI over the raw text of the Gigaword corpus. Where appropriate, a fine-grained error analysis is also undertaken to supplement the quantitative evaluation of the parser performance with deeper reasoning as to the linguistic points of the lexicon and parsing model.
... This paper concerns a little word with a fraught history in syntactic theory—a word that has been used for decades to justify specific assumptions about the hierarchical structure of noun phrases (see, e.g., Baker 1978; Radford 1981; Cowper 1992; Carnie 2012) and wielded as a weapon in the debate concerning learnability of phrase-structure categories (Baker 1978; Hornstein and Lightfoot 1981; Radford 1988; Lidz et al. 2003; cf. Akhtar et al. 2004; Regier & Gahl 2004; Foraker et al. 2009; Tomasello 2004). The word is one, and it is illustrated in its anaphoric use in (1a-b): ...
Research
Full-text available
One anaphora (e.g., this is a good one) has been used as a key diagnostic in syntactic analyses of the English noun phrase, and ‘one-replacement’ has also figured prominently in debates about the learnability of language. However, much of this work has been based on faulty premises, as a few perceptive researchers, including Ray Jackendoff, have made clear. Abandoning the view of anaphoric one (A-ONE) as a form of syntactic replacement allows us to take a fresh look at various uses of the word one. In the present work, we investigate its use as a cardinal number (1-ONE) in order to better understand its anaphoric use. Like all cardinal numbers, 1-ONE can only quantify an individuated entity and provides an indefinite reading by default. Owing to unique combinatoric properties, cardinal numbers defy consistent classification as determiners, quantifiers, adjectives or nouns. Once the semantics and distribution of cardinal numbers including 1-ONE are appreciated, many properties of A-ONE follow with minimal stipulation. We claim that 1-ONE and A-ONE are distinct but very closely related lexemes. When 1-ONE appears without a noun (e.g., Take ONE), it is nearly indistinguishable from A-ONE (e.g., TAKE one)—the only differences being interpretive (1-ONE foregrounds its cardinality while A-ONE does not) and prosodic (presence versus absence of primary accent). While we ultimately argue that a family of constructions is required to describe the full range of syntactic contexts in which one appears, the proposed network accounts for properties of A-ONE by allowing it to share (inherit) most of its syntactic and interpretive constraints from its historical predecessor, 1-ONE.
... The question then is how they could have purged the impossible interpretation. One possibility is that children utilize a Bayesian learning mechanism (Regier & Gahl 2004, Tenenbaum & Griffiths 2001) and take both the presence of the WSC and absence of the NSC into consideration to determine the correct interpretive option of Neg…he. This first requires that ample data points regarding Neg…he exist in children's input, which is presumably true, considering our preliminary adult corpus search results. ...
... . "Ideal" learning, Bayesian learning and computational resources [1 lecture] Regier and Gahl (2004) in response to Lidz et al. (2003) present a Bayesian learning model that learns the correct structural referent for anaphoric "one" in English from a corpus of child-directed speech. Similarly to Reali & Christiansen (op. ...
Conference Paper
Full-text available
Computational modeling of human language processes is a small but growing subfield of computational linguistics. This paper describes a course that makes use of recent research in psychocomputational modeling as a framework to introduce a number of mainstream computational linguistics concepts to an audience of linguistics, cognitive science and computer science doctoral students. The emphasis on what I take to be the largely interdisciplinary nature of computational linguistics is particularly germane for the computer science students. Since 2002 the course has been taught three times under the auspices of the MA/PhD program in Linguistics at The City University of New York's Graduate Center. A brief description of some of the students' experiences after having taken the course is also provided.
... The inference engine is based on the idea that every possible grammar defined by UG makes predictions about what the learner should expect to find in the environment (Gleitman 1990, Lightfoot 1991, Fodor 1998, Tenenbaum & Griffiths 2001, Yang 2002, Regier & Gahl 2003, Pearl & Lidz 2009). The acquisitional intake compares those predicted features against the perceptual intake representation, driving an inference about the grammatical features responsible for the sentence under consideration. ...
Article
Full-text available
Evidence of children’s sensitivity to statistical features of their input in language acquisition is often used to argue against learning mechanisms driven by innate knowledge. At the same time, evidence of children acquiring knowledge that is richer than the input supports arguments in favor of such mechanisms. This tension can be resolved by separating the inferential and deductive components of the language learning mechanism. Universal Grammar provides representations that support deductions about sentences that fall outside of experience. In addition, these representations define the evidence that learners use to infer a particular grammar. The input is compared with the expected evidence to drive statistical inference. In support of this model, we review evidence of (a) children’s sensitivity to the environment, (b) mismatches between input and intake, (c) the need for learning mechanisms beyond innate representations, and (d) the deductive consequences of children’s acquired syntactic representations.
... On the cognitive side, context has been studied in everyday learning. Recent studies [9,10,11,12,13,14,15,16,17,18] include knowledge transfer, mental reasoning about causal relations, probabilistic reasoning by children, language processing, and attribution. ...
... Much of this progress is a product of the current revolution in cognitive science concerning the rise of "probabilistic modeling" accounts of reasoning and learning (Chater, Tenenbaum & Yuille, 2006; Pearl, 2001; Glymour, 2003). Many of the ideas about probability that underpin these models were first formulated by the philosopher and mathematician, Reverend Thomas Bayes, in the 18th century, and are now being successfully applied to a very broad set of problems in developmental psychology, including induction and inference in learning (e.g., Glymour, 2003; Tenenbaum, Griffiths, & Kemp, 2006), language acquisition in infancy (e.g., Chater & Manning, 2006; Tenenbaum & Xu, 2000; Xu & Tenenbaum, 2007; Niyogi, 2002; Dowman, 2002; Regier & Gahl, 2004), and the development of social cognition (e.g., Goodman, Baker, Bonawitz, Mansinghka, Gopnik, Wellman, Schulz, & Tenenbaum, 2006; Baker, Saxe, & Tenenbaum, 2006), among others. The application of probabilistic models has successfully described and predicted patterns of behavioral data across a variety of experimental paradigms. ...
Article
Full-text available
This review describes the relation between the imagination and causal cognition, particularly with relevance to recent developments in computational theories of human learning. According to the Bayesian model of human learning, our ability to imagine possible worlds and engage in counterfactual reasoning is closely tied to our ability to think causally. Indeed, the purpose and distinguishing feature of causal knowledge is that it allows one to generate counterfactual inferences. We begin with a brief description of the "probabilistic models" framework of causality, and review empirical work in that framework which shows that adults and children use causal knowledge to generate counterfactuals. We also outline a theoretical argument that suggests that the imagination is central to the process of causal understanding. We will then offer evidence that Bayesian learning implicates the imaginative process, and conclude with a discussion of how this computational method may be applied to the study of the imagination, more classically construed.
... - Language modeling (Regier & Gahl 2004, Tenenbaum & Griffiths 2001). • Our starting assumption: Speakers are sensitive to the probability of a given combination of lexeme + inflectional property set (i.e., usage probability calculated over content paradigm cell). This includes sensitivity to the absence of an expected structure. ...
... There is not space here to go into the details, but see Sokolov and Snow (1994) for an overview of arguments and evidence from the child learning literature. See Regier and Gahl (2004) for arguments from a computational perspective. There are at least two studies on paradigmatic gaps that assume gaps are learned primarily from implicit negative evidence (Johansson 1999, Orgun and Sprouse 1999). ...
... A simple reason for these results is that English grammar is not the optimal description of the input, which contains about 27% exceptional data that does not fit English grammar. However, the performance of probabilistic learning mechanisms improves if they are equipped with a way to evaluate and discard the most ambiguous pieces of data (Pearl, 2007; Pearl & Lidz, 2009; Regier & Gahl, 2004). The evaluation of the ambiguity of a word stress pattern relies on language-specific knowledge. ...
Article
A word often expresses many different morphological functions. Which part of a word contributes to which part of the overall meaning is not always clear, which raises the question as to how such functions are learned. While linguistic studies tacitly assume the co-occurrence of cues and outcomes to suffice in learning these functions (Baer-Henney, Kügler, & van de Vijver, 2015; Baer-Henney & van de Vijver, 2012), error-driven learning suggests that contingency rather than contiguity is crucial (Nixon, 2020; Ramscar, Yarlett, Dye, Denny, & Thorpe, 2010). In error-driven learning, cues gain association strength if they predict a certain outcome, and they lose strength if the outcome is absent. This reduction of association strength is called unlearning. So far, it is unclear if such unlearning has consequences for cue-outcome associations beyond the ones that get reduced. To test for such consequences of unlearning, we taught participants morphophonological patterns in an artificial language learning experiment. In one block, the cues to two morphological outcomes (plural and diminutive) co-occurred within the same word forms. In another block, a single cue to only one of these two outcomes was presented in a different set of word forms. We wanted to find out if participants unlearn this cue's association with the outcome that is not predicted by the cue alone, and if this allows the absent cue to be associated with the absent outcome. Our results show that if unlearning was possible, participants learned that the absent cue predicts the absent outcome better than if no unlearning was possible. This effect was stronger if the unlearned cue was more salient. This shows that unlearning takes place even if no alternative cues to an absent outcome are provided, which highlights that learners take both positive and negative evidence into account, as predicted by domain-general error-driven learning.
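As a deliberately simplified illustration of the error-driven mechanism this abstract describes, the sketch below uses the classic Rescorla-Wagner update rule as a stand-in; the block structure and parameter values are assumptions for illustration, not the authors' design.

```python
# A minimal Rescorla-Wagner sketch: a cue loses association strength
# ("unlearning") on trials where it is present but the outcome is absent.

def rw_update(v, cues, outcome_present, alpha=0.1, lam=1.0):
    """One Rescorla-Wagner trial: v maps cue -> associative strength."""
    total = sum(v[c] for c in cues)            # summed prediction
    error = (lam if outcome_present else 0.0) - total
    for c in cues:
        v[c] += alpha * error                  # shared error term
    return v

v = {"cueA": 0.0, "cueB": 0.0}
# Block 1: cueA and cueB together predict the outcome.
for _ in range(50):
    rw_update(v, ["cueA", "cueB"], outcome_present=True)
# Block 2: cueA alone, outcome absent -> cueA is unlearned.
for _ in range(50):
    rw_update(v, ["cueA"], outcome_present=False)
print(v)  # cueA is driven back toward 0; cueB keeps its strength
```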
Article
Many domains of inquiry in psychology are concerned with rich and complex phenomena. At the same time, the field of psychology is grappling with how to improve research practices to address concerns with the scientific enterprise. In this Perspective, we argue that both of these challenges can be addressed by adopting a principle of methodological variety. According to this principle, developing a variety of methodological tools should be regarded as a scientific goal in itself, one that is critical for advancing scientific theory. To illustrate, we show how the study of language and communication requires varied methodologies, and that theory development proceeds, in part, by integrating disparate tools and designs. We argue that the importance of methodological variation and innovation runs deep, travelling alongside theory development to the core of the scientific enterprise. Finally, we highlight ongoing research agendas that might help to specify, quantify and model methodological variety and its implications. Philosophers of science have identified epistemological criteria for evaluating the promise of a scientific theory. In this Perspective, Dale et al. propose that a principle of methodological variety should be one of these criteria, and argue that psychologists should actively cultivate methodological variety to advance theory.
Article
Morphological structures interact dynamically with lexical processing and storage, with the parameters of morphological typology being partly dependent on cognitive pathways for processing, storage and generalization of word structure, and vice versa. Bringing together a team of well-known scholars, this book examines the relationship between linguistic cognition and the morphological diversity found in the world's languages. It includes research from across linguistic and cognitive science sub-disciplines that looks at the nature of typological diversity and its relationship to cognition, touching on concepts such as complexity, interconnectedness within systems, and emergent organization. Chapters employ experimental, computational, corpus-based and theoretical methods to examine specific morphological phenomena, and an overview chapter provides a synthesis of major research trends, contextualizing work from different methodological and philosophical perspectives. Offering a novel perspective on how cognition contributes to our understanding of word structure, it is essential reading for psycholinguists, theoreticians, typologists, computational modelers and cognitive scientists.
Article
Despite the recent popularity of contextual word embeddings, static word embeddings still dominate lexical semantic tasks, making their study of continued relevance. A widely adopted family of such static word embeddings is derived by explicitly factorizing the Pointwise Mutual Information (PMI) weighting of the cooccurrence matrix. As unobserved cooccurrences lead PMI to negative infinity, a common workaround is to clip negative PMI at 0. However, it is unclear what information is lost by collapsing negative PMI values to 0. To answer this question, we isolate and study the effects of negative (and positive) PMI on the semantics and geometry of models adopting factorization of different PMI matrices. Word and sentence-level evaluations show that only accounting for positive PMI in the factorization strongly captures both semantics and syntax, whereas using only negative PMI captures little of semantics but a surprising amount of syntactic information. Results also reveal that incorporating negative PMI induces stronger rank invariance of vector norms and direction, as well as improved rare word representations.
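For readers unfamiliar with the pipeline discussed here, the following toy sketch builds a PMI matrix from a small invented cooccurrence table, applies the common clip-at-zero workaround (PPMI), and factorizes it with SVD.

```python
# Toy PMI -> PPMI -> SVD pipeline; counts are invented for illustration.
import numpy as np

counts = np.array([[10., 2., 0.],
                   [2., 8., 1.],
                   [0., 1., 6.]])          # word x context cooccurrences
total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)
p_c = p_wc.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))        # -inf where count == 0
ppmi = np.maximum(pmi, 0.0)                 # the common clipping workaround
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :2] * S[:2]               # 2-d static word vectors
print(embeddings)
```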
Article
Children's linguistic knowledge and the learning mechanisms by which they acquire it grow substantially in infancy and toddlerhood, yet theories of word learning largely fail to incorporate these shifts. Moreover, researchers' often-siloed focus on either familiar word recognition or novel word learning limits the critical consideration of how these two relate. As a step toward a mechanistic theory of language acquisition, we present a framework of "learning through processing" and relate it to the prevailing methods used to assess children's early knowledge of words. Incorporating recent empirical work, we posit a specific, testable timeline of qualitative changes in the learning process in this interval. We conclude with several challenges and avenues for building a comprehensive theory of early word learning: better characterization of the input, reconciling results across approaches, and treating lexical knowledge in the nascent grammar with sufficient sophistication to ensure generalizability across languages and development.
Article
Full-text available
Poverty of the stimulus has been at the heart of ferocious and tear-filled debates at the nexus of psychology, linguistics, and philosophy for decades. This review is intended as a guide for readers without a formal linguistics or philosophy background, focusing on what poverty of the stimulus is and how it’s been interpreted, which is traditionally where the tears have come in. I discuss poverty of the stimulus from the perspective of language development, highlighting how poverty of the stimulus relates to expectations about learning and the data available to learn from. I describe common interpretations of what poverty of the stimulus means when it occurs, and approaches for determining when poverty of the stimulus is in fact occurring. I close with illustrative examples of poverty of the stimulus in the domains of syntax, lexical semantics, and phonology, and discuss the value of identifying instances of poverty of the stimulus when it comes to understanding language development.
Chapter
Full-text available
Subject control in non-finite adjuncts is observed across languages (as in ‘John called Mary after drawing a picture’). Research on the acquisition of adjunct control has generally focused on the relevant grammatical components and when they are acquired. This paper considers these components in the context of the linguistic input to ask how control in adjuncts is acquired. Although adjunct control is available in the input, the instances themselves do not provide evidence for abstract syntactic relations. Implications are considered for linguistic dependencies and the evidence in the input.
Article
Non-adjacent dependencies are ubiquitous in language, but difficult to learn in artificial language experiments in the lab. Previous research suggests that non-adjacent dependencies are more learnable given structural support in the input – for instance, in the presence of high variability between dependent items. However, not all non-adjacent dependencies occur in supportive contexts. How are such regularities learned? One possibility is that learning one set of non-adjacent dependencies can highlight similar structures in subsequent input, facilitating the acquisition of new non-adjacent dependencies that are otherwise difficult to learn. In three experiments, we show that prior exposure to learnable non-adjacent dependencies - i.e., dependencies presented in a learning context that has been shown to facilitate discovery - improves learning of novel non-adjacent regularities that are typically not detected. These findings demonstrate how the discovery of complex linguistic structures can build on past learning in supportive contexts.
Article
Full-text available
A central question in language acquisition is how children master sentence types that they have seldom, if ever, heard. Here we report the findings of a pre-registered, randomised, single-blind intervention study designed to test the prediction that, for one such sentence type, complex questions (e.g., Is the crocodile who’s hot eating? ), children could combine schemas learned, on the basis of the input, for complex noun phrases ( the [THING] who’s [PROPERTY] ) and simple questions ( Is [THING] [ACTION]ing? ) to yield a complex-question schema ( Is [the [THING] who’s [PROPERTY]] ACTIONing? ). Children aged 4;2 to 6;8 ( M = 5;6, SD = 7.7 months) were trained on simple questions (e.g., Is the bird cleaning? ) and either (Experimental group, N = 61) complex noun phrases (e.g., the bird who’s sad ) or (Control group, N = 61) matched simple noun phrases (e.g., the sad bird ). In general, the two groups did not differ on their ability to produce novel complex questions at test. However, the Experimental group did show (a) some evidence of generalising a particular complex NP schema ( the [THING] who’s [PROPERTY] as opposed to the [THING] that’s [PROPERTY] ) from training to test, (b) a lower rate of auxiliary-doubling errors (e.g., *Is the crocodile who’s hot is eating? ), and (c) a greater ability to produce complex questions on the first test trial. We end by suggesting some different methods – specifically artificial language learning and syntactic priming – that could potentially be used to better test the present account.
Chapter
The question of complexity, as in what makes one language more 'complex' than another, is a long-established topic of debate amongst linguists. Recently, this issue has been complemented with the view that languages are complex adaptive systems, in which emergence and self-organization play major roles. However, few students of the phenomenon have gone beyond the basic assessment of the number of units and rules in a language (what has been characterized as 'bit complexity') or shown some familiarity with the science of complexity. This book reveals how much can be learned by overcoming these limitations, especially by adopting developmental and evolutionary perspectives. The contributors include specialists of language acquisition, evolution and ecology, grammaticization, phonology, and modeling, all of whom approach languages as dynamical, emergent, and adaptive complex systems.
Chapter
Full-text available
Throughout most of the 20th century, analytical and reductionist approaches have dominated in biological, social, and humanistic sciences, including linguistics and communication. We generally believed we could account for fundamental phenomena in invoking basic elemental units. Although the amount of knowledge generated was certainly impressive, we have also seen limitations of this approach. Discovering the sound formants of human languages, for example, has allowed us to know vital aspects of the ‘material’ plane of verbal codes, but it tells us little about significant aspects of their social functions. I firmly believe, therefore, that alongside a linguistics that looks ‘inward’ there should also be a linguistics that looks ‘outward’, or one even that is constructed ‘from the outside’, a linguistics that I refer to elsewhere as ‘holistic’ though it could be identified by a different name. My current vision is to promote simultaneously the perspective that goes from the part to the whole and that which goes from the whole to the parts, i.e., both from the top down and from the bottom up. This goal is shared with other disciplines which recognize that many phenomena related to life are interwoven, self-organising, emergent and processual. Thus, we need to re-examine how we have conceived of reality, both the way we have looked at it and the images we have used to talk about it. Several approaches now grouped under the label of complexity have been elaborated towards this objective of finding new concepts and ways of thinking that better fit the complex organisation of facts and events. https://books.google.es/books?id=b1pEDgAAQBAJ&pg=PA218&lpg=PA218&dq=complexity+and+language+a+sociocognitive&source=bl&ots=5GyrP3_8Gs&sig=wKZxQLPJ01MHGubSSl63J82Y3dU&hl=ca&sa=X&ved=2ahUKEwjCwaW12dPbAhUGwxQKHQ1uDJ0Q6AEwB3oECAcQAQ#v=onepage&q=complexity%20and%20language%20a%20sociocognitive&f=false
Article
Full-text available
Fluent speakers do not appear to have conscious knowledge of the linguistic categories and declarative rules that linguists use to describe grammar and that most psycholinguists have adopted for explaining language functioning. The implication derived in this paper is that these categories and rules are deprived of psychological reality. It is proposed that a psychologically real morphosyntax is concerned with sentence surface. The pragmatic framework and the semantic relational matrix at the onset of sentence production are converted directly into syntagmatic patterns, flexibly distributed along the sentence line. These patterns are reflected in probabilistic associations between words and sequences of words. Natural morphosyntax is learned incidentally through implicit procedural learning. Children extract frequent syntagmatic patterns from adapted adult input. The resulting knowledge is stored in procedural memory. The cortico-striatal-cerebellar system of the brain has the computational power necessary to deal with sentence sequential patterning and associative regularities.
Chapter
Full-text available
Ellipsis constructions present many challenges to incremental sentence processing. One challenge is that most partial sentences that are compatible with ellipsis continuations are also compatible with non-ellipsis continuations. Example (1) is a case in point. This partial sentence is compatible with ellipsis of the material following the wh-phrase in an embedded interrogative, as in (1a) (a construction known as sluicing in the syntax literature), and with various non-ellipsis continuations such as those in (1b) and (1c).

(1) John was writing something, but I don't know what …
  a. [ellipsis]
  b. … he was writing.
  c. … motivates him to write so much.

Furthermore, there does not seem to be an obvious cue that can tell the parser whether ellipsis follows or not. In other words, environments where ellipsis is typically found show structural ambiguity. Therefore, there is always a danger that inducing ellipsis may turn out to be an incorrect analysis, and it may require structural reanalysis. Such reanalysis is costly and is avoided by the parser whenever possible (Schneider and Phillips 2001; Sturt et al. 2001). This in turn suggests that it is always safer for the parser to choose a non-ellipsis structure, since it can rely on bottom-up information in non-ellipsis continuations. If this is the case, the parser should choose ellipsis if and only if bottom-up information confirms that ellipsis is there. That is, the parser should not induce or infer ellipsis incrementally.
Article
Full-text available
Beyond the thesis that the grammar of natural languages includes a transformational level, what distinguishes the Chomskyan research program in the Theory of Grammar from other approaches is the thesis that the grammatical knowledge internalized by every human being is partially innate (i.e., partially given a priori by a system of task-specific cognitive biases, Universal Grammar), rather than a by-product of self-organizing mechanisms of 'general intelligence'. This scientific thesis may in principle be right or wrong, and it can only be challenged by taking into account its empirical coverage and the logic of its arguments. At the heart of this issue is the Argument from Poverty of the Stimulus (APS), whose logic has been the target of numerous misunderstandings on the part of anti-nativists, as exemplified by Geurts (2000), who fails to recognize the distinctions between 'knowledge' and 'belief', and between 'conscious' and 'non-conscious' cognition, both of which are crucial for understanding the logic of the APS. The aim of this article is to undo this misunderstanding.
Article
Full-text available
A classic debate in cognitive science revolves around understanding how children learn complex linguistic patterns, such as restrictions on verb alternations and contractions, without negative evidence. Recently, probabilistic models of language learning have been applied to this problem, framing it as a statistical inference from a random sample of sentences. These probabilistic models predict that learners should be sensitive to the way in which sentences are sampled. There are two main types of sampling assumptions that can operate in language learning: strong and weak sampling. Strong sampling, as assumed by probabilistic models, assumes the learning input is drawn from a distribution of grammatical samples from the underlying language and aims to learn this distribution. Thus, under strong sampling, the absence of a sentence construction from the input provides evidence that it has low or zero probability of grammaticality. Weak sampling does not make assumptions about the distribution from which the input is drawn, and thus the absence of a construction from the input is not used as evidence of its ungrammaticality. We demonstrate in a series of artificial language learning experiments that adults can produce behavior consistent with both sets of sampling assumptions, depending on how the learning problem is presented. These results suggest that people use information about the way in which linguistic input is sampled to guide their learning.
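The contrast between the two sampling assumptions can be made concrete with a toy calculation; the hypotheses, sentences, and numbers below are hypothetical.

```python
# Strong vs. weak sampling: hypotheses are sets of grammatical
# sentences; data are n observed sentences.

small = {"s1", "s2"}                # a restrictive grammar
large = {"s1", "s2", "s3", "s4"}    # a permissive grammar
data = ["s1", "s2", "s1"]           # s3/s4 conspicuously absent

def likelihood(h, data, strong=True):
    if any(s not in h for s in data):
        return 0.0
    # Strong sampling: each example drawn uniformly from h -> size principle.
    # Weak sampling: examples generated independently of h -> flat likelihood.
    return (1.0 / len(h)) ** len(data) if strong else 1.0

for strong in (True, False):
    ls = likelihood(small, data, strong), likelihood(large, data, strong)
    print("strong" if strong else "weak", ls)
# Under strong sampling the small grammar wins (the absence of s3/s4
# counts as evidence); under weak sampling the grammars are tied.
```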
Article
The nominal anaphoric element one has figured prominently in discussions of linguistic nativism because of an important argument advanced by C. L. Baker (1978). His argument has been frequently cited within the cognitive and linguistic sciences, and has provided the topic for a chain of experimental and computational psycholinguistics papers. Baker's crucial grammaticality facts, though much repeated in the literature, have not been critically investigated. A corpus investigation shows that his claims are not true: one does not take only phrasal antecedents, but can also take nouns on their own, including semantically relational nouns, and can take various of-PP dependents of its own. We give a semantic analysis of anaphoric one that allows it to exhibit this kind of freedom, and we exhibit frequency evidence that goes a long way toward explaining why linguists have been inclined to regard phrases like the one of physics or three ones as ungrammatical when in fact (as corpus evidence shows) they are merely dispreferred relative to available grammatical alternatives. The main implication for the acquisition literature is that one of the most celebrated arguments from poverty of the stimulus is shown to be without force.
Article
Full-text available
Contends that too much emphasis has been placed on grammar or syntax and too little on the semantics of children's language. A full-scale analysis of children's speech based on the logical tradition of model-theoretic semantics is described. To illustrate this approach, examples of the child's use of the definite article, adjectives, quantifiers, and propositional attitudes are presented, and conceptual and technical tools for studying these aspects of speech are described. The problems of paraphrase, context, processes, and theory verification that arise in the semantical analysis of children's speech are considered.
Article
Full-text available
How do people know as much as they do with as little information as they get? The problem takes many forms; learning vocabulary from text is an especially dramatic and convenient case for research. A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena. By inducing global knowledge indirectly from local co-occurrence data in a large body of representative text, LSA acquired knowledge about the full vocabulary of English at a comparable rate to schoolchildren. LSA uses no prior linguistic or perceptual similarity knowledge; it is based solely on a general mathematical learning method that achieves powerful inductive effects by extracting the right number of dimensions (e.g., 300) to represent objects and contexts. Relations to other theories, phenomena, and problems are sketched.
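A compact sketch of the core LSA computation, truncated SVD of a term-document matrix followed by similarity in the reduced space, is given below; the matrix and the two retained dimensions are illustrative choices rather than the paper's setup.

```python
# Toy LSA: factorize a term-document count matrix and compare terms
# in the reduced space. Real LSA uses large corpora and ~300 dimensions.
import numpy as np

# rows = terms, columns = documents
X = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                  # retained dimensions
term_vecs = U[:, :k] * S[:k]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Terms 0 and 1 occur in the same documents; terms 0 and 2 never do.
print(cos(term_vecs[0], term_vecs[1]))  # high: shared distribution
print(cos(term_vecs[0], term_vecs[2]))  # low: disjoint distribution
```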
Article
Full-text available
It is a striking fact that in humans the greatest learning occurs precisely at that point in time--childhood--when the most dramatic maturational changes also occur. This report describes possible synergistic interactions between maturational change and the ability to learn a complex domain (language), as investigated in connectionist networks. The networks are trained to process complex sentences involving relative clauses, number agreement, and several types of verb argument structure. Training fails in the case of networks which are fully formed and 'adultlike' in their capacity. Training succeeds only when networks begin with limited working memory and gradually 'mature' to the adult state. This result suggests that rather than being a limitation, developmental restrictions on resources may constitute a necessary prerequisite for mastering certain complex domains. Specifically, successful learning may depend on starting small.
Article
Full-text available
In order to acquire a lexicon, young children must segment speech into words, even though most words are unfamiliar to them. This is a non-trivial task because speech lacks any acoustic analog of the blank spaces between printed words. Two sources of information that might be useful for this task are distributional regularity and phonotactic constraints. Informally, distributional regularity refers to the intuition that sound sequences that occur frequently and in a variety of contexts are better candidates for the lexicon than those that occur rarely or in few contexts. We express that intuition formally by a class of functions called DR functions. We then put forth three hypotheses: First, that children segment using DR functions. Second, that they exploit phonotactic constraints on the possible pronunciations of words in their language. Specifically, they exploit both the requirement that every word must have a vowel and the constraints that languages impose on word-initial and word-final consonant clusters. Third, that children learn which word-boundary clusters are permitted in their language by assuming that all permissible word-boundary clusters will eventually occur at utterance boundaries. Using computational simulation, we investigate the effectiveness of these strategies for segmenting broad phonetic transcripts of child-directed English. The results show that DR functions and phonotactic constraints can be used to significantly improve segmentation. Further, the contributions of DR functions and phonotactic constraints are largely independent, so using both yields better segmentation than using either one alone. Finally, learning the permissible word-boundary clusters from utterance boundaries does not degrade segmentation performance.
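The paper's third hypothesis, that permissible word-boundary clusters can be learned from utterance boundaries, lends itself to a short sketch; orthography stands in for phonetic transcripts here, and the utterances are toy examples.

```python
# Assume every consonant cluster permitted at word edges will
# eventually show up at utterance edges, and collect those.
VOWELS = set("aeiou")

def edge_clusters(utterances):
    initial, final = set(), set()
    for u in utterances:
        w = u.replace(" ", "")          # treat the utterance as one string
        i = 0
        while i < len(w) and w[i] not in VOWELS:
            i += 1
        initial.add(w[:i])              # cluster before the first vowel
        j = len(w)
        while j > 0 and w[j - 1] not in VOWELS:
            j -= 1
        final.add(w[j:])                # cluster after the last vowel
    return initial, final

initial, final = edge_clusters(["stop that", "drink milk", "and sit"])
print(initial)  # {'st', 'dr', ''} -> permissible word-initial clusters
print(final)    # {'t', 'lk'}      -> permissible word-final clusters
```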
Article
Full-text available
What kinds of knowledge underlie the use of language and how is this knowledge acquired? Linguists equate knowing a language with knowing a grammar. Classic "poverty of the stimulus" arguments suggest that grammar identification is an intractable inductive problem and that acquisition is possible only because children possess innate knowledge of grammatical structure. An alternative view is emerging from studies of statistical and probabilistic aspects of language, connectionist models, and the learning capacities of infants. This approach emphasizes continuity between how language is acquired and how it is used. It retains the idea that innate capacities constrain language learning, but calls into question whether they include knowledge of grammatical structure.
Article
Full-text available
We apply a computational theory of concept learning based on Bayesian inference (Tenenbaum, 1999) to the problem of learning words from examples. The theory provides a framework for understanding how people can generalize meaningfully from just one or a few positive examples of a novel word, without assuming that words are mutually exclusive or map only onto basic-level categories.
Article
1. My thanks go to Joe Grimes for comments made on a previous draft of this review, to Ron Langacker for some related discussion, and to Kenneth Holmqvist for a complimentary copy of his work. 2. Strictly speaking, Holmqvist did not implement Langacker's model but rather based his implementation on that model (Holmqvist 1993:3). Another related work is that of George Dunbar, who has drawn extensively on Langacker's approach for his work on the cognitive lexicon (Dunbar 1991).
Article
Naturally occurring speech contains only a limited amount of complex recursive structure, and this is reflected in the empirically documented difficulties that people experience when processing such structures. We present a connectionist model of human performance in processing recursive language structures. The model is trained on simple artificial languages. We find that the qualitative performance profile of the model matches human behavior, both on the relative difficulty of center-embedding and cross-dependency, and between the processing of these complex recursive structures and right-branching recursive constructions. We analyze how these differences in performance are reflected in the internal representations of the model by performing discriminant analyses on these representations both before and after training. Furthermore, we show how a network trained to process recursive structures can also generate such structures in a probabilistic fashion. This work suggests a novel explanation of people’s limited recursive performance, without assuming the existence of a mentally represented competence grammar allowing unbounded recursion.
Article
This article examines a type of argument for linguistic nativism that takes the following form: (i) a fact about some natural language is exhibited that allegedly could not be learned from experience without access to a certain kind of (positive) data; (ii) it is claimed that data of the type in question are not found in normal linguistic experience; hence (iii) it is concluded that people cannot be learning the language from mere exposure to language use. We analyze the components of this sort of argument carefully, and examine four exemplars, none of which hold up. We conclude that linguists have some additional work to do if they wish to sustain their claims about having provided support for linguistic nativism, and we offer some reasons for thinking that the relevant kind of future work on this issue is likely to further undermine the linguistic nativist position.
Article
In the Bayesian framework, a language learner should seek a grammar that explains observed data well and is also a priori probable. This paper proposes such a measure of prior probability. Indeed it develops a full statistical framework for lexicalized syntax. The learner’s job is to discover the system of probabilistic transformations (often called lexical redundancy rules) that underlies the patterns of regular and irregular syntactic constructions listed in the lexicon. Specifically, the learner discovers what transformations apply in the language, how often they apply, and in what contexts. It considers simpler systems of transformations to be more probable a priori. Experiments show that the learned transformations are more effective than previous statistical models at predicting the probabilities of lexical entries, especially those for which the learner had no direct evidence.
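The selection criterion described here is the familiar trade-off between fit and prior probability, which can be written compactly (standard notation, not the paper's own):

```latex
% Maximum a posteriori grammar selection: p(G) penalizes complex
% systems of transformations, p(D | G) rewards fit to the observed
% lexicon D.
\[
  G^{*} \;=\; \arg\max_{G}\; p(D \mid G)\, p(G)
\]
```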
Article
A psychological space is established for any set of stimuli by determining metric distances between the stimuli such that the probability that a response learned to any stimulus will generalize to any other is an invariant monotonic function of the distance between them. To a good approximation, this probability of generalization (i) decays exponentially with this distance, and (ii) does so in accordance with one of two metrics, depending on the relation between the dimensions along which the stimuli vary. These empirical regularities are mathematically derivable from universal principles of natural kinds and probabilistic geometry that may, through evolutionary internalization, tend to govern the behaviors of all sentient organisms.
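In symbols, the regularity summarized in this abstract is standardly rendered as follows (a textbook formulation, not a quotation from the paper):

```latex
% Shepard's universal generalization gradient: the probability that a
% response to stimulus x generalizes to stimulus y decays exponentially
% with psychological distance d(x, y).
\[
  g(x, y) \;=\; e^{-k\,d(x, y)}, \qquad k > 0,
\]
% where d is an L1 ("city-block") or L2 (Euclidean) metric, depending
% on whether the stimulus dimensions are separable or integral.
```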
Article
This paper shows how to formally characterize language learning in a finite parameter space, for instance, in the principles-and-parameters approach to language, as a Markov structure. New language learning results follow directly; we can explicitly calculate how many positive examples on average ("sample complexity") it will take for a learner to correctly identify a target language with high probability. We show how sample complexity varies with input distributions and learning regimes. In particular we find that the average time to converge under reasonable language input distributions for a simple three-parameter system first described by Gibson and Wexler (1994) is psychologically plausible, in the range of 100-150 positive examples. We further find that a simple random-step algorithm (that is, simply jumping from one language hypothesis to another rather than changing one parameter at a time) works faster and always converges to the right target language, in contrast to the single-step, local parameter setting method advocated in some recent work.
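A toy simulation conveys the flavor of the sample-complexity question; the error model below, in which a target sentence is parseable by a competing grammar with probability decreasing in parameter mismatch, is my assumption rather than the paper's Markov analysis.

```python
# Toy random-step learner over a 3-parameter (8-grammar) space.
import random

random.seed(0)
TARGET = (1, 0, 1)
GRAMMARS = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def parses(hyp, target):
    # Hypothetical error model: a sentence from the target is parseable
    # by hyp with probability decreasing in parameter mismatch.
    mismatch = sum(h != t for h, t in zip(hyp, target))
    return random.random() < (1.0 - mismatch / 3.0)

def examples_to_converge():
    hyp, n = random.choice(GRAMMARS), 0
    while hyp != TARGET:
        n += 1
        if not parses(hyp, TARGET):
            hyp = random.choice(GRAMMARS)   # random step, not single-step
    return n

runs = [examples_to_converge() for _ in range(1000)]
print(sum(runs) / len(runs))  # average sample complexity, toy scale
```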
Article
It is commonly assumed that innate linguistic constraints are necessary to learn a natural language, based on the apparent lack of explicit negative evidence provided to children and on Gold's proof that, under assumptions of virtually arbitrary positive presentation, most interesting classes of languages are not learnable. However, Gold's results do not apply under the rather common assumption that language presentation may be modeled as a stochastic process. Indeed, Elman (Elman, J.L., 1993. Learning and development in neural networks: the importance of starting small. Cognition 48, 71-99) demonstrated that a simple recurrent connectionist network could learn an artificial grammar with some of the complexities of English, including embedded clauses, based on performing a word prediction task within a stochastic environment. However, the network was successful only when either embedded sentences were initially withheld and only later introduced gradually, or when the network itself was given initially limited memory which only gradually improved. This finding has been taken as support for Newport's 'less is more' proposal, that child language acquisition may be aided rather than hindered by limited cognitive resources. The current article reports on connectionist simulations which indicate, to the contrary, that starting with simplified inputs or limited memory is not necessary in training recurrent networks to learn pseudonatural languages; in fact, such restrictions hinder acquisition as the languages are made more English-like by the introduction of semantic as well as syntactic constraints. We suggest that, under a statistical model of the language environment, Gold's theorem and the possible lack of explicit negative evidence do not implicate innate, linguistic-specific mechanisms. Furthermore, our simulations indicate that special teaching methods or maturational constraints may be unnecessary in learning the structure of natural language.
Article
Many developmental psycholinguists assume that young children have adult syntactic competence, this assumption being operationalized in the use of adult-like grammars to describe young children's language. This "continuity assumption" has never had strong empirical support, but recently a number of new findings have emerged - both from systematic analyses of children's spontaneous speech and from controlled experiments - that contradict it directly. In general, the key finding is that most of children's early linguistic competence is item based, and therefore their language development proceeds in a piecemeal fashion with virtually no evidence of any system-wide syntactic categories, schemas, or parameters. For a variety of reasons, these findings are not easily explained in terms of the development of children's skills of linguistic performance, pragmatics, or other "external" factors. The framework of an alternative, usage-based theory of child language acquisition - relying explicitly on new models from Cognitive-Functional Linguistics - is presented.
Article
Shepard has argued that a universal law should govern generalization across different domains of perception and cognition, as well as across organisms from different species or even different planets. Starting with some basic assumptions about natural kinds, he derived an exponential decay function as the form of the universal generalization gradient, which accords strikingly well with a wide range of empirical data. However, his original formulation applied only to the ideal case of generalization from a single encountered stimulus to a single novel stimulus, and for stimuli that can be represented as points in a continuous metric psychological space. Here we recast Shepard's theory in a more general Bayesian framework and show how this naturally extends his approach to the more realistic situation of generalizing from multiple consequential stimuli with arbitrary representational structure. Our framework also subsumes a version of Tversky's set-theoretic model of similarity, which is conventionally thought of as the primary alternative to Shepard's continuous metric space model of similarity and generalization. This unification allows us not only to draw deep parallels between the set-theoretic and spatial approaches, but also to significantly advance the explanatory power of set-theoretic models.
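The Bayesian reformulation this abstract describes is commonly presented as follows; this is a standard rendering of the framework, with the size-principle likelihood, rather than the paper's exact notation.

```latex
% Bayesian generalization: given n examples X = {x_1, ..., x_n} of a
% consequential region, the probability that a new item y belongs to
% the region sums over consistent hypotheses, each weighted by the
% size principle.
\begin{align*}
  p(y \in C \mid X) &= \sum_{h \,:\, y \in h} p(h \mid X),\\
  p(h \mid X) &\propto p(X \mid h)\,p(h), \qquad
  p(X \mid h) =
  \begin{cases}
    1/|h|^{n} & \text{if } X \subseteq h,\\
    0 & \text{otherwise.}
  \end{cases}
\end{align*}
```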
Article
In learning the meanings of words, children are guided by a set of constraints that give privilege to some potential meanings over others. These word-learning constraints are sometimes viewed as part of a specifically linguistic endowment. However, several recent computational models suggest concretely how word-learning, constraints included, might emerge from more general aspects of cognition, such as associative learning, attention and rational inference. This article reviews these models, highlighting the link between general cognitive forces and the word-learning they subserve. Ultimately, these cognitive forces might leave their mark not just on language learning, but also on language itself: in constraining the space of possible meanings, they place limits on cross-linguistic semantic variation.
Article
Generative linguistic theory stands on the hypothesis that grammar cannot be acquired solely on the basis of an analysis of the input, but depends, in addition, on innate structure within the learner to guide the process of acquisition. This hypothesis derives from a logical argument, however, and its consequences have never been examined experimentally with infant learners. Challenges to this hypothesis, claiming that an analysis of the input is indeed sufficient to explain grammatical acquisition, have recently gained attention. We demonstrate with novel experimentation the insufficiency of this countervailing view. Focusing on the syntactic structures required to determine the antecedent for the pronoun one, we demonstrate that the input to children does not contain sufficient information to support unaided learning. Nonetheless, we show that 18-month-old infants do have command of the syntax of one. Because this syntactic knowledge could not have been gleaned exclusively from the input, infants' mastery of this aspect of syntax constitutes evidence for the contribution of innate structure within the learner in acquiring a grammar.
Article
Given a small number of examples of scene-utterance pairs of a novel verb, language learners can learn its syntactic and semantic features. Syntactic and semantic bootstrapping hypotheses both rely on cross-situational observation to home in on the ambiguity present in a single observation. In this paper, we cast the distributional evidence from scenes and syntax in a unified Bayesian probabilistic framework. Unlike previous approaches to modeling lexical acquisition, our framework uniquely: (1) models learning from only a small number of scene-utterance pairs; (2) utilizes and integrates both syntactic and semantic evidence, thus reconciling the apparent tension between syntactic and semantic bootstrapping approaches; (3) robustly handles noise; (4) makes prior and acquired knowledge distinctions explicit, through specification of the hypothesis space, prior and likelihood probability distributions.
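A minimal cross-situational sketch in the spirit of this framework appears below; the candidate meanings, prior, and per-observation consistency sets are invented for illustration, with the syntactic evidence collapsed into those sets.

```python
# Bayesian updating over candidate verb meanings from scene-utterance
# pairs: meanings inconsistent with any observed pair are ruled out,
# and the (hypothetical) prior breaks ties.

MEANINGS = {"fill": 0.4, "pour": 0.4, "splash": 0.2}   # hypothetical prior

# Each observation: the set of candidate meanings consistent with the
# scene and the syntactic frame of the utterance.
observations = [
    {"fill", "pour"},        # scene 1: liquid moves into a container
    {"fill", "pour"},        # scene 2: similar event, still ambiguous
    {"pour", "splash"},      # scene 3: frame favors manner-of-motion
]

posterior = dict(MEANINGS)
for consistent in observations:
    for m in posterior:
        if m not in consistent:
            posterior[m] = 0.0           # likelihood 0 for this pair
    z = sum(posterior.values())
    posterior = {m: p / z for m, p in posterior.items()}

print(posterior)  # only "pour" survives all three scene-utterance pairs
```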
Introduction. Explanation in linguistics: The logical problem of language acquisition
  • N Hornstein
  • D Lightfoot
Hornstein, N., & Lightfoot, D. (1981). Introduction. In N. Hornstein, & D. Lightfoot (Eds.), Explanation in linguistics: The logical problem of language acquisition. London: Longman.
Philosophical essay on probabilities. Translated by A. Dale (1995) from the fifth French edition
  • P.-S Laplace
Laplace, P.-S. (1825). Philosophical essay on probabilities. Translated by A. Dale (1995) from the fifth French edition. New York: Springer.
Empirical assessment of stimulus poverty arguments
  • G Pullum
  • B Scholz
Pullum, G., & Scholz, B. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19, 9–50.
Language acquisition
  • S Pinker
Pinker, S. (1995). Language acquisition. In L. Gleitman, & M. Liberman (Eds.), Language: An invitation to cognitive science (2nd ed., Vol. 1, pp. 135–182). Cambridge, MA: MIT Press.
Word learning as Bayesian inference
  • J Tenenbaum
  • F Xu
Tenenbaum, J., & Xu, F. (2000). Word learning as Bayesian inference. In L. Gleitman, & A. Joshi (Eds.), Proceedings of the 22nd Annual Conference of the Cognitive Science Society (pp. 517-522). Mahwah, NJ: Lawrence Erlbaum.