
Structures, Not Strings: Linguistics as Part of the Cognitive Sciences


Abstract

There are many questions one can ask about human language: its distinctive properties, neural representation, characteristic uses including use in communicative contexts, variation, growth in the individual, and origin. Every such inquiry is guided by some concept of what 'language' is. Sharpening the core question - what is language? - and paying close attention to the basic property of the language faculty and its biological foundations makes it clear how linguistics is firmly positioned within the cognitive sciences. Here we will show how recent developments in generative grammar, taking language as a computational cognitive mechanism seriously, allow us to address issues left unexplained in the increasingly popular surface-oriented approaches to language.
Feature Review
Martin B.H. Everaert,
Marinus A.C. Huybregts,
Noam Chomsky,
Robert C. Berwick,
Johan J. Bolhuis
Grammar from a Cognitive Science Perspective: Generative Grammar
Language is a structured and accessible product of the human mind. We choose to study language for this reason, as one possible way to gain understanding about the human mind. This particular choice (language as part of the mind, so cognitive science) arose as the result of the seminal discoveries by the mid-20th century regarding the mathematics of computation, which permitted a shift from the more conventional perspective of language as a cultural/social object of study. This new perspective regarding computation [1–4] enabled for the first time a clear formulation of what we should recognize as the most basic property of language: providing a discretely infinite array of hierarchically structured expressions that receive systematic interpretations at two interfaces, roughly, thought and sound [5–8]. We take externalization (see Glossary) at the sensory-motor level (for instance, speech) as an ancillary process, reflecting properties of the sensory modality, sign or speech. Therefore, communication, a particular use of externalized language, cannot be the primary function of language, a defining property of the language faculty, suggesting that a traditional conception of language as an instrument of thought might be more appropriate. At a minimum then, each language incorporates via its syntax computational procedures (Box 1) satisfying this basic property. As a result, every theory of a particular language constitutes by definition what is called a generative grammar: a description of the tacit knowledge of the speaker-hearer that underlies their actual production and perception (understanding) of speech. We take the property of structure dependence of grammatical rules to be central. We will illustrate the puzzling feature that the computational rules of language rely on the much more complex property of hierarchical structure rather than the much simpler surface property of linear order.
Viewing sentences as just linear word strings has long held a prominent place in areas of natural language processing such as speech recognition and machine translation. Warren Weaver famously made the case for a string-based approach to machine translation as a type of code breaking, using statistical methods [9]. This position seems intuitively plausible because it parallels the familiar way foreign language travel guides are organized, with phrases in one language matched to corresponding phrases in another. The intuition is that simply pairing matching sentence strings that are selected on the basis of statistical likelihood suffices, and that accuracy does not require linguistic analysis, simply the compilation of a database of larger and longer sentence pairs along with more powerful computers for data storage and selection. Boosted by exactly this increased computing power along with innovative statistical work at IBM Research (the late Fred Jelinek and John Lafferty among many others) [10], this approach rapidly gained ascendancy in the late 1980s, gradually pushing out rule-based machine translation approaches. But this surface-oriented 'big data' approach is now all encompassing, not only in computational linguistics.
The computations of the mind rely on the structural organization of phrases but are blind to the linear organization of words that are articulated and perceived by input and output systems at the sensorimotor interface (speech/sign). The computational procedure that is universally adopted is computationally much more complex than an alternative that relies on linear order.
Linear order is not available to the systems of syntax and semantics. It is an ancillary feature of language, probably a reflex of properties of the sensorimotor system that requires it for externalization, and constrained by conditions imposed by sensorimotor modalities.
It follows that language is primarily an instrument for the expression of thought. Language is neither speech/sign (externalized expression) nor communication (one of its many possible
Utrecht Institute of Linguistics, Utrecht University, 3512 JK Utrecht, The Netherlands
Department of Linguistics and Philosophy, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Department of Electrical Engineering and Computer Science and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Cognitive Neurobiology and Helmholtz Institute, Departments of Psychology and Biology, Utrecht University, 3584 CH Utrecht, The Netherlands
Trends in Cognitive Sciences, Month Year, Vol. xx, No. yy
© 2015 Elsevier Ltd. All rights reserved.
The focus on the non-hierarchical aspects of language is evident in the work of some
typologists [11], and is at the basis of usage-based, constructionist linguistic theories
[12,13]. These approaches focus on inductive mechanisms that explain the acquisition and
use of 'low-level patterns', 'not predictable from general rules or principles', allowing us to
'create novel utterances based on [constructional] schemas' [14]. Such approaches focus on
words or word-like constructions and usage patterns, do not acknowledge the relevance of
structure, and view acquisition essentially as statistical [15]. Introductions to psycholinguistics
generally do not mention notions such as hierarchy, structure, or constituent.
Department of Zoology and Sidney
Sussex College, University of
Cambridge, Cambridge, UK
(J.J. Bolhuis).
Box 1. Merge: The Basic Property of Language
Merge is a (dyadic) operation that takes two syntactic objects, call them X and Y, and constructs from them a single new syntactic object, call it Z. X, Y can be building blocks that are drawn from the lexicon or previously constructed objects. Put simply, Merge (X,Y) just forms the set containing X and Y. Neither X nor Y is modified in the course of the operation Merge.
If X and Y are merged there are only two logical possibilities. Either X and Y are distinct, and neither one is a term of the other, or else one of the two elements X or Y is a term of the other, where Z is a term of W if it is a subset of the other or the subset of a term of the other. We can call the former operation External Merge: two distinct objects are combined.
(i) Merge (read, that book) ⇒ {read, that book}
If alternatively X is a term of Y or vice versa and X and Y are merged, we call this Internal Merge. So for example, we can (Internal) Merge which book and John read which book, yielding the following:
(ii) Merge (which book, John read which book) ⇒ {which book, John read which book}
In this case, the result of merging X and Y contains two copies of Y. Following further operations, this structure will surface as in (iii), under a constraint to externalize (pronounce) only the structurally most prominent copy of which book:
(iii) (Guess) which book John read
This sentence may be understood as (iv):
(iv) (Guess) for which book x, John read the book x
Internal merge is a ubiquitous property of language, sometimes called displacement. Phrases are heard in one place but they are interpreted both there and somewhere else.
Human language generates a digitally infinite array of hierarchically structured expressions with systematic interpretations at the interfaces with a sensory-motor (sound/sign) and a conceptual-intentional (meaning) system. Thus, language comprises a system to generate hierarchical syntax along with asymmetric mappings to the interfaces, a basic mapping to the conceptual-intentional interface and an ancillary mapping to the sensory-motor interface. Merge is the basic operation underpinning the human capacity for language, UG, connecting these interface systems. Characterizing UG in terms of recursive merge is just a way of saying that whatever is going on in the brain neurologically can be properly understood in these terms.
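The set-formation view of Merge in Box 1 can be illustrated with a short sketch. This is only an illustration under stated assumptions: syntactic objects are modeled as Python strings (lexical items) and frozensets (merged objects), 'term of' is read as membership at any depth, and the binary-branching analysis of John read which book is a simplification. The function names `merge`, `terms`, `is_internal_merge`, and `copies` are hypothetical and do not come from the article.

```python
def merge(x, y):
    """Merge(X, Y): form the set {X, Y}; X and Y themselves are left unmodified."""
    return frozenset([x, y])


def terms(obj):
    """Yield every term of obj: its members and, recursively, the terms of those members."""
    if isinstance(obj, frozenset):
        for member in obj:
            yield member
            yield from terms(member)


def is_internal_merge(x, y):
    """True when one of X, Y is a term of the other (Internal Merge), else External Merge."""
    return x in terms(y) or y in terms(x)


def copies(target, obj):
    """Count how many times target occurs among the terms of obj."""
    return sum(1 for term in terms(obj) if term == target)


# External Merge, as in (i): two distinct objects are combined.
that_book = merge("that", "book")
read_that_book = merge("read", that_book)

# Internal Merge, as in (ii): 'which book' is already a term of the clause.
which_book = merge("which", "book")
clause = merge("John", merge("read", which_book))  # assumed structure for 'John read which book'
question = merge(which_book, clause)

print(is_internal_merge("read", that_book))    # External Merge
print(is_internal_merge(which_book, clause))   # Internal Merge
print(copies(which_book, question))            # number of copies of 'which book'
```

Because frozensets are immutable and unordered, the sketch respects two points made in the box: Merge does not modify its inputs, and the resulting object encodes hierarchy but no linear order.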
A variety of evidence can be brought to bear on the study of language. This can include language
use, acquisition, cognitive dissociations, other detailed neuroscience investigations, cross-
language comparisons, and much else besides. All this follows from the well-confirmed
assumption that the human capacity for language rests on shared biological properties.
However, for the development of generative grammar one particular type of evidence has
proved most useful: the ability of children to rapidly and effortlessly acquire the intricate principles
and properties of the language of their environment. All normally developing children acquire
most of the crucial elements of their language long before school age. By contrast, adults exhibit
a very different developmental path when they attempt to acquire a second language [16]. Often
they do not come close to the level of native speakers, even after a much longer time frame for
learning. Most researchers would agree that the distinctive ontogenesis of child language arises
from the interplay of several factors, including innate mechanisms, language-independent
properties, and external experience. On our view, the ability of children to rapidly and effortlessly
acquire the intricate principles and properties of their native language can best be explained by
looking for innate, language-dedicated cognitive structures (collectively known as Universal
Grammar) that guide learning.
Defined in this way, the study of language focuses on three questions:
(i) What constitutes knowledge of language? This amounts to understanding the nature of the
computational system behind human language.
(ii) How is knowledge of language acquired? This amounts to unraveling the cognitive processes
underlying primary language acquisition, so as to understand how primary language
acquisition differs from secondary, subsequent language acquisition.
(iii) How is knowledge of language put to use? This amounts to studying the linguistic processes
underlying language production, perception, and interpretation under varying conditions,
such as modality, social environment, and speech context, and the way in which language
helps fulfill our communicative needs.
A commitment to some answer to (i) is a logical precondition for addressing (ii) and (iii). Inquiry
into language acquisition and language use can proceed most effectively insofar as it is based on
careful description and understanding of the system that has evolved. The study of language,
we believe, has made sufficient progress answering question (i) to attempt to pursue answers to
questions (ii) and (iii).
The Infinite Use of Finite Means
One feature of language that distinguishes it from all non-human communication systems we know of is its ability to yield an unbounded array of hierarchically structured expressions, permitting 'infinite use of finite means' [17]. To see how and why, we need to introduce the notion of recursion, which underlies this finite-infinite distinction. Much has been written about recursion from different perspectives. There is no need to repeat this here [18–20]. What is more important to understand is that recursion in its original context, based on the recursive function theory developed by Gödel, Church, and Turing [21–24], served as the formal grounding for generative grammar and the solution to the finite-infinite puzzle. The picture of Turing machine computation provides a useful explanation for why this is so. In a Turing machine, the output of a function f on some input x is determined via stepwise computation from some previously defined value, by carrying forward or 'recursing' on the Turing machine's tape previously defined information. This enabled for the first time a precise, computational account of the notion of definition by induction (definition by recursion), with f(x) defined by prior computations on some earlier input y, f(y), y < x, crucially so as to strongly generate arbitrarily complex structures [19].
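A minimal sketch may make definition by recursion concrete. It assumes the simplest scheme, in which f(0) is a given base value and each f(y + 1) is computed from the previously defined value f(y). The helper `define_by_recursion` and the two example functions are hypothetical names chosen for illustration, and the loop stands in only loosely for the 'carrying forward' of information on a Turing machine tape.

```python
def define_by_recursion(base, step):
    """Return f with f(0) = base and f(y + 1) = step(y, f(y))."""
    def f(x):
        value = base                  # f(0)
        for y in range(x):            # visit each earlier input y < x in turn
            value = step(y, value)    # carry f(y) forward to obtain f(y + 1)
        return value
    return f


# A numerical function defined by recursion: f(y + 1) = (y + 1) * f(y).
factorial = define_by_recursion(1, lambda y, fy: (y + 1) * fy)

# The same finite scheme yields ever more deeply embedded structures.
embed = define_by_recursion("x", lambda y, inner: ("S", inner))

print(factorial(5))
print(embed(3))
```

The second function is the linguistically relevant one: a finite specification (one base value, one step) determines a structure of any depth, which is the sense in which finite means permit unbounded use.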
Why is recursion important? As it is formulated above, recursion is important because it supplies part of an answer to the seemingly unbounded creativity of language, so central to linguistic theorizing since the mid-20th century. This essential property of language provides a means for expressing indefinitely many thoughts and for reacting appropriately in an indefinite range of new situations [25].
Clitic: a syntactic element that cannot occur freely in syntax but is in need of a 'host'. A typical clitic will attach itself to a host, that is, a (fully inflected) word or phrase, for example, French te 'you' in Je t'aime.
Compositionality: a principle that constrains the relation between form and meaning by requiring that the meaning of a complex expression is built up from the meanings of its constituent expressions and the way they are combined. This principle plays an important role in formal semantic theories.
C(onstituent)-command: c-command is a binary relation between nodes in a tree structure that is defined as follows: node α c-commands node β iff (i) α ≠ β, (ii) α does not dominate β and β does not dominate α, and (iii) every γ that dominates α also dominates β.
Context-free language: a language (set of sentences) generated by a context-free grammar, namely, a grammar whose rules are all restricted to be in the form X → w, where X is a single phrase name (such as VP or NP), and w is some string of phrase names or words.
Externalization: the mapping from internal linguistic representations to their ordered output form, either spoken or manually gestured.
Gap: any node in the phrase structure that has semantic content but is without phonological content, for example, 'children should be seen and not heard'.
Generative grammar: generative grammar is a research program that includes different competing frameworks, and takes linguistics as a science whose goal it is to try to provide a precise (explicit and formal) model of a cognitively embedded computational system of human language, and to explain how it is
Merge: in human language, the computational operation that constructs new syntactic objects Z (e.g., 'ate the apples') from already constructed syntactic objects X ('ate'), Y ('the apples'), without changing X or Y, or adding to Z, that is, set formation.
Negative concord items: negative polarity items with a more restricted distribution. They can only be licensed by clausemate sentential negation and can sometimes express
This approach to the unbounded character of language may be contrasted with the conventional
empiricist position that assumes inductive generalizations from observable distributional
regularities to be sufficient for learning and use of language. For American structuralism, a standard
concept was that of Leonard Bloomfield, the leading theoretician, for whom language is 'an array
of habits to respond to situations with conventional speech sounds and to respond to these
sounds with actions' [26]. Another leading figure, Charles Hockett, attributed language use to
'analogy', and this meant that we construct and understand novel sentences on the basis of
those we have constructed and understood before. For Hockett, 'similarity' played the central
role in language learning, production, and use [25]. This line of thought is still at the forefront of
many modern-day stochastic learning algorithms, generalized learning procedures, and natural
language parsers. The crucial question, however, is whether a notion of analogy can be
properly defined so as to adequately explain how children acquire language (Box 2).
Syntax: What You See is Not What You Get
Given the view set out above, Aristotle's dictum that 'language is sound with meaning' could
arguably be reformulated as 'language is meaning with sound', since the mappings of expressions
to the two interfaces are asymmetric, as noted above. The mapping to the systems of
inference, interpretation, and the like we assume to be simple, principled, and close to invariant,
following structural principles unexceptionally and possibly in harmony with the methodological
principle of compositionality [27]. The mapping to the sensory modalities (speech, sign) is
more complex, clearly subject to parameterization and is more likely to have exceptions [28].
Linking a cognitive system to one or other of the sensory modalities amounts to the difficult
problem of relating two different categories of systems with different properties and different
evolutionary histories. But the syntactic operations that map linguistic objects to the semantic
interface do not use the simple properties of sequential string order, that is, linear precedence.
Instead, they rely exclusively on the hierarchical structural position of phrases, that is, hierarchical
structural distance and hierarchical structural relations (Box 3). In the following we illustrate the
reliance of language on hierarchical structure rather than linear precedence in all areas of
language by providing examples from semantics, syntax, morphology, and phonology.
The Syntax of Semantics
A simple textbook illustration of the reliance of language on hierarchical structure is provided by
syntactic properties of negative polarity items (NPIs) such as the English word anybody or
negative concord items such as the Japanese word nani-mo ('anything'). These items require
an overt negative element such as not or nakat. If we omit the negative items, the sentences
become ill-formed (*), cf. (1a,b) and (2a,b):
(1) a. The book I bought did not appeal to anybody.
b. *The book I bought appealed to anybody.
(2) a. Taroo-wa nani-mo tabe-nakat-ta.
Taroo-TOP what-MO eat-NEG-PST
'Taro didn't eat anything.'
b. *Taroo-wa nani-mo tabe-ta.
Taroo-TOP what-MO eat-PST
From (1a,b) one might also conclude, wrongly, that the English NPI anybody must appear in the sentence sequentially after not. This conclusion is immediately refuted by the Japanese example in (2a), where nakat follows the negative concord item nani-mo. Example (3) also shows that the operative constraint cannot be linear order since (3) is ill-formed despite the fact that not appears sequentially before anybody, just as it does in the well-formed example (1a).
(3) *The book I did not buy appealed to anybody.
What is the correct constraint governing this pattern? It depends on hierarchical structure and not on sequential or linear structure [29,30].
Consider Figure 1A, which shows the hierarchical structure corresponding to example (1a): the hierarchical structure dominating not also immediately dominates the hierarchical structure containing anybody. (This structural configuration is called c(onstituent)-command in the linguistics literature [31].) When the relationship between not and anybody adheres to this structural configuration, the sentence is well-formed.
In sentence (3), by contrast, not sequentially precedes anybody, but the triangle dominating not in Figure 1B fails to also dominate the structure containing anybody. Consequently, the sentence is not well-formed.
negation on their own as in fragment
Negative polarity items: a word or word group that is restricted to negative contexts needing the scope of a negation (or more precisely a monotone decreasing
Parasitic gap (PG): is a gap (a null variable) that depends on the existence of another gap RG, sharing with it the same operator that locally binds both variables. PG must conform to a binding condition asserting that PG cannot be c-commanded by RG.
Parsers: a natural language parser is a program for analyzing a string of words (sentence) and assigning it syntactic structure in accordance with the rules of grammar. Ideally, the relation between basic parsing operations and basic operations of grammar approximates the identity function. Probabilistic parsers use statistical information to provide the most likely grammatical analyses of new sentences.
Phonology: the study of the abstract sound patterns of a particular language, usually according to some system of rules.
Phrase structure rules: rewrite rules that generate phrase structure. These have the general form of (i), where X is the name of the phrase and Y Z W defines its structure. Y, Z, and W are either phrases, and therefore must themselves occur to the left of the arrow in other rules of this type, or non-phrasal (terminal) categories (such as noun, verb, or determiner). (i) X → Y Z W
Prosody: the description of rhythm, loudness, pitch, and tempo. It is often used as a synonym for suprasegmentals, although its meaning is narrower: it only refers to the features mentioned above.
Recursion: a property of a finitely specified generative procedure that allows an operation to reapply to the result of an earlier application of the same operation. Since natural language is unbounded, at least one combinatorial operation must be applicable to its own output (via recursion or some logical equivalent). And given such an operation, any derivational sequence for a generable string will determine a hierarchical structure, thus providing one notion of structure generation ('strong generation') distinct from the weakly generated string.
Selectional properties: the semantic restrictions that a word imposes on the syntactic context in which it occurs: a verb such as eat requires that its subject refers to an animate entity and its object to something edible.
Syntax: the rules for arranging items (sounds, words, word parts, phrases) into their possible permissible combinations in a language.
Universal Grammar (UG): is the theory of the genetic component of the faculty of language, the human capacity for language that makes it possible for human infants to acquire and use any internalized language without instruction and on the basis of limited, fragmentary, and often poor linguistic input. UG is the general theory of internalized languages and determines the class of generative procedures that satisfy the basic property, besides the atomic elements that enter into these
Box 2. Simple Rules
Consider the following noun phrases (i), and their description in terms of context-free phrase structure rules (ii), and accompanying figures (Figure I):
(i) a man
(ii) a man on the moon
(G) a. N(oun) P(hrase) → Det(erminer) N(oun)
b. NP → Det N Prep Det N
Our 'grammar' in (Ga,b) (in which '→' means 'consists of') would allow one to create an enormous variety of noun phrases given a vocabulary of determiners, nouns, and prepositions. However, observing that (iii) is also possible, we would have to add a rule (Gc) to our grammar:
(iii) a girlfriend of the man from the team
(G) c. NP → Det N Prep Det N Prep Det N
But now we are missing a linguistically significant generalization: every noun phrase can have a prepositional phrase tacked on the end, which is accounted for by replacing grammar G by the following simpler set of rules:
(G′) a. NP → Det N (PP) (noun phrases consist of a determiner and a noun and may be followed by a prepositional phrase)
b. PP → Prep NP (prepositional phrases consist of a preposition followed by a noun phrase)
(G′) is a simpler grammar. But note that (G′) represents (part of) a grammar yielding a 'discrete infinity' of possible phrases, allowing us to generate ever longer noun phrases taking prepositional phrases. We could only circumvent this unboundedness by returning to a grammar that explicitly lists the configurations we actually observe, such as (G). But such a list would be arbitrarily limited and would fail to characterize the linguistic knowledge we know native speakers have. This recursive generation of potential structures ('linguistic competence') should not be incorrectly equated with real-time production or parsing of actual utterances ('linguistic performance'). Note that this distinction is no different from the rules for addition or multiplication. The rules are finite, but the number of addition or multiplication problems we can solve is unbounded (given enough internal or external resources of time and memory).
Grammar (G′) also reflects the fact that phrases are not simple concatenations of words, but constitute structured objects. (G′), contrary to (G), therefore correctly reflects properties of constituency as illustrated in (v):
(v) He gave me [a book [about [the pope]]]
It is [the pope], he gave me [a book [about X]]
It is [about the pope], he gave me [a book X].
Figure I. Structures for (i) and (ii) on the basis of Grammar G.
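The two-rule recursive grammar of Box 2 (NP → Det N (PP); PP → Prep NP) can be sketched as a small recursive-descent procedure that assigns a nested structure to a word string. This is an illustrative sketch only: the tiny lexicon, the treatment of any other word as a noun, and the function names `parse_np`, `parse_pp`, and `parse` are assumptions made here, not part of the article.

```python
DET = {"a", "the"}                      # assumed toy lexicon of determiners
PREP = {"on", "of", "from", "about"}    # assumed toy lexicon of prepositions


def parse_np(words, i=0):
    """NP -> Det N (PP). Returns (tree, index of the next unread word)."""
    if i + 1 >= len(words) or words[i] not in DET:
        raise ValueError(f"expected 'Det N' at position {i}")
    det, noun = words[i], words[i + 1]  # any word after a determiner is taken as a noun
    i += 2
    if i < len(words) and words[i] in PREP:
        pp, i = parse_pp(words, i)      # optional PP, which itself contains an NP
        return ("NP", det, noun, pp), i
    return ("NP", det, noun), i


def parse_pp(words, i):
    """PP -> Prep NP."""
    np, j = parse_np(words, i + 1)
    return ("PP", words[i], np), j


def parse(phrase):
    """Parse a whole phrase as an NP, rejecting leftover words."""
    words = phrase.split()
    tree, end = parse_np(words)
    if end != len(words):
        raise ValueError(f"unexpected word at position {end}")
    return tree


print(parse("a man"))
print(parse("a man on the moon"))
print(parse("a girlfriend of the man from the team"))
```

The two mutually recursive functions cover all three phrases, and any longer one of the same kind, without a new rule per length, whereas a list grammar in the style of (G) needs a separate rule for each observed pattern. The output is a nested structure rather than a flat string, which is the constituency point the box makes.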
The reader may confirm that the same hierarchical constraint dictates whether the examples in
(4-5) are well-formed or not, where we have depicted the hierarchical sentence structure in
terms of conventional labeled brackets:
(4) [The book [I bought] did not [appeal to anyone]]
(5) *[The book [I did not buy] appealed to anyone]
Only in example (4) does the hierarchical structure containing not (corresponding to the sentence
The book I bought did not appeal to anyone) also immediately dominate the NPI anyone. In (5)
not is embedded in at least one phrase that does not also include the NPI. So (4) is well-formed
and (5) is not, exactly the predicted result if the hierarchical constraint is correct.
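The contrast between (4) and (5) can be checked mechanically with a short sketch of the c-command relation as defined in the Glossary. The bracketings below follow the examples, but the node labels (S, VP) are placeholders assumed here, since the labels are not given in the text. Nodes are identified by their path from the root, and the function names are hypothetical.

```python
def leaf_path(tree, word, path=()):
    """Return the path (tuple of child indices) to the first leaf equal to word."""
    if isinstance(tree, str):
        return path if tree == word else None
    for i, child in enumerate(tree[1:]):    # tree[0] is the label
        found = leaf_path(child, word, path + (i,))
        if found is not None:
            return found
    return None


def leaves(tree):
    """Flatten a tree into its left-to-right word string."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def dominates(a, b):
    """Node a dominates node b when a's path is a proper prefix of b's path."""
    return len(a) < len(b) and b[:len(a)] == a


def c_commands(a, b):
    """a c-commands b: distinct, neither dominates the other, and every node
    dominating a also dominates b (it suffices to check a's parent)."""
    if a == b or dominates(a, b) or dominates(b, a):
        return False
    parent = a[:-1]
    return b[:len(parent)] == parent


ex4 = ("S", "The", "book", ("S", "I", "bought"), "did", "not", ("VP", "appeal", "to", "anyone"))
ex5 = ("S", "The", "book", ("S", "I", "did", "not", "buy"), "appealed", "to", "anyone")

for name, tree in (("(4)", ex4), ("(5)", ex5)):
    words = leaves(tree)
    precedes = words.index("not") < words.index("anyone")
    licensed = c_commands(leaf_path(tree, "not"), leaf_path(tree, "anyone"))
    print(name, "not precedes anyone:", precedes, "| not c-commands anyone:", licensed)
```

In both trees not linearly precedes anyone, so a rule stated over word order cannot separate the two cases; only the structural relation does.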
Even more strikingly, the same constraint appears to hold across languages and in many other
syntactic contexts. Note that Japanese-type languages follow this same pattern if we assume
that these languages have hierarchically structured expressions similar to English, but linearize
these structures somewhat differently: verbs come at the end of sentences, and so forth [32].
Linear order, then, should not enter into the syntactic-semantic computation [33,34]. This is
rather independent of possible effects of linearly intervening negation that modulate acceptability
in NPI contexts [35].
The Syntax of Syntax
Observe an example as in (6):
(6) Guess which politician your interest in clearly appeals to.
The construction in (6) is remarkable because a single wh-phrase is associated both
with the prepositional object gap of to and with the prepositional object gap of in, as in
(7a). We talk about 'gaps' because a possible response to (6) might be as in (7b):
(7) a. Guess which politician your interest in GAP clearly appeals to GAP.
b. response to (7a): Your interest in Donald Trump clearly appeals to Donald Trump
Figure 1. Negative Polarity. (A) Negative polarity licensed: negative element c-commands negative polarity item.
(B) Negative polarity not licensed: negative element does not c-command negative polarity item.
The construction is called 'parasitic gap' (PG) because the 'first' gap in the nominal expression,
the subject, is parasitic on the 'real gap' (RG) in the verbal expression: (8b) is well-formed and
occurs independently of (6), while (8a) is ill-formed and does not occur independently of (6).
(8) a. *Guess which politician [[your interest in PG] clearly appeals to Jane]
b. Guess which politician [[your interest in Jane] clearly appeals to RG]
In other words, the gap in (8a) cannot exist unless it co-occurs with the independently licensed
gap of (8b), resulting in (6/7a). Parasitic gap constructions are rarely attested, virtually absent
from the empirical record. Nevertheless, language learners attain robust knowledge of parasitic
gap constructions. Although such constructions had been observed to exist long ago (J.R.
Ross, PhD thesis, Massachusetts Institute of Technology, 1967; [36]), the properties of parasitic
gaps were predicted to exist on theoretical grounds [37], and were (re)discovered as a result of
precise generative analysis [38–42]. Applying analytical or statistical tools to huge corpora of
data in an effort to elucidate the intriguing properties of parasitic gaps will not work.
However, not every co-occurrence of RG and PG yields a grammatical result:
(9) a. *Guess which politician clearly loves your interest in.
b. Guess which politician [RG clearly loves [your interest in PG]]
Hierarchical structure and structure dependence of rules are basic factors in explaining parasitic
gaps and the asymmetry between (6) and (9), a subject-object asymmetry. The PG is parasitic on
an independently occurring RG but may not be linked to a RG that is in a structurally higher
position. This is illustrated in Figure 2A and 2B for (6) and (9), respectively.
InFigure 2Awho is structurally higherthan both the RG and the PG,butthePG, being embedded in the
noun phrase subject, is not structurally higher than the RG.InFigure 2B, by contrast, the RG in the
subject position is in a hierarchicallyhigher position thanthe PG in lower prepositional object position.
The contrasting ller-gap cases of (6) and (9) cannot be characterized by their linear properties. It
would be incorrect to state that PGs must precede their licensing RGs, as shown by (10):
(10) [4_TD$DIFF]Who did you [[talk to RG] without recognizing PG][54_TD$DIFF]?
Crucially, the RG licensing the PG is not in a structurally higher position in (10): the verb phrase
dominating the RG does not dominate the adverbial phrase containing the PG. Why this restriction
precisely holds we leave undiscussed here, but is[55_TD$DIFF] discussed at length[9_TD$DIFF] in the literature on
parasitic gaps.
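The structural condition just described can be made concrete with a small sketch. It assumes heavily simplified bracketings for (6), (9), and (10), encoded as nested lists, and a bare-bones definition of c-command over leaves; the tree encodings and all function names are illustrative assumptions, not the analyses of the works cited. The sketch contrasts the structural test (the RG must not c-command the PG) with a purely linear test (the PG must precede the RG).

```python
# Trees are nested lists; strings are leaves. Every non-leaf node branches.
# The bracketings are simplified assumptions made for this illustration.

def leaf_paths(tree, path=()):
    """Yield (leaf, path) pairs; a path is a tuple of child indices."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))

def path_to(tree, leaf):
    """Return the path of the first leaf with the given label."""
    for label, path in leaf_paths(tree):
        if label == leaf:
            return path
    raise ValueError(f"leaf not found: {leaf}")

def c_commands(tree, a, b):
    """Leaf a c-commands leaf b if the node immediately above a dominates b."""
    pa, pb = path_to(tree, a), path_to(tree, b)
    parent = pa[:-1]
    return pa != pb and pb[:len(parent)] == parent

def pg_licensed(tree):
    """Structural test: the real gap must not c-command the parasitic gap."""
    return not c_commands(tree, "RG", "PG")

def pg_precedes_rg(tree):
    """Linear test: does the parasitic gap come first in the string?"""
    leaves = [label for label, _ in leaf_paths(tree)]
    return leaves.index("PG") < leaves.index("RG")

EX6 = ["which politician",
       [["your interest", ["in", "PG"]], ["clearly appeals", ["to", "RG"]]]]
EX9 = ["which politician",
       ["RG", ["clearly loves", ["your interest", ["in", "PG"]]]]]
EX10 = ["who",
        ["did you", [["talk", ["to", "RG"]], ["without recognizing", "PG"]]]]

for name, tree in [("(6)", EX6), ("(9)", EX9), ("(10)", EX10)]:
    print(name, "licensed:", pg_licensed(tree),
          "| PG precedes RG:", pg_precedes_rg(tree))
```

Under these assumed structures the structural test separates (6) and (10) from (9), whereas the linear test groups (10) with the ill-formed (9), which is the mismatch the text points out.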
which polician X
your interest X clearly appeals X
in PG to RG
which polician X
clearly loves X
your interest X
in PG
Figure 2. Parasitic Gap. (A) Well-formed parasitic gap construction: which politician c-commands both real gap (RG) and
parasitic gap (PG). RG does not c-command PG (and PG does not c-command RG either). (B) Ill-formed parasitic gap
construction: which politician c-commands both real gap (RG) and parasitic gap (PG). RG c-commands PG.
The same concepts apply across empirical domains in language. For example, adopting these
concepts enables us to explain certain unexpected and surprising phenomena in Dutch.
Compare (11a) to its counterpart (11b) with a phonologically weak pronoun (clitic).
(11) a. Ik ben speciaal voor het klimaat naar de Provence toe gereden.
        I am especially for the climate to the Provence driven.
        'I drove to Provence especially for the climate.'
     b. Ik ben er speciaal voor naar toe vertrokken.
        I am it especially for to driven.
        'I drove there especially for it.'
The clitic er 'it/there' is linked to two gaps, the NP complements of the preposition voor and the
complex preposition/postposition naar...toe. A single clitic position er simultaneously binds two
structural positions that have different selectional properties but meet the structural conditions
of standard parasitic gaps. This old puzzle of structuralist and generative grammar,
sometimes referred to as 'Bech's Problem' [43–45], may now turn out to be explainable as a
special case of a parasitic gap construction (if a language-specific property of Dutch morphology
is added to the equation). The simple lesson to take home is that a few assumptions about the
structure of language suffice to give a unified account of superficially unrelated and disparate
phenomena that are left unexplained in models that are restricted to concepts such as linear
precedence. In fact, proposals that restrict themselves to just linear order are both too weak
(incorrectly permitting ill-formed PGs) and too strong (incorrectly ruling out well-formed PGs). They
are therefore neither sufficient nor necessary to deal with natural language and should be
The Syntax of Morphology
Sound and meaning in morphology can also be shown to be dependent on hierarchical
structure. But there is an asymmetry. As discussed above, computational rules of language
invariably keep to the complex property of hierarchical structure and never use the far simpler
option of linear order. But, of course, linear order must be available for externalization since the
sensory–motor system requires that whatever structure is generated must pass through some
type of filter that makes it come out in linear order.
For further evidence of the relevance of hierarchical structure, consider the compounds in (12)
and their respective structures in Figure 3A,B.
[Figure 3, tree diagrams of the compounds in (12): (A) right-branching, (B) left-branching.]
Figure 3. Prosodic Prominence. Right-branching (A) and left-branching (B) nominal compound structures. Bold capital
letters in initial syllables of each word denote position of primary word stress. Compound stress rule is applied successively,
first to the lower, embedded compound, then to the next higher compound containing it. The syllable consistently assigned
strong prosodic prominence (s) on each application of the rule carries compound stress.
(12) a. lábor union president, kítchen towel rack (rack for kitchen towels)
     b. theatre tícket office, kitchen tówel rack (towel rack in the kitchen)
The correct interpretations of these compounds, both at the sensory–motor interface (namely,
different prosodies) and at the semantic interface (namely, different meanings) follow directly
from applying the relevant rules to their radically different hierarchical structures. Here we will limit
our illustration to prosodic prominence. The rule describing prosodic prominence is given in (13):
(13) Assign prosodic prominence to the first noun N1 of a compound [N N1 N2] if and only
if the second noun N2 does not branch.
(More precisely: In a compound N, [N N1 N2], assign prosodic prominence (s) to the
primary stressed syllable of N1 if N2 does not branch.)
The recursive application of this structure-dependent rule, based on [46–48], to the different
hierarchically structured expressions in Figure 3A and 3B yields the correct prosodic prominence
patterns in each case. If none of the parts of a compound branches, as in ticket office or labor
union, prosodic prominence (s) is assigned by (13) to the left-hand noun N1
because its right-hand noun N2 (óffice, únion) does not branch. As a corollary effect, the N2
becomes prosodically weak (w). The noun theatre tícket office (Figure 3A) is a compound N
consisting of a simple noun N1 (théatre) and a noun N2 (tícket office), which is itself a compound
noun with prosodic prominence already assigned by previous application of (13), as just
discussed. It is a right-branching hierarchical structure. Therefore, the N1 cannot be prosodically
prominent because N2 branches. Consequently, prominence must be assigned to N2, the inner
compound noun. The repeated application of (13) yields the correct result. Analogously, the
compound noun lábor union president has a left-branching hierarchical structure (Figure 3B).
Prosodic prominence, again, falls on the left-hand noun of the inner compound, which, in this
case, is the left-hand member of the full compound structure. The reason is that the right-hand
member is non-branching and must therefore be prosodically weak. A derivation working from
the bottom up guarantees a correct result.
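The recursive application of (13) can be sketched in a few lines. The sketch assumes that compounds are encoded as binary nested tuples of plain strings and that following the strong (s) branch downward is an adequate stand-in for the successive application described above; the function names and the bracketings chosen for the longer compounds are assumptions inferred from the stress marks in the examples, not structures given in the text.

```python
# A compound is either a word (str) or a pair (N1, N2) of compounds.
# Word-internal stress is ignored: the sketch only picks out which word
# ends up carrying compound stress.

def branches(node):
    """A node branches if it is itself a compound rather than a single word."""
    return not isinstance(node, str)

def prominent_word(compound):
    """Follow the strong (s) branch chosen by rule (13) down to a word."""
    if not branches(compound):
        return compound
    n1, n2 = compound
    # Rule (13): N1 is prominent if and only if N2 does not branch.
    strong = n1 if not branches(n2) else n2
    return prominent_word(strong)

examples = {
    "labor union president": (("labor", "union"), "president"),
    "theatre ticket office": ("theatre", ("ticket", "office")),
    "kitchen towel rack (rack for kitchen towels)": (("kitchen", "towel"), "rack"),
    "kitchen towel rack (towel rack in the kitchen)": ("kitchen", ("towel", "rack")),
    # Assumed bracketings for longer compounds:
    "labor union president election": ((("labor", "union"), "president"), "election"),
    "evening computer class teacher": ("evening", (("computer", "class"), "teacher")),
    "community centre building council": (("community", "centre"), ("building", "council")),
}

for label, tree in examples.items():
    print(f"{label}: stress on '{prominent_word(tree)}'")
```

The same word string kitchen towel rack receives two different outcomes here solely because the two tuples nest differently, which is the point of the structure-dependent formulation.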
If prosodic prominence were constrained by conditions on linear structure, we would
expect stress to fall uniformly and rigidly on a fixed linear position in the string. But language
does not work that way. Patterns of prosodic prominence are neither random nor rigid but
determinate, and they universally depend on the more complex hierarchical structure of compounds
such as lábor union president election, evening compúter class teacher, and community centre
búilding council, each of which has a different stress pattern that is sensitive to structure and is
assigned in accordance with (13). Depending on the specific hierarchical structure, stress falls on a
word-stressed vowel of the first, second, or penultimate noun but never on the final noun. These
results would be totally unexpected if we assumed only conditions on linear properties of language.
The Syntax of Phonology
In spoken English certain sequences of words can be contracted, for example, don't vs. do not.
Similarly, want to can be contracted to wanna:
(14) a. I want to persuade the biologist. vs. c. I wanna persuade the biologist.
     b. Who do you want to persuade? vs. d. Who do you wanna persuade?
But this contraction is not always possible. There are some cases where one cannot substitute
wanna for want to, as in (15):
(15) a. I want my colleague to persuade the biologist.
b. *I wanna my colleague persuade the biologist.
Here the constraint seems clear: one can only contract to wanna if no words intervene between
want and to. Apparently, the phonological process of contraction is sensitive to an adjacency condition.
However, some examples such as in (16a) and (17a) below seem to meet this adjacency
constraint, yet the contraction is still blocked, as in (16b) and (17b):
(16) a. Who do you want to persuade the biologist?
     b. *Who do you wanna persuade the biologist?
(17) a. We expect parents who want to long for luxury
        (that is, want meaning 'to be needy')
     b. *We expect parents who wanna long for luxury
Why is this so? (16a) asks 'Who should persuade the biologist?'; in other words, who is the
subject of persuade. In (14b) who is the object of persuade. The hierarchical syntactic structure
for these two sentences is therefore different, and it is this difference that allows contraction in
(14d) while blocking it in (16b). The syntactic structure of the two examples is representable as
(14b′) and (16b′), where we have struck through the original position of who, its place of
interpretation, before the basic operation of generative grammar has applied that put who at the
front of the sentence. The crossed-out who is not pronounced, which is why the externalized
output appears only as who do you want to persuade.
(14b′) [Who [do you want [to persuade who]]]?
(16b′) [Who [do you want [who to persuade the biologist]]]
Note that in (16b′) the crossed-out who (i.e. not pronounced) intervenes between want and to,
just as my colleague does in (15a). But as we have seen, the contraction rule that yields wanna
does not tolerate any elements intervening between want and to. The complex case of (16b) thus
reduces to the simple case of (15b), and contraction is blocked [49,50].
The examples in (17), from [51], show that for contraction c-command between the verb want
and to is also a necessary condition. Contraction is not allowed in (17) because want (in the
meaning 'to be needy') is part of the subject and, therefore, structurally not higher than to (cf.
17b′). Absence of c-command is the relevant factor blocking contraction despite the availability
of linear adjacency.
(17b′) We expect [[parents who want] to long for luxury]
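The two conditions on wanna contraction discussed above can be combined in a small sketch. It assumes simplified nested-list structures in which an unpronounced copy is written as "<who>" (a notational assumption standing in for the struck-through who), and it treats contraction as possible only if to immediately follows want among all leaves, silent ones included, and want c-commands to. The function names and bracketings are illustrative assumptions.

```python
# Leaves are strings; "<who>" marks an unpronounced copy (assumed notation).

def leaves(tree, path=()):
    """Yield (leaf, path) pairs in left-to-right order."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaves(child, path + (i,))

def can_contract(tree):
    """want + to -> wanna needs adjacency (silent copies count) and c-command."""
    seq = list(leaves(tree))
    words = [w for w, _ in seq]
    i = words.index("want")
    # Adjacency is checked over the full structure, including silent copies.
    if i + 1 >= len(words) or words[i + 1] != "to":
        return False
    # c-command: the node immediately above 'want' must dominate 'to'.
    want_parent = seq[i][1][:-1]
    to_path = seq[i + 1][1]
    return to_path[:len(want_parent)] == want_parent

def pronounced(tree):
    """The externalized string: silent copies are simply dropped."""
    return " ".join(w for w, _ in leaves(tree) if not w.startswith("<"))

EX14B = ["who", ["do", "you", "want", ["to", "persuade", "<who>"]]]
EX15A = ["I", ["want", ["my colleague", "to", "persuade", "the", "biologist"]]]
EX16B = ["who", ["do", "you", "want",
                 ["<who>", "to", "persuade", "the", "biologist"]]]
EX17B = ["we", "expect", [["parents", "who", "want"],
                          "to", "long", "for", "luxury"]]

for name, tree in [("(14b')", EX14B), ("(15a)", EX15A),
                   ("(16b')", EX16B), ("(17b')", EX17B)]:
    print(name, repr(pronounced(tree)), "contraction possible:", can_contract(tree))
```

In this encoding (16b′) fails the adjacency check only because the silent copy is counted, and (17b′) passes adjacency but fails the c-command check, mirroring the two explanations in the text.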
Once again then, it is ultimately the structural properties of a sentence that run the show. For
speakers, the 'hidden' properties, non-pronounced words (as in 16b′), are just as substantial as
pronounced words. The linguistic computations of the mind 'hear' what the ear does not. Just as
color and edges do not exist out 'in the world' but rather are internal constructions of the mind,
language is not a property of external sound sequences and does not exist apart from mind-
internal computations (Box 1). In this sense, language behaves just like every other cognitive
ability that scientists have so far uncovered.
Summarizing the discussion above, we have shown that for
(i) the mapping to the conceptual–intentional interface, our discussion on negative polarity
items and parasitic gaps:
- hierarchical structure is necessary and sufficient
- linear structure is irrelevant, that is, order is inaccessible
(ii) the mapping to the sensory–motor interface, our discussion of stress assignment and
Box 3. Constituents: Weak versus Strong Generative Capacity
We experience language, written or spoken, linearly, and therefore it seems straightforward to take order as a central
feature of language. But take the example a blue striped suit. We are instantaneously capable of assessing that this
phrase is ambiguous between a reading in which the suit is both blue and striped (Figure IA) and a reading where the suit is
blue-striped (Figure IB).
In the trees in Figure I this meaning difference is reflected in a different structuring of the same words with the same linear
order. In generative grammar these aspects (structure and order) are distinguished by the notions of weak and strong
generative capacity. In weak generative capacity, what counts is whether a grammar will generate correct strings of
words; strong generative capacity adds the requirement that the right hierarchical structure is accounted for. And this
latter point is of the essence for the study of natural language as we just illustrated.
Let us explain the difference more precisely. For example, the context-free language characterized as a^n b^n can be
correctly generated by the two grammars in (i).
(i) a. G
    b. G
These two grammars are weakly equivalent in that they both generate exactly the same string set, accepting the string
aabb, but not aabbb. However, these two grammars differ in their strong generative capacity. For example, the substring
aab is a constituent in one of the two grammars, but it is not in the other (Figure II).
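The distinction can be illustrated with a sketch. The two grammars below are hypothetical stand-ins chosen for this illustration (the names G_LEFT and G_RIGHT and their rules are assumptions, not necessarily the grammars in (i)): both generate strings of n a's followed by n b's, but they assign different constituent structure to the same string.

```python
# Hypothetical grammars for the language of n a's followed by n b's (n >= 1):
#   G_RIGHT:  S -> a B     B -> S b | b
#   G_LEFT:   S -> A b     A -> a S | a
# A tree is (label, child, ...); terminals are plain strings.

def tree_g_right(n):
    """Derivation tree assigned by G_RIGHT to the string with n a's and n b's."""
    b_node = ("B", "b") if n == 1 else ("B", tree_g_right(n - 1), "b")
    return ("S", "a", b_node)

def tree_g_left(n):
    """Derivation tree assigned by G_LEFT to the string with n a's and n b's."""
    a_node = ("A", "a") if n == 1 else ("A", "a", tree_g_left(n - 1))
    return ("S", a_node, "b")

def yield_of(tree):
    """The terminal string spelled out by a (sub)tree."""
    if isinstance(tree, str):
        return tree
    return "".join(yield_of(child) for child in tree[1:])

def constituents(tree):
    """Set of strings spelled out by the non-terminal nodes of a tree."""
    if isinstance(tree, str):
        return set()
    found = {yield_of(tree)}
    for child in tree[1:]:
        found |= constituents(child)
    return found

# Weak equivalence (checked for small n only): same strings.
strings_right = {yield_of(tree_g_right(n)) for n in range(1, 6)}
strings_left = {yield_of(tree_g_left(n)) for n in range(1, 6)}
print("same strings:", strings_right == strings_left)

# Strong generative capacity: different constituents for the same string.
print("G_RIGHT constituents of aabb:", sorted(constituents(tree_g_right(2))))
print("G_LEFT  constituents of aabb:", sorted(constituents(tree_g_left(2))))
```

With these assumed rules, aab is a constituent of aabb only under G_LEFT, while abb is a constituent only under G_RIGHT, even though the two string sets coincide.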
Weak generative capacity may play a significant role in formal language theory, where it is stipulated, as in formal
arithmetic. But for natural language the concept of weak generative capacity is unnatural, unformulable, and inapplicable.
It is important to realize that many possible phrase structure grammars that weakly generate some set of words or linear
pattern fail as soon as strong generative capacity is taken into account. The main text illustrates serious challenges for any
system based solely on weak generative capacity, as was forcibly argued from the very beginning of the modern
generative enterprise [1,73]. In this respect, natural languages behave very differently from formal languages.
[Figure I, tree diagrams (A) and (B) for a blue striped suit.]
Figure I. Constituency Natural Language. Two structures for the ambiguous a blue striped suit, reflecting its syntax
and semantics: (A) a reading in which the suit is both blue and striped, and (B) a reading where the suit is blue-striped.
[Figure II, tree diagrams for the string aabb.]
Figure II. Constituency Formal Language. The string aabb on the basis of each of the two grammars in (i).
- hierarchical structure is necessary, but not sufficient
- linear structure is relevant, that is, order is needed for externalization.
What reaches the mind is unordered, what reaches the ear is ordered.
Language and Communication
The generative research tradition has never assumed that the communicative function of
language underpins the essential properties of language. Note that generative grammar does
not claim that language cannot be used for communicative purposes, rather that its design
features are not to be understood in communicative terms [52]. For many, both linguists and
non-linguists, it is difficult to imagine that some of the core properties of human language are not
derived from its communicative functions. This seems to follow from the observation that
language is so deeply embedded in human social interaction, facilitating the communicative
and social needs of a community of speakers to share information. Communication provides a
vehicle for sharing information with others. Viewed this way, language is closely intertwined with
non-verbal modes of communication, such as gestures, eye contact, pointing, facial expressions,
music, and the like, any of which may have communicative significance. For this approach
to be well-founded, one must be precise about what 'communication' means. One can, for
instance, somewhat naturally talk about flowers communicating with bees. The (often tacit)
assumption is that one can pursue non-human comparisons by comparing human communi-
cation to animal communication, and more precisely the natural communication systems that
use auditory, visual, or audiovisual signals [53]. And it is this notion of communication that one
has in mind when one defines language as 'The systematic, conventional use of sounds, signs,
or written symbols in a human society for communication and self-expression' [54].
What then makes such verbal behavior, language, different from non-verbal systems of
communication? Communicating how to assemble an Ikea bookcase proceeds without (much)
language, via a manual consisting of just pictures, or by a video manual combining picture and
accompanying speech. But explaining what compositionality or impeachment mean is not done
via music, or facial expressions. So could it be that language as we know it might be particularly
useful in 'hard' communicative situations, and is, therefore, far more complex than any animal
communication system [55]? On such a view, animal communication systems would not be so
far removed from what humans do: less complex, but not qualitatively different. By contrast, we
believe that animal communication systems differ qualitatively from human language [56–58]:
animal communication systems lack the rich expressive and open-ended power of human
language, the creative aspect of normal language use in the Cartesian sense. Moreover, even the
'atoms' of natural language and animal communication systems are crucially different. For animal
systems, 'symbols' (e.g., vervet calls) are linked directly to detectable physical events, associated
with some mind-independent entity. For natural language it is radically different [59]. The
evolutionary puzzle, therefore, lies in working out how this apparent discontinuity arose [60,61],
demonstrating how the basic property fits this discontinuity both to the known evolutionary facts
and evolutionary theory [62].
As illustrated above, structure dependency is a paramount feature of natural language, which
only makes sense if solutions that rely on linear order are not available to the system that
computes the mapping to the conceptual–intentional system. But if this is the case, using
language for communicative purposes can only be a secondary property, making externalization
(e.g., as speech or sign) an ancillary process, a reflection of properties of the sensory–motor
system that might have nothing special to do with language in the restricted sense we take it to
be: uniquely human (species-specific) and uniquely linguistic (domain-specific). The fact that we
share a wide variety of cognitive and perceptual mechanisms with other species, for instance,
vocal learning in songbirds, would then come as no surprise [63]. It would also follow that what is
externally produced might yield difficulties for perception, hence communication. For example,
consider the sentence 'They asked if the mechanics fixed the cars'. In response to this statement,
one can ask 'how many cars?', yielding 'How many cars did they ask if the mechanics fixed?'
However, one cannot ask 'how many mechanics?', yielding 'How many mechanics did they ask if
fixed the cars?', even though it is a perfectly fine thought. To ask about the number of mechanics,
one has to use some circumlocution, one that impedes communication. In this case, communicative
efficiency is sacrificed for the sake of internal computational efficiency, and there are
many instances of this sort. Examples running in the other direction, where communicative
function is favored over internal computational function (Box 1), seem impossible to find. Thus,
the functional relationship between efficient language-as-internal-computation versus language-
as-communication is asymmetric in every case that can be carefully posed. The asymmetry is:
the mapping to meaning is primary and is blind to order (language as a system for thought), the
mapping to sound/sign is secondary and needs order (imposed by externalization of language).
The empirical claim is, therefore, that linear order is available for the mapping to sound/sign, but
not for the mapping to meaning.
Structures, Not Strings
The examples we have just given illustrate what is perhaps the most significant aspect of
language: utterances are not simple linear concatenations of simpler building blocks (words,
morphemes, phonemes). Rather, utterances are hierarchically structured objects built out of
these simpler elements. We have to take this property into account if we want to correctly
describe linguistic phenomena, whether semantic, syntactic, morphological, or phonological in
nature. Structure dependence of rules is a general property of language that has been exten-
sively discussed from the 1950s onwards and is not just restricted to the examples we have
Box 4. String Linguistics
To illustrate the type of problems an approach to human language that adopts a purely sequential structure is confronted
with, we use Google Translate, a powerful string-based machine translation service that supports the non-hierarchical,
linear view on language. Google Translate [used through Firefox on June 8, 2015] maps the French La pomme mange le
garçon, lit. 'the apple eats the boy', into 'the boy eats the apple', precisely because the 'most likely' output sentence is the
product of the probabilities of linear word strings or pairs, and the probability of the latter string vastly dominates the
probability of the former. This problem pervades the entire approach. For example, observe Dutch (i) and its Google
translation (ii):
(i) De man van mijn tante kust de vrouw.
(ii) The husband of my aunt kissing the woman.
While not perfect (it should be 'The husband of my aunt is kissing the woman'), this certainly approximates what one
would like. But the system fails dismally when translating the question equivalent: Dutch (iii) becomes (iv), rather than (v).
(iii) Kust de man van mijn tante de vrouw?
(iv) Shore man of my aunt's wife?
(v) Is the husband of my aunt kissing the woman?
Here, kust ('kisses'), derived from kussen ('to kiss'), is translated as shore, having been misinterpreted as the Dutch noun
kust for 'shore/coast'. Moreover, the subject de man van mijn tante is analyzed as the possessive of the object de vrouw.
What has gone wrong? Omitting much detail along with trade secrets, what such systems do is roughly this: given a
particular Dutch sentence, notated (Diii), iterate over all English strings of words to find the 'best' English string, E′, which
maximizes the probability of E′ × the probability of (Diii | E′), that is, the probability of the Dutch (iii) given E′. Note that this
statistical decomposition is linear. It will tend to select commonly occurring word pairs, for instance, kust/coast, if no
longer pairing is readily available or inferred. For example, there is no English pairing for the Dutch kust de man because the
'phrase book' is still not dense enough in the space of pairings.
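The decomposition just described, choosing the English string E′ that maximizes P(E′) × P(D | E′), can be sketched with a toy example. Everything in it is an assumption made for illustration: the bigram probabilities, the word-for-word translation table, and the two-candidate search are invented and say nothing about how any real translation service is implemented.

```python
import math

# Hypothetical bigram probabilities for a toy English language model P(E).
BIGRAMS = {
    ("<s>", "the"): 0.5,
    ("the", "boy"): 0.1,
    ("the", "apple"): 0.1,
    ("boy", "eats"): 0.2,
    ("apple", "eats"): 0.001,   # "apple eats" is assumed to be rare
    ("eats", "the"): 0.5,
    ("boy", "</s>"): 0.1,
    ("apple", "</s>"): 0.1,
}

# Hypothetical word-for-word translation probabilities P(f | e).
LEXICON = {
    ("la", "the"): 0.5, ("le", "the"): 0.5,
    ("pomme", "apple"): 0.9, ("garçon", "boy"): 0.9, ("mange", "eats"): 0.9,
}
UNSEEN = 1e-6  # floor for pairs missing from the toy tables

def lm_logprob(english):
    """log P(E): product of bigram probabilities over the linear string."""
    tokens = ["<s>"] + english.split() + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, UNSEEN))
               for pair in zip(tokens, tokens[1:]))

def tm_logprob(source, english):
    """log P(D | E): each source word is explained by its best-matching
    English word, with no reference to position or structure."""
    e_words = english.split()
    return sum(math.log(max(LEXICON.get((f, e), UNSEEN) for e in e_words))
               for f in source.split())

def decode(source, candidates):
    """Return the candidate E maximizing log P(E) + log P(D | E)."""
    return max(candidates,
               key=lambda e: lm_logprob(e) + tm_logprob(source, e))

source = "la pomme mange le garçon"
candidates = ["the apple eats the boy", "the boy eats the apple"]
print(decode(source, candidates))
```

Because the toy translation table is blind to who does what to whom, both candidates explain the French words equally well, and the string statistics of English alone decide the outcome in favor of the more frequent word sequence.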
Adopting the view that hierarchy is only relevant when the language user is particularly attentive, when it is important for
the task at hand [71], comes at a price. For a practical business solution the price is right; for a scientific approach to the
study of language the price is wrong.
presented so far. These are phenomena that, in our view, must be explained in terms of intrinsic
and domain-specific properties of a biolinguistic system.
Native speakers have robust knowledge of the constraints that we discussed above, and often
that knowledge is tacit, again analogous to the reconstruction of 'color' and 'edges'. Sometimes
relevant examples are rarely attested in adult language, but children acquire them
nonetheless. Furthermore, it has been shown repeatedly that infants acquiring language do
not solely engage in statistical learning by approximating the target language [64–70]. For these
and other reasons, usage-based approaches that reject generative procedures, and apply
statistical methods of analysis to unanalyzed data (Box 4), probing into huge but finite lists of data
that are not extendable, fail to distinguish these cases properly. By contrast, generative
procedures succeed in amalgamating a large, diverse set of individual examples into just a
few constraints such as the hierarchical dominance example.
Linear statistical analysis fails to account for how semantic readings are specifically linked to
syntactic structures or to explain why ambiguity is constrained in some cases but not in others. A
major problem is not just the failure to succeed, but more importantly the apparent unwillingness
to come to terms with simple core puzzles of language structure such as those we have noted
[71]. There have been a handful of other efforts to provide alternative accounts for structure
dependence [72,74], but these have been shown to fail [69]. However, if we are really interested
in the actual mechanisms of the internal system we should ask about the properties that
determine how and why the syntax–semantics mappings are established in the way they
are and not otherwise (see Outstanding Questions).
Concluding Remarks
Approximating observational phenomena is very different from formulating an explanatory
account of a significant body of empirical data. Equating likelihood probabilities of language
use with grammaticality properties of internal systems does not succeed because structural
properties of phrases and the generative capacity of internal systems to build structure cannot
be reduced to linear properties of strings. These somewhat elementary but important insights
have been recognized since the very origins of generative grammar [1,18], but seem to have
been forgotten, ignored, or even denied without serious argument in recent times.
J.J.B. is part of the Consortium on Individual Development (CID), which is funded through the Gravitation program of the
Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO; grant
number 024.001.003).
1. Chomsky, N. (1956) Three models for the description of language. IRE Trans. Inform. Theor. IT-2, 113–124
2. Miller, G.A. (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81–97
3. Newell, A. and Simon, H.A. (1956) Logic Theory Machine: a complex information processing system. IRE Trans. Inform. Theor. IT-2, 61–79
4. Shannon, C.E. (1956) The zero error capacity of a noisy channel. IRE Trans. Inform. Theor. IT-2, 8–19
5. Chomsky, N. (1995) The Minimalist Program, MIT Press
6. Reinhart, T. (2006) Interface Strategies: Optimal and Costly Computations, MIT Press
7. Rizzi, L. (2012) Core linguistic computations: how are they expressed in the mind/brain? J. Neuroling. 25, 489–499
8. Selkirk, E. (2011) The syntax–phonology interface. In The Handbook of Phonological Theory (2nd edn) (Goldsmith, J. et al., eds), pp. 435–484, Blackwell
9. Weaver, W. (1947) Translation. In Machine Translation of Languages (Locke, W.N. and Booth, D.A., eds), pp. 15–23, MIT Press
10. Brown, P. et al. (1988) A statistical approach to language translation. In COLING 88 Proceedings of the 12th Conference on Computational Linguistics (Vol. 1), pp. 71–76, Association for Computational Linguistics
11. Evans, N. and Levinson, S. (2009) The myth of language universals. Behav. Brain Sci. 32, 429–492
12. Tomasello, M. (2003) Constructing A language: A Usage-Based Theory of Language Acquisition, Harvard University Press
13. Langacker, W. (2008) Cognitive Grammar: A Basic Introduction, Oxford University Press
14. Dąbrowska, E. (2015) What exactly is Universal Grammar, and has anyone seen it? Front. Psychol. 6, 852
15. Elman, J.L. et al. (1996) Rethinking Innateness: A Connectionist Perspective on Development, MIT Press
16. Meisel, J. (2011) First and Second Language Acquisition, Cambridge University Press
Outstanding Questions
What operating principles are there besides SIMPLEST MERGE (yielding hierarchical, structure-preserving structure without linear order) and MINIMAL SEARCH (a domain-general condition of minimal computation that restricts application of rules of agreement and displacement to strictly local domains and minimal structural distance)?
What can we find out about the neural organization underlying higher-order computation of merge-based hierarchical structure of language and what are its evolutionary roots? Concentrating on the basic property, how does the discontinuity fit the known evolutionary facts and evolutionary theory?
What is the precise division of labor between domain-general and domain-specific learning systems that enter into the explanation of learnability and evolvability of natural language? How does the Strong Minimalist Thesis (the conjecture that, optimally, UG reduces to the simplest computational principles that operate in accordance with conditions of computational efficiency) enhance the prospects of explaining the emergence and learning of human language, permitting acquisition of rich languages from poor inputs (poverty of stimulus)?
How can we attain a better understanding of the mind-dependent nature, development, and evolutionary origins of the word-like elements ('atoms') of human language that enter into core computational operations of language, yielding its basic property?
What is the role of morphosyntactic features in identifying phrases of exocentric constructions, that is, phrases not containing a head capable of uniquely identifying them, and demarcating minimal domains of computation? How do these features function in the language architecture?
If an improved understanding of the sources of complexity, diversity, and malleability of languages helps us explain their significance for the externalization process, which linearization principles and strategies govern the externalization of the syntactic products generated by the basic combinatorial operation of language?
17. Moro, A. (2014) On the similarity between syntax and actions.
Trends Cogn. Sci. 18, 109110
18. Chomsky, N. (1959) On certain formal properties of grammars.
Inform. Control 2, 137167
19. Watumull, J. et al. (2014) On recursion. Front. Psychol. 4, 17
20. Lobina, D.J. (2011) A running back; and forth: a review of
Recursion and Human Language.Biolinguistics 5, 151169
21. Church, A. (1936) An unsolvable problem of elementary number
theory. Am. J. Math. 58, 345363
22. Gödel, K. (1986) On undecidable propositions of formal mathe-
matical systems. In Kurt Gödel: Collected Works Vol. I: Publica-
tions 19291936 (Feferman, S. et al., eds), pp. 346371, Oxford
University Press
23. Turing, A.M. (1936) On computable numbers, with an application
to the Entscheidungsproblem. Proc. Lond. Math. Soc. 42,
24. Kleene, S.C. (1936) General recursive functions of natural num-
bers. Math. Ann. 112, 727742
25. Chomsky, N. (1966) Cartesian Linguistics, Harper & Row
26. Bloomeld, L. (1933) Language, Holt
27. Hauser, M.D. et al. (2014) The mystery of language evolution.
Front. Psychol. 5, 401
28. Arregui, K. and Nevins, A. (2012) Morphotactics: Basque Auxil-
iaries and the Structure of Spellout, Springer
29. Giannakidou, A. (2011) Negative polarity and positive polarity:
licensing, variation, and compositionality. In The Handbook of
Natural Language Meaning (2nd edn) (von Heisinger, K. et al.,
eds), pp. 16601712, Mouton de Gruyter
30. Kuno, M. (2008) Negation, focus, and negative concord in Japa-
nese. Toronto Work. Pap. Ling. 28, 195211
31. Reinhart, T. (1981) Denite NP-anaphora and c-command
domains. Ling. Inq. 12, 605635
32. Baker, M. (2003) Language differences and language design.
Trends Cogn. Sci. 7, 349353
33. Musso, M. et al. (2003) Broca's area and the language instinct.
Nat. Neurosci. 6, 774781
34. Smith, N. and Tsimpli, I. (1995) The Mind of a Savant: Language
Learning and Modularity, Oxford University Press
35. Vasishth, S. et al. (2008) Processing polarity: how the ungram-
matical intrudes on the grammatical. Cogn. Sci. 32, 685712
36. Ross, J.R. (1986) Innite Syntax! Ablex
37. Chomsky, N. (1981) Lectures on Government and Binding, Foris
38. Taraldsen, K.T. (1980) The theoretical interpretation of a class of
marked extractions. In The Theory of Markedness in Generative
Grammar (Belletti, A. et al., eds), pp. 475516, Scuola Normale
Superiore di Pisa
39. Engdahl, E. (1983) Parasitic gaps. Ling. Philos. 6, 534
40. Chomsky, N. (1982) Some Concepts and Consequences of the
Theory of Government and Binding (LI Monograph 6), MIT Press
41. Huybregts, M.A.C. and van Riemsdijk, H.C. (1985) Parasitic gaps
and ATB. In Proceedings of the NELS XV Conference, pp. 168–187,
GLSA, University of Massachusetts
42. Hoekstra, T. and Bennis, H. (1984) Gaps and parasitic gaps. Ling.
Rev. 4, 29–87
43. Bech, G. (1952) Über das Niederländische Adverbialpronomen er.
Travaux du Cercle Linguistique de Copenhague 8, 5–32
44. Bennis, H. (1986) Gaps and Dummies, Foris Publications
45. Huybregts, M.A.C. (1991) Clitics. In Grammatische Analyse
(Model, J., ed.), pp. 279–330, Foris Publications
46. Chomsky, N. et al. (1956) On accent and juncture in English. In For
Roman Jakobson: Essays on the Occasion of his Sixtieth Birthday
(Halle, M. et al., eds), pp. 65–80, Mouton
47. Chomsky, N. and Halle, M. (1968) The Sound Pattern of English,
Harper and Row
48. Liberman, M. and Prince, A. (1977) On stress and linguistic
rhythm. Ling. Inq. 8, 249–336
49. Lakoff, G. (1970) Global rules. Language 46, 627–639
50. Chomsky, N. and Lasnik, H. (1978) A remark on contraction. Ling.
Inq. 9, 268–274
51. Aoun, J. and Lightfoot, D. (1984) Government and contraction.
Ling. Inq. 15, 465–473
52. Chomsky, N. (2013) What kind of creatures are we? The Dewey
Lectures. Lecture I: What is language? Lecture II: What can we
understand? J. Philos. 12, 645–700
53. Hauser, M.D. (1997) The Evolution of Communication, MIT Press
54. Crystal, D. (1992) An Encyclopedic Dictionary of Language and
Languages, Blackwell
55. Hurford, J. (2008) The evolution of human communication and
language. In Sociobiology of Communication: An Interdisciplinary
Perspective (D'Ettorre, P. and Hughes, D., eds), pp. 249–264,
Oxford University Press
56. Hauser, M. et al. (2002) The faculty of language: What is it, who has
it, and how did it evolve? Science 298, 1569–1579
57. Berwick, R.C. et al. (2013) Evolution, brain, and the nature of
language. Trends Cogn. Sci. 17, 89–98
58. Bolhuis, J.J. and Everaert, M.B.H. (2013) Birdsong, Speech and
Language. Exploring the Evolution of Mind and Brain, MIT Press
59. Chomsky, N. (2013) Notes on denotation and denoting. In From
Grammar to Meaning: The Spontaneous Logicality of Language
(Caponigro, I. and Cecchetto, C., eds), pp. 38–46, Cambridge
University Press
60. Berwick, R.C. (2010) All you need is merge: a biolinguistic opera in
two acts. In Biolinguistic Approaches to Language Evolution (Di
Sciullo, A.M. and Boeckx, C., eds), pp. 461–491, Oxford University
Press
61. Bolhuis, J.J. et al. (2014) How could language have evolved? PLoS
Biol. 12, e1001934
62. Berwick, R.C. and Chomsky, N. (2016) Why Only Us: Language
and Evolution, MIT Press
63. Chomsky, N. (2005) Three factors in language design. Ling. Inq.
36, 1–22
64. Crain, S. (2012) The Emergence of Meaning, Cambridge University Press
65. Lidz, J. and Gagliardi, A. (2015) How nature meets nurture: Universal
Grammar and statistical learning. Annu. Rev. Ling. 1, 333–353
66. Medina, T.N. et al. (2011) How words can and cannot be learned
by observation. Proc. Natl. Acad. Sci. U.S.A. 108, 9014–9019
67. Gleitman, L. and Landau, B. (2012) Every child an isolate: Nature's
experiments in language learning. In Rich Languages from Poor
Inputs (Piattelli-Palmarini, M. and Berwick, R.C., eds), pp. 91–104,
Oxford University Press
68. Yang, C. (2016) Negative knowledge from positive evidence.
Language 92, in press
69. Berwick, R.C. et al. (2011) Poverty of the stimulus revisited. Cogn.
Sci. 35, 1207–1242
70. Chomsky, N. (2011) Language and other cognitive systems. What
is special about language? Lang. Learn. Dev. 7, 263–278
71. Frank, S. et al. (2012) How hierarchical is language use? Proc. R.
Soc. B 279, 4522–4531
72. Reali, F. and Christiansen, M.H. (2005) Uncovering the richness of
the stimulus: structure dependence and indirect statistical evidence.
Cogn. Sci. 29, 1007–1028
73. Chomsky, N. (1965) Aspects of the Theory of Syntax, MIT Press
74. Perfors, A. et al. (2011) Poverty of the stimulus: a rational
approach. Cognition 118, 306–338