[IN PRESS in Lingue e Linguaggio, Special Issue on Language Evolution, Delfitto, D., S. Scalise and G. Graffi (Eds.), to appear in 2005]
The Evolution of the Narrow Faculty of Language:
The skeptical view and a Reasonable Conjecture
Massimo Piattelli-Palmarini (University of Arizona)
Juan Uriagereka (Universities of Maryland and The Basque Country)
ABSTRACT
Enard et al. 2002 date the last mutation in FOXP2, the gene implicated in some central
aspects of human language –crucially including context-sensitive syntax– to ca. 120,000
before the present (B.P.). Traits in modern humans that arguably presuppose the narrow
faculty of language can be independently dated to ca. 95,000 to 60,000 B.P. This leaves
few generations in between to evolve full-blown syntax. We argue that such a rapid
transition cannot be easily accounted for in customary ‘adaptive’ evolutionary terms. We
propose that only a brain reorganization, of a drastic and sudden sort, could have given
rise to such a state of affairs. Two facts suggest that this reorganization may have been
epidemic in origin. First, recent evolutionary accounts for the emergence of broad systems
in organisms point in the direction of ‘horizontal’ transmission of nucleic material, often of
viral origin (e.g. the origin of the Adaptive Immune System); in situations of this sort,
evolutionary rapidity is a consequence of boosting relevant numbers by having not
individuals, but entire populations, carry relevantly mutated genetic material. Second,
when the formal properties behind context-sensitivity in grammar (e.g. as displayed in
syntactic displacement) are studied in Minimalist terms, a surprising parallelism surfaces
with the workings of the Adaptive Immune System. Thus we discuss an ‘immune syntax’
scenario that relates these two facts and is also informative both about the putative role of
FOXP2 in grammar and the workings of syntax as studied in the Minimalist system.
1. Perspective
We have explained elsewhere our reasons to be skeptical about merely adaptationist
approaches to the evolution of language (Piattelli-Palmarini 1989, 1998; Uriagereka 1998).
In essence, this is because no such proposal has said much about the baseline syntactic
conditions we outline in (1) below. This paper takes such conditions as given, in some
appropriate form whose ultimate detail is immaterial here; furthermore, we also take it as
given that evolutionary accounts that provide no account for these central findings are at
best only marginally relevant. We remain largely skeptical about the possibility of
addressing that concern, but we would like to sketch at least the sort of hypothesis that we
would consider plausible in this regard. Although they will be further discussed in more
detail in the body of the paper, two of its evolutionary assumptions need a brief focus: the
much discussed Science piece by Hauser, Chomsky & Fitch, 2002 (hereinafter HCF) and
the unfolding of the FOXP2 gene saga (Lai et al. 2001; Enard et al. 2002).
1. 1. The Baseline
We take the following to be the main results of the linguistic theory to which we adhere,
namely generative grammar:
(1) a. Syntactic dependencies arrange themselves in terms of formal objects that can
be quite high within the universal formal characterization of the power of
grammars, the Chomsky Hierarchy (specifically, the grammars of natural
languages have to be context-sensitive, see infra).
b. Context-sensitive dependencies are severely limited by locality considerations
(i.e. they hold between close neighbors in a sentence, the precise technical
characterization of such closeness being a central component of the theory), and
possibly also uniformity and last resort considerations (see infra).
c. Semantic dependencies are strictly determined by syntactic dependencies and
obey the mapping principles of Full Interpretation, Strict Compositionality and
Conservativity (to which we return in what follows).
d. Language acquisition involves the fixation of a small number of open
syntactic options (mostly binary in kind), normally of a morphological sort;
morphological variation, of the sort patent across languages, in many instances
involves uninterpretable formatives.
We don’t intend this list as exhaustive (we are not going, for instance, into phonological
findings); we simply present it as both relevant to our concerns and close to definitive1.
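To make the point in (1a) slightly more concrete, here is a deliberately toy contrast (a Python sketch of our own, not a linguistic analysis): nested, mirror-image dependencies can be generated by a context-free grammar, whereas crossed, copy-style dependencies of the kind natural languages exhibit (e.g. Swiss German cross-serial constructions, or the displacement facts discussed in section 4) require more than context-free power.

    def is_mirror(s):
        # Nested (mirror-image) dependencies: within context-free reach.
        return s == s[::-1]

    def is_copy(s):
        # Crossed (copy-style) dependencies, as in the language {ww}:
        # a classic pattern beyond context-free generative power.
        n = len(s)
        return n % 2 == 0 and s[:n // 2] == s[n // 2:]

    print(is_mirror("abccba"))   # True -- a..a, b..b, c..c nested from outside in
    print(is_copy("abcabc"))     # True -- a..a, b..b, c..c crossed in parallel

Recognizing the copy pattern requires holding the first half in memory while scanning the second, a demand on operational memory that becomes central in section 3.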
The ideas in (1) are interrelated; a specific proposal that attempts to unify these
sorts of ideas into a coherent whole is referred to as the Minimalist Program (MP). One
strict version of this program, the Strong Minimalist Thesis (SMT), goes as far as
proposing that it is virtually essential (conceptually necessary) for the language faculty that
conditions of this sort should exist, so much so that the faculty would not have emerged in
significantly different conditions. SMT is a very sensible way of conceiving of the results
in (1) as a unified whole, especially inasmuch as it bears on the possible evolution of this
faculty within the human species. With regard to this matter, SMT holds that the rationale
behind the unification of the facts in (1) into a theory responds mainly (though not
exclusively) to one of the three ‘forces’ gearing evolution: the structural elegance that the
physico-chemical channel imposes on evolving life (the other two ‘forces’ being historical
accident and natural selection). Our paper is an attempt to provide an account of the
possible evolutionary path of the language faculty within the SMT perspective. At the same
time, we insist on regarding (1) as factual, independently of the SMT.
1. 2. Two Dimensions within the Faculty of Language
The methodological centerpiece in HCF is the separation of two dimensions within the
faculty of language: the broad language faculty FLB and the narrow language faculty
FLN, whose characteristic property is the use of recursion (not presupposed for FLB). The
authors rightly emphasize that every plausible reconstruction of the evolution of language
(in the traditional sense) must maintain these components distinct and work out separate,
albeit related, stories for each. The conclusion of that paper is, in a nutshell, that there are
components of FLB that are not unique to our species, and/or not unique to language.
Some plausible candidates for an adaptive functional reconstruction of some of the traits of
FLB are not just conceded, but explicitly suggested. As for FLN, however, HCF insists that
the most plausible hypothesis still is that it is unique to humans and not the product of any
adaptation2. In light of this useful demarcation between FLN and FLB, our skepticism
bears mainly upon the adaptive reconstructions that (often implicitly) involve FLN.
1. 3. FOXP2 and an Outline of our Proposal
In the course of studying the now famous KE family, researchers found a so-called language gene.
Corresponding headlines triggered all sorts of reactions, and much unfortunate hype. Still,
when things settled down, several remarkable claims were made, not just in terms of our
understanding of the genetic bases of human language, but of its evolution as well.
We will give our own evaluation of this event in section 2. Suffice it to say that the human
version of the gene appears to have mutated on the verge of the last human diaspora, less
than 200,000 years ago. That is a fascinating period, for as both the archeological record
and recent genetic studies on mitochondrial DNA and the Y chromosome converge to
show, our species colonized virtually the entire planet within a few millennia, plausibly
presupposing modern language for the task. What the last mutation in FOXP2 has arguably
given us is a boundary condition to the onset of that period, if it predates FLN and it is
somehow involved in its evolution. Prior to this mutation we had individuals who must
have been not too dissimilar in their verbal behavior from present-day individuals with
deficiencies in FOXP2 (except of course in having a community around with an evolved
FLN). Surely that is more than chimps can do, suggesting that, arguably unlike other apes,
even our Neanderthal cousins and other sapiens already had FLB. That said, though, there
obviously is a later change to account for, and we take that to be FLN.
In this paper we have something to contribute regarding the emergence that FOXP2
seems to have precipitated: both a condition and a conjecture. In effect, the condition is
that the numbers don’t really add up for a gradual change from the estimated time of
FOXP2’s mutation to the complete diffusion of the present allele. At best, we are talking
about a span of a (few) thousand human generations, too fast for a standard adaptive
evolutionary story to hold, under most imaginable scenarios. We develop this point in
section 3. We then provide a line of reasoning as to how a drastic evolutionary event may
have taken place, in particular co-opting mechanisms from a well attested evolutionary
saga, that of the Immune System which, in our view, shows remarkable similarities with
FLN. This tentative idea is offered as a positive suggestion in section 5, after having
presented some detailed syntactic presuppositions in section 4. Section 2 is a critical review
of relevant FOXP2 facts; readers familiar with them can go directly to section 3, which
zooms in on an interesting proposal regarding the plausible role of FOXP2 in grammar.
2. FOXP2: lights and shadows
Although a genetic foundation to the heterogeneous syndrome labeled Specific Language
Impairment (SLI) is generally presumed, it was until recently impossible to ascertain. This
trend started to change when one particular family was encountered in which, unlike
others where SLI is inherited in ways too complicated to disentangle, the disorder was
found to be passed on with the textbook signature of a single-gene defect. Half of the
members in three generations of the KE family have serious difficulties
articulating speech, but are otherwise mostly normal. By studying its inheritance pattern,
the speech problem was discovered to be transmitted as an autosomal dominant monogenic
trait. The affected individuals present characteristic SLI symptoms, plus obvious motor
impairments. It was Simon Fisher at Anthony Monaco’s laboratory who set out to hunt for
some of the genetic bases behind this apparent manifestation of SLI. Lai et al. 2001 had
already tracked the gene down to a small interval of a region in chromosome 7 (7q31)
containing a few dozen genes, when an unrelated individual, CS, was brought to Monaco’s
lab with a language impairment of the SLI sort. CS’s problems were associated with a
chromosomal translocation3. One of the elements involved was none other than
chromosome 7, and the affected region was close to the interval previously identified for
the KE family. Then the gene Lai et al. 2001 dubbed FOXP2 was shown to be disrupted by
the translocation breakpoint in CS (Fisher, Lai & Monaco, 2003).
2. 1. The facts
Let’s start with what can be regarded as uncontroversial about FOXP2:
(2) a. Never had linguistics and genetics been brought into such intimate and detailed
contact; the discovery that a well attested linguistic deficit may be caused by a
single point mutation in a gene connects linguistics with the ‘atom’ of genetics.
b. The gene in question belongs to a sub-family of a large family of genes,
previously identified as controlling DNA replication; it is present throughout
mammals and well beyond, and its sequence is highly conserved (only three point
mutations between humans and mice, of which two are between the chimps and us); it
controls the development of several distinct organs and functions.4
c. The point mutation present in affected KE family members is necessary and
sufficient to cause their particular language deficit (a rare state of affairs in the
field of genetics); this mutation affects only one of the two alleles of this gene
(maternal copy); the remaining, intact, allele suffices to compensate for the other
functions of FOXP2, not for the brain anomaly that causes the language deficit.
d. The brain regions in which this gene is expressed in the fetus and in the adult
are: the caudate nucleus, the cerebellum, the thalamus, and the medulla (inferior
olivary nuclei). A detailed tracking of the expression of the human gene and of
the mouse version during development shows remarkable parallelism between
these species, substantiating the high conservation of the gene sequence over
120 million-plus years of evolution.
e. The attested consequences of the mutation found in affected KE family
members include: oro-facial dyspraxia, impaired receptive grammar (comprehension
of relative clauses), impaired non-word repetition (especially of complex non-words),
and impairments in inflectional and derivational morphology (production), regular and
irregular past tense production, lexical decision (identification of words versus
non-words), non-word reading, object naming from drawings, and memorizing the
pairing of symbols with arbitrarily assigned numbers.
As of today, the summary list of uncontroversial findings about FOXP2 stops here. As
soon as we begin to derive lessons from this set of facts, controversies flare up.
2. 2. Controversies
The first controversy, chronologically, was about the specificity of the linguistic deficit.
Myrna Gopnik was the first psycholinguist to study the affected members of the KE
family; she thought she was witnessing a perfect case of language specificity, as in her
view normal or better than normal general intelligence was accompanied by a specific
morpho-syntactic deficit (Gopnik 1990, 1994). Neurologist Faraneh Vargha-Khadem and
her collaborators rejected this claim; according to them the main deficit was a mere
impediment of fine oral movements. Moreover they argued that affected members of the
KE family do not, in the end, show normal general intelligence, and other non-linguistic
cognitive tasks are also impaired in them (for recent reviews see Watkins, Dronkers &
Vargha-Khadem, 2002; Watkins et al. 2002; Lai et al. 2003)5.
The next controversy centers around what FOXP2 can tell us about the evolution
of language. After collecting several DNA sequences of the homologs of the FOXP2 gene
in various mammals (called FoxP2 –lowercase), and having then focused on the two point
mutations separating us from chimps, statistical methods have been applied to address two
questions: (i) what is the likely date of the last mutation in FOXP2, and (ii) whether it is
likely to have occurred randomly and spontaneously, or rather as the result of selective
pressures for language (Enard et al. 2002). Whereas the conclusion that the mutation in
point took place within the last 200,000 years seems solid, the claim that this resulted in a
selective sweep is less so. Two risky idealizations have been applied to the statistical
model: a constant size of the reproductive population and non-selective mating. If these
idealizations are relaxed, the statistical methods these authors have used to conclude that
there has been, indeed, intense selective pressure cease to support this conclusion (Robert
Berwick, p.c. and MIT Class Lecture 2003)6. To complicate things, as we will see in sections
3.1 and 3.2, it is one thing to show a rapid genetic change (a ‘sweep’) within a population,
and quite another to demonstrate that its basis is a standard adaptation.
A final difficulty that is relevant in the present context pertains to the striking
asymmetry between, on one side, the refined molecular-genetics techniques used in
identifying the FOXP2 gene and its products, and on the other side the imprecision of the
functional characterization, in humans, of the brain regions affected by the gene. Surely the
buck has to stop somewhere, and the localization of brain functions is a plausible terminus;
however, imprecision at this end (which we discuss in more detail immediately below)
unfortunately lends itself to wildly different interpretations of the data, depending on one’s
favorite conception of the relations between language and other functions.
2. 3. Which conception of language?
Assuming, surprisingly, that language production is rooted in –possibly consists of–
‘subvocal rehearsal’ of speech, and that the rules of syntax are learned implicitly via
the construal of analogies between the articulation patterns of words7, researchers have
speculated that ‘the accompanying linguistic and grammatical impairments observed in the
KE family are secondary consequences of basic deficits in motor planning and sequencing’
(Lai et al. 2003, p. 2458). The point is far from anecdotal; after quoting other sources
discussing the role of motor circuits in learning and cognition, these same authors
conclude: ‘Our data are consistent with the emerging view that subcortical structures play a
significant role in linguistic functioning’ (ibid)8. From the perspective of generative
grammar, these conclusions are very troublesome indeed –and we think also wrong.
PET and fMRI of the affected KE family members revealed abnormal bilateral
development of the caudate nucleus, part of the basal ganglia, and abnormalities in the
cerebellum –the arch-locus of motor control (Liegeois et al. 2003). As studies of unrelated
patients with acquired lesions (not congenital abnormalities) had revealed a role of the
cerebellum in procedural learning (detection/generation of event sequences) and linguistic
functions, the core of the KE family deficit has been identified as an impairment in oro-
facial motor control, tracking and producing sequential movements, detecting rhythm, and
related notions. If, against all the findings of generative grammar, one is persuaded that
linguistic and grammatical ‘skills’, notably including the learning of syntactic ‘rules’, are
just an application of those cognitive capacities to the organization of the domain of speech
sounds, then there is nothing else to explain. The affected members of the KE family, in
that view of things, constitute a perfect experiment in nature, allowing us to see through the
basic assembly of normal language learning, by selective deprivation of those very basics.
But in fact, even within this narrow perspective, the alleged intimate relation between
motor control and language via FOXP2 is, at best, uncertain.
To begin with, the association between language and motor control is radically
down-sized by other neurological findings: ‘Although the mammalian basal ganglia have
long been implicated in motor behavior, it is generally recognized that the behavioral
functions of this subcortical group of structures are not exclusively motoric in nature.
Extensive evidence now indicates a role for the basal ganglia, in particular the dorsal
striatum, in learning and memory. One prominent hypothesis is that this brain region
mediates a form of learning in which stimulus-response (S-R) associations or habits are
incrementally acquired. Support for this hypothesis is provided by numerous
neurobehavioral studies in different mammalian species, including . . . humans’ (Packard
and Knowlton 2002). Since the most evident target of the FOXP2 mutation is the caudate
nucleus, part of the basal ganglia, the possibility just mentioned makes it an open question
whether the ensuing linguistic deficit represents evidence for a causal link between motor
control and language. Significantly, a reduced size of the caudate nucleus, and its intrinsic
pathology, have been linked also with abnormalities in working memory, a point that we
return to in section 3.5 (one compatible with the generative perspective).
An interesting revision of the ‘meaning’ of a hyper-classic association between
brain localization and function concerns the cerebellum, one of the other brain areas that
are a target of the FOXP2 mutation. Lesions to this domain cause a variety of problems
(gait ataxia, dysmetria, dysarthria, abnormal eye movements, vertigo; for non-motor
impairments caused by cerebellar damage, see Fiez et al. 1992). The cerebellum is also
involved in timing, rhythm and the sequencing of events. Significantly for our purposes
here, Gebhart, Petersen and Thach 2002 report that the cerebellum plays a role in language
independently of movement, even independently of ‘mental movement’ (sic)9. To cut a long
story short, previous PET evidence had shown that in normal adults, appropriate noun-verb
pairings (generated aloud) strongly excite the left prefrontal regions (the canonical
language and ‘decision’ ones). This is not surprising; totally unexpected, however, was that
the task also strongly excited the right lateral regions of the cerebellum.
Conclusion: damage to the right posterior-lateral region of the cerebellum
impairs verb generation and blocks both insight into one's errors and any tendency to
self-correct, even when presented with already seen nouns from a list10. These authors insist that motor
control has nothing to do with this task, nor does ‘mental movement’ in the sense just
discussed. So when tested selectively and specifically, even sub-loci of the canonical focal
seat of motor control in the brain reveal functional specializations that seem totally
different from motor control. This is a model story, we think. When the analyses of brain
correlates of cognitive functions end up detecting a brain locus, we cannot be sure, as of
today, of what direct implications that finding has. This is why, in spite of the (implicitly
and indirectly) anti-generativist conclusions offered by some of the protagonists of the
FOXP2 saga, we stick to our guns. In fact we think that in doing so we can contribute to a
better interpretation of both the genetic and the neurological findings.
3. SLI, FOXP2, and the Grammar
Tentatively, though more positively, let’s now turn to the specific role that FOXP2 may be
playing in grammar and its evolution. Although the claim that FOXP2 is causally related to
SLI is a popular one, and it has historical truth to it (the KE family was initially diagnosed
with SLI), it is actually not obvious. SLI refers to a relatively broad class of language
deficits covering disorders that cannot be attributed to retardation, deafness, autism or
general cognitive impairments. Cases of SLI have been studied in a considerable variety of
languages and places. The identification of the most characteristic symptoms of SLI is not
unanimously agreed upon, and several specific impairments have been reported11. For our
logic in this paper, however, the diagnosis of SLI is not as relevant as whatever the affected
members of the KE family have, for there we know virtually for a fact that FOXP2 is
implicated. Moreover, it doesn’t seem unreasonable to infer that hominids in possession of
a version of FOXP2 prior to the last mutation may have had a linguistic performance not
unlike that of affected KE family members (again, without a relevantly evolved linguistic
community around). This seems more central to our concerns on the evolution of FLN than
whether SLI or something else (and more specific) is what linguists should be focusing on.
3. 1. FOXP2 and language evolution
Consider the dating of FOXP2’s last mutation. By analyzing changes in amino acid coding
and patterns of nucleotide polymorphism, and given a computational model of a randomly-
mating population, Enard et al. (2002) can determine that fixation of the beneficial allele
was 95% likely to have occurred no more than 120,000 years B.P., and virtually certain to
have occurred no earlier than 200,000 B.P. Those numbers are interesting with respect to
the issue of modern human expansion and how much it may have been affected by
language, as per familiar (controversial) arguments by Richard Klein (1999, 2003).
Enard et al. rule out an expansion of a population which already carried the allele as
being responsible for its proliferation; they rather take the gene to be selected for
immediately when it arose, spreading rapidly. This is based on a detailed genetic
comparison of other genes with the human version of FOXP2, which Enard et al. found to
present no human polymorphisms (individuals from diverse ethnic groups were analyzed).
Indeed, most other genes in the human nuclear genome (as FOXP2 is) show no evidence
for a putative, alternative expansion. In addition, when examining the frequencies of the
silent polymorphism –fortuitously linked to the selected allele– they discovered that the
new allele is at higher-than-random frequency; this is expected of a selective sweep, not of
a mere, slow and gradual expansion of a population with the relevant allele.
That sweep poses a serious, indeed old, puzzle, eloquently expressed by Alec
Christensen in the Palanth Forum (8/15/02), given Enard et al.’s assumption (of the sort
criticized above) that the relevant allele allows humans to control the vocal articulations
needed for spoken language: ‘[S]uch a mutation could be very adaptive to individuals in a
population that already had some form of language: those who can communicate a greater
quantity and quality of information to each other will be better off than those who cannot.’
In other words, for a mutation to somehow enhance a behavior in such a way that the
behavior distributes rapidly, the behavior in question must have existed in some form. But
this is problematic. To begin with, all indications are that the functioning of the language
faculty is spread over several genes, indeed over various chromosomes (Mueller, in press,
mentions sites on chromosomes 19, 16, 13, and possibly 2 that could be significant). Even
if all other relevant genes had been previously affected –we don’t know– a non-trivial issue
is how the FOXP2 mutation can affect a behavior already controlled by so many different
factors in such a way that it entails a selective sweep.
To put that in perspective, Haesler et al. (2004) compare the activity of FoxP2 in
the brains of birds and their closest living relatives, the crocodiles. The birds did not have the
specific mutation discussed above; however, just as in the case of humans, the gene is
active in their basal ganglia, just before the bird begins to change its songs. The initial
studies were conducted on vocal-learning species like parakeets, but a more recent study by
Teramitsu et al. (2004) shows similar results with zebra finches. Scharff and White (2004)
extend the comparative analysis of genetic components of vocal learning to humans, zebra
finches, hummingbirds and budgerigars, concluding that the striking conservation of the
expressed regions of FoxP2 and the overall similarity of the patterns of brain expression in
reptilian, avian and mammalian brains ‘indicates that FoxP2 has a more general role than
to specifically enable vocal learning’ (Scharff and White 2004: 342). The crucial role of a
‘permissive environment’ for vocal learning is stressed. Brain loci ascertained to be linked
to experience-dependent brain plasticity are among the sites of FoxP2 expression12.
These results could mean three different things: (i) We have no clue as to what is
going on with the gene we are discussing; the role of FOXP2 in humans (FoxP2 in other
species) is ‘permissive’ in a wide sense (it cooperates in complex ways with other genes);
(ii) activation of FOXP2 may have little to do with human language directly, constituting a
modulation factor interspersed with many others; or (iii) song-learning birds (see Zeigler
and Marler 2004) have a ‘permissive environment’ conducive to vocal learning, but of
course do not have proto-language, and hence all other relevant changes precipitated by
FoxP2 fall in a niche without any conceptual consequences of the linguistic sort. We are
inclined to suspect (i) or (ii), but (iii) is not senseless and it emphasizes how refined an
interplay of genetic action and prior architecture a complex behavior can presuppose.
3. 2. A Matter of Time
Ultimately the problem is that, for a selective sweep to be relevant to linguistic concerns, it
must entail a rapid systemic (brain) reorganization, and the more intricate that
reorganization needs to be, the less likely it is that the sweep will be a result of adaptive forces
at play –which require lots of time. Enard et al. tell us with 95% confidence that the FOXP2
mutant appeared no later than 120,000 years B.P. It is interesting to consider how much longer after that
modern language (with the sorts of context-sensitive properties in (1)) must have been in
place. A variety of converging factors suggest that only a few thousand years after the
relevant FOXP2 mutation things look much like they do today within the language faculty.
Just as a key genetic element provides us with a lower boundary for the emergence
of FLN, so too ancient behaviors can shed some light on when FLN must have been
usable. In this respect, the Chomsky Hierarchy of grammars alluded to in (1a) amounts to
an interesting lens to observe the archeological record of human artifacts and interactions
with an eye on the complexity that the underlying behaviors must have possessed.
Formally, aside from being purely isolated, such behaviors can be sequential (walking a
distance), associative (building a hierarchical structure), or even manipulative (threading a
needle to stitch two pieces together), each level of complexity involving ‘more
computational mind’. By finding the fossilized remains of such behaviors we can at least
surmise the computational complexity in the minds of their executors. That doesn’t
guarantee a linguistic correlate, but it makes one wonder, especially if the surmised
behavior turns out to emerge after the FOXP2 mutation, and it is unique to humans.
Consider knotting, whose formal base must be context-sensitive in the Chomsky
Hierarchy sense –i.e., requiring operational memory to take place (more on these notions
shortly)– as Mount (1989) shows13. Such behaviors are not uncontroversially attested until
ca. 27,000 B.P., when evidence of weaving is clear (Soffer 2000). However, one can infer
knots long before. One line of argumentation comes from carefully perforated minute
ornamental pieces (beads, shells, teeth), interpreted as having been either sewn to clothes
or hung on the body. Given how labor-intensive these items are (see Figure I, (a) and (b)) –
and, as Alison Brooks points out in personal communication, what they represent for
social status within a large clan– they must have been firmly secured to their bearers, much
as corresponding jewelry is nowadays. This presupposes some form of knotting. Such
systematically perforated pieces are not found with any sapiens other than ourselves, their
latest dating in Africa going back 75,000 years. A second line of argumentation comes
from the secure mounting of points as (bone) harpoon (see Figure I, (c)) and (stone) arrow
heads, as well as their flinging by way of bows or throwing aids –microliths are useless as
projectiles unless propelled with critical force. This again presupposes serious knotting.
Relevant pieces are dated by Brooks et al. (forthcoming) at ca. 90,000 B.P. Aside from
inferred knottings, unmistakably notational pieces (see Figure I, (a)) are also found,
whose complex elaboration must have involved context-sensitive computation, and which
have been dated to as early as 77,000 B.P. (Henshilwood et al, 2002).
[FIGURE I HERE (see the PDF version)]
What we find impressive about these simple points regarding context-sensitive
dependencies is their coincidence in a period that goes from 95,000 to 75,000 B.P. As
McBrearty and Brooks (2000) show, several milestones appear in Africa around that time:
sophisticated bone tools, barbed points for harpooning, mining, even notational pieces.
These, coupled with the fact that current estimates place the last human diaspora between
98,000 B.P. and 74,000 B.P. –precisely the relevant period– conspire to suggest that
Homo sapiens sapiens had achieved modern language by that time, with its full-blown
context-sensitive potential. This leaves some 25,000 years of interval between the dated
FOXP2 mutation and what looks like the latest viable onset for FLN –at any rate a capacity
to knot and engage in tasks that presuppose complex operational memory. 25,000 years at
best is biologically a short time, somewhere around 1000-2000 human generations.
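To make the order-of-magnitude worry concrete, consider a textbook population-genetics approximation (a sketch under idealized assumptions –constant population size, random mating, additive selection– not a model of the actual Paleolithic situation): a new beneficial allele takes roughly (2/s)·ln(2N) generations to sweep from a single copy to near fixation, for selection coefficient s and effective population size N.

    import math

    def sweep_generations(N_e, s):
        # Rough deterministic estimate of the generations needed for a new
        # beneficial allele to rise from frequency 1/(2*N_e) to near fixation
        # under additive selection coefficient s: ~ (2/s) * ln(2*N_e).
        return (2.0 / s) * math.log(2 * N_e)

    # Purely illustrative values: the putative bottleneck size of ~10,000
    # individuals and a few candidate selection coefficients.
    for s in (0.01, 0.05, 0.1):
        gens = sweep_generations(10_000, s)
        print(f"s = {s}: ~{gens:,.0f} generations "
              f"(~{gens * 20:,.0f} years at 20 years per generation)")

Even under these idealizations, a modest selection coefficient (s around 0.01) already consumes the entire 1,000-2,000 generation window; recessive expression or subdivided mating, discussed next, only makes the fit worse.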
To complicate things, various studies argue for a population bottleneck (first
proposed by Haigh and Maynard Smith 1972) consistent with a population size of around
10,000 individuals somewhere in the 200,000 to 30,000 B.P. period. The causes for such a
bottleneck remain unknown, but assuming (non-obviously) that it was operative already at
the time of the FOXP2 mutation, it certainly affects the combinatorial possibilities of a
plausible evolutionary scenario: a bottleneck would reduce the competition space for
individuals with the new FOXP2 allele (although population bottlenecks also increase
genetic drift, its rate being inversely proportional to the now reduced population size).
At the same time, the evolutionary pattern of FOXP2 is effectively recessive in its
phenotypic consequence for language. As noted in (2c), both the paternal and the maternal
copies of FOXP2 are necessary in order not to fall into the deficit that the KE family
exhibits, while an intact paternal copy is sufficient to maintain the other functions of the
gene –in fact a dominant one, in that respect. To recall, the mutation in the KE family was
found in the maternal copy14. Enard et al. (2002), in turn, showed that the relevant
chimpanzee protein carries one difference from the mouse and two from the human
protein15. If the recessive pattern found in the KE family was also relevant in whatever was
the ancient bio-chemical event in FOXP2 that concerns us, complications emerge for any
standard adaptive scenario, bottleneck-based or otherwise: a recessive pattern leads to the
marginalization of the relevant allele unless the selective pressure for it is extraordinary;
even so, diffusion within a population takes a very long time16.
Precisely how long that is turns out to be hard to calculate, as it requires complex
statistical and computational modeling, if realistic assumptions about clan life-styles in the
Upper Paleolithic are to be taken seriously. What is clear, however, is that the less time one
has for a standard adaptationist scenario like the one Enard et al. entertain, the more
troubling such issues as ‘insular’ mating conditions or recessive patterns in the phenotypic
transmission that sets the stage for competition turn out to be, both of which clearly
reduce the combinatorial possibilities for the relevant allele to prosper within the population17.
Thus the nutshell of this paper is that, if the relevant allele of FOXP2 is implicated
as it seems to be, we must provide a mechanism for the evolution of FLN that makes it
possible in ‘flashing’ terms, more of the sort witnessed in the spread of infectious disease
(cf. the AIDS pandemic) than in what we expect from competitive adaptation to an
environment. Inasmuch as they are just talking about oro-facial adjustments that the new
FOXP2 allele provides, which would presumably enhance communication, Enard et al. do
not worry about the time constraints just raised. Either they do not consider it relevant to
evolve the sorts of FLN properties we sketched in (1), or else they must be taking said
properties as already part of the proto-language repertoire, in which case the FOXP2
mutation would not have to be affecting them. However, there is virtually no evidence that,
behaviorally at least, the syntax of protolanguage users was even recursive. In the next
sections we develop this point, starting from HCF’s claim that FLN’s characteristic
property is the use of recursion.
3. 3. A Genuinely Linguistic Point
Let’s begin by addressing the apparent paradox posed by the following reasoning:
(3) a. If a given syntax σ is to be mapped to a compositional semantics λ, σ must
involve at least (any equivalent of) context-free phrase structure.
b. One cannot have context-free phrase structure without also having recursion.
c. The kinds of thought-processes that all sapiens (including archaic ones)
exhibited were clearly highly inferential, and therefore (at least) compositional.
d. So all sapiens must have enjoyed recursion.
We admit to having no trouble with the logic in (3) as such; however, two crucial, yet a
priori non-obvious, assumptions need to be carefully clarified.
(3a) is unobjectionable, as it correlates context-free syntax with compositional
structure in predicate calculus (involving sister dependencies, in one popular version of the
claim). So too is (3b) unobjectionable; whatever rule system allows us to rewrite X as
…Y… will also allow us (unless we prevent this explicitly) to rewrite X as …X… (or
any rule combination where X reappears), which obviously constitutes a recursive
procedure. (3c) is a factual claim. Let’s grant first of all that all sapiens sub-species (not
just sapiens sapiens) did engage in highly inferential behaviors18, and furthermore, as per
familiar arguments of the sort in Fodor (2000), that this sort of behavior presupposes some
kind of Language of Thought. Whatever that system is, it must at least be compositional19.
The issue is posed by (3d): shouldn’t all sapiens then allow recursion?
HCF assumes not; it takes the use of standard recursion to be a property of FLN.
We think this is right. But how can it be, given (3)? The key to understanding that this is
only an apparent paradox stems from recognizing the familiar separation between
competence (or language knowledge) and performance (or language use). This is the first
central assumption we need to make in resolving the apparent paradox behind (3). Having
the capacity for recursion is necessary, but not sufficient to use this capacity in actual
communication. We develop this point next, clarifying in particular the sort of memory that
is necessary to deploy in performance the various levels in the Chomsky Hierarchy.
Notice to start with that, competence-wise, the non-terminals involved in
context-free grammar presuppose a lexicon, or list of pairings between terminal and non-
terminal elements. This logically separates a procedural system of dynamic instructions to
assemble structures from a set of static correspondences that must be committed to
memory, or the capacity to carry information in time (be it derivational or real time). No
such lexicon is necessary in a simpler Markovian system, which can organize anything
(symbolic or not) into a list. In such a system, there is no meaningful distinction between
the procedural system and its memory: the memory is in effect the system. In contrast, a
rewrite rule matches a meta-linguistic non-terminal with a sequence of grammatical
elements20. This meta-linguistic object has to be a typing, thus a representational formal
object21, stored somewhere in a mind that uses it. Such storage is what implies a memory of
the ‘long-term’ sort: a grammar must have access to this type of information, or it won’t be
able to determine, as a system of knowledge, familiar constituency relations.
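As a purely illustrative sketch (a toy grammar of our own devising, not a fragment of any natural language), the competence side can be pictured as a stored lexicon of type-terminal pairings plus rewrite rules; nothing in the formalism blocks a non-terminal from reappearing inside its own expansion, which is the recursion at stake in (3b).

    import random

    # Long-term knowledge: a lexicon pairing non-terminal types with terminals,
    # plus rewrite rules over those types.
    lexicon = {"D": ["the"], "N": ["man", "tiger"], "V": ["saw", "believes"]}
    rules = {
        "S":  [["NP", "VP"]],
        "NP": [["D", "N"]],
        "VP": [["V", "NP"], ["V", "S"]],   # VP -> V S reintroduces S: recursion
    }

    def generate(symbol, depth=0, max_depth=4):
        # Expand a non-terminal top-down; terminals are drawn from the lexicon.
        if symbol in lexicon:
            return [random.choice(lexicon[symbol])]
        options = rules[symbol]
        expansion = options[0] if depth >= max_depth else random.choice(options)
        out = []
        for sym in expansion:
            out.extend(generate(sym, depth + 1, max_depth))
        return out

    print(" ".join(generate("S")))   # e.g. "the man believes the man saw the tiger"

The generator happily produces several tokens of the and of man; what it does not say is how, in actual use, those tokens are to be kept distinct as tokens of the same stored types –the performance problem to which we now turn.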
In addition, for the relevant system just described to be usable in performance by
speakers, it must presuppose some ‘short-term’ memory. Consider concretely how some
set of categories is assembled from the lexicon stored to long-term memory every time we
want to use (produce or understand) a sentence, and in particular how this usable set
interacts with the type-identity formally implied in recursive rule systems22. In current
thinking, a set along these lines is often postulated in actual derivations, indeed as a
hypothetical competence object called numeration from which derivations are taken to be
assembled. A derivation is not directly assembled from the lexicon stored to long-term
memory; rather a short-term memory object is constructed in a derivational workspace
from which various syntactic operations draw. As Drury (2005) shows, in any simple
version of a set like this there simply could not be the same type of object twice (or more
times); for example, there is no trivial way of distinguishing within such a set the two the’s
involved in the numeration yielding the man saw the tiger23.
A grammatical system is limited to two sorts of procedures to introduce the same
type of object more than once in a derivation. It can code each instance of the type with
some formal tokenization mechanism of a first order sort (in the example just mentioned,
somehow signaling a the_i vs. a different the_j by way of different morphemes); alternatively,
the derivation could separate each token by explicitly subdividing the relevant numeration,
so that each token-to-be-distinguished falls into a different subset where it is actually
appropriately unique, and thus identifiable (in our example, we need to somehow subdivide
the relevant numeration in two sub-numerations, {…the, {… the}}, so that each token the
is thus distinguished by belonging to different sets). That is a second order solution.
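To illustrate the point with a toy sketch (hypothetical code, not a claim about any implemented model of numerations): a plain set collapses repeated types, a first order fix indexes each token, and a second order fix nests the numeration into subsets within which each type is unique.

    # The numeration for "the man saw the tiger" as a plain set of lexical types:
    # the second 'the' simply disappears.
    numeration = {"the", "man", "saw", "the", "tiger"}
    print(len(numeration))                    # 4 -- only one 'the' survives

    # First order tokenization: each instance of a type is marked with an index
    # (in language, arguably the job of uninterpretable morphology).
    indexed = {("the", 1), ("man", 1), ("saw", 1), ("the", 2), ("tiger", 1)}
    print(len(indexed))                       # 5 -- both tokens of 'the' survive

    # Second order tokenization: the numeration is subdivided into nested subsets,
    # each 'the' being unique within its own subset.
    nested = frozenset({
        frozenset({"the", "man"}),
        frozenset({"saw", frozenset({"the", "tiger"})}),
    })

The plain set simply loses one of the two the's; the other two devices keep them apart, but only at the cost of extra machinery –an indexing morphology in the first case, structured short-term objects in the second.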
Neither alternative seems easy to evolve, despite the fact that such evolution
tends to be summarily presupposed. We discuss this matter next, but we anticipate that it is
the second central assumption in resolving the apparent paradox created around (3). A
psychologically real grammar that has not evolved a tokenization mechanism may be
advanced as a competence system, but it is essentially useless in performative interactions. One
can in principle have a brilliant system of thought underlying highly inferential behaviors,
and still be trapped in a solipsistic linguistic mind. Evolving an apparently minor
tokenization mechanism in such circumstances could have drastic societal consequences.
3. 4. The Role of Uninterpretable Morphology
The evolution of a formal tokenization mechanism is our core technical concern in this
paper. In modern language, uninterpretable morphology, whose role in grammar we return
to, can certainly perform the tokenization trick, for instance for case-marked noun
phrases24. But for such a first order mechanism to take us out of our impasse we obviously
need to have evolved uninterpretable morphology. Judging from the poor mastery of precisely
that sort of task by the affected members of the KE family, it is not clear that proto-
language had (control of) uninterpretable morphology. Evolution of the alternative (higher
order) tokenization involving set partition seems even more difficult, as it presupposes
some mechanism whereby not just short-term memory objects, but in fact structured ones,
can be created to build derivations from. In actual languages, this form of, in effect,
‘subordination’ seems to be largely parasitic on the morphological case-system just
mentioned: thus, for instance, subordinating dependencies across languages typically
involve and affect Case hierarchies, in ways that are not obvious for looser coordinating
associations (see Uriagereka 2005 on this). If so, again unless something like
uninterpretable morphology has managed to evolve, the neat recursive system that a
compositional FLB would arguably allow as a competence system as per the reasoning in
(3) above would, in practice, be useless in communicating an articulated thought.
The system in point would allow speakers to construct bracketings of the sort
[… [X …] [Y…] [Z…] …] only if –independently of the derivational procedure itself– it
happens to be the case that X ≠ Y ≠ Z, for X, Y and Z non-terminal types (e.g. V, N, etc.).
Thus, hominids in possession of that knowledge of proto-language may have been limited
to such commentary as you beware or very very very nice! (which can be expressed
with Markovian devices, without invoking (relevantly identical) non-terminal types). But
they could not communicate simple structures like I want to go now, as their grammar
wouldn’t have a mechanism to distinguish two V’s within one articulated utterance. In
other words, for all intents and purposes this would be a grammar without usable
recursion to communicate anything of much substance with.
Again, that doesn’t mean humans in possession of this proto-language (and by
assumption a richer Language of Thought) couldn’t have compositional ideas. If one need
not articulate one’s thought to express it so that others parse or acquire it, one arguably has
more leeway with the use of entirely personal heuristics for category tokenization
purposes25. In other words, as speakers of proto-language we may not be able to
communicate to our potential addressees that I believe the man killed the tiger, if we have
no way of coding separate D’s or separate V’s. But nothing obvious would prevent the
solipsistic idea –a thought as refined as one’s personal experience can be26.
Another way of expressing our point is that we can make the HCF view on
recursion compatible with Fodor’s ideas on structured thought. Even if, in all likelihood,
FLB presupposes such a thought, which in turn presupposes some form of recursion, this
need not mean that the system can use this available mechanism for communicative
purposes. For that, some conventionalization of types into tokens is necessary, precisely
because tokens are tokens of general types27. For A to know that A’s tokens are also tokens
of B’s types, and that when A uses several such tokens, A is indeed using them as
exemplars of the type, is as central as can be to communication between A and B in terms
of the human faculty of language. This faculty happens to use a communication channel
which is only capable of explicitly expressing one dimension of coding (speech in some
variant). Therefore it has no direct way of representing meta-linguistic objects other than
by way of linguistic objects (morphemes). It then seems natural that it should need to
evolve something like uninterpretable morphology to code tokens. If uninterpretable
morphology (using speech, but without corresponding lexical semantics) somehow
enters the system, it is not surprising that then the language faculty could, at least in
principle, find the way to use this coding mechanism in order to fulfill an important
structural role: more or less reliable communicable tokenization28.
The view of proto-language just presented (arguably capable of internal
recursion, thus at least a context-free competence system, but incapable of external means
to represent it, thus observably Markovian in its actual performance)29 is consistent with
the conception of FLB in HCF. But it seems central to Enard et al.’s claim as expressed in
their paper that these ideas be wrong, and in fact that –contra HCF and all available
evidence– proto-language should have been considerably more articulated than just
discussed, presumably already presenting entirely usable syntactic recursion and full
semantic compositionality, at least. Only then would it make sense to justify a selective
sweep involving modern language on the basis of mere phonetic components of grammar
that aid production/perception. For why would an oro-facial –at any rate motor–
improvement result in syntactic recursion and corresponding semantic compositionality?
3. 5. On-Line Memory Limitations, Procedural Memory and FOXP2
In the previous section we concerned ourselves only with the demands emerging within
context-free systems, which need to determine which non-terminal to rewrite when, thus
generating constituents; for this sort of grammar, once a structure is generated it is not
crucial, or even possible, to keep a record of the generation. In contrast, for context-sensitive
operations (e.g. transformations) to apply we must know which particular chunk to
manipulate, hence have access to derivational history. Thus a transformational grammar
needs more memory than a phrase-structure grammar –and of a different sort30. While the
phrase-structure grammar requires long-term memory of phrasal types and how they match
to given terminal elements (a lexicon), the transformational grammar needs that, and in
addition the short-term, on-line ability to record what took place in a particular derivation
(see Drury (2005) and Uriagereka (2005) on this).
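The contrast can be caricatured in a few lines of code (our own toy construction, not a claim about any implemented parser): a displacement operation has to consult a record of what the derivation has already built, whereas pure bottom-up assembly could in principle forget its own history.

    history = []                               # the derivation's short-term record

    def token(word):
        # Introduce a lexical token and log it; the index does the tokenizing work.
        item = (word, len(history))
        history.append(item)
        return item

    def merge(a, b):
        # Context-free step: combine two objects; the result is also logged.
        node = (a, b)
        history.append(node)
        return node

    def move(target, structure):
        # Context-sensitive step: re-merge an already-built chunk at the edge;
        # this is only possible because the derivational history is available.
        assert target in history, "only a recorded constituent can be displaced"
        return merge(target, structure)

    wh = token("what")
    vp = merge(token("burn"), wh)              # [burn what]
    tp = merge(token("Gregor-could"), vp)      # [Gregor could [burn what]]
    cp = move(wh, tp)                          # [what [Gregor could burn __ ]]

If the history list is unavailable –the analogue, on this way of looking at things, of a procedural-memory shortfall– merge still works but move does not.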
In relation to on-line memory in context-sensitive grammars, Ullman and
Pierpont (forthcoming) (henceforth U&P) discusses a kind of memory often referred to as
an implicit memory system, as its processes and the sort of memories it involves are
normally inaccessible to consciousness; they refer to it as procedural memory. Since this
piece comes at the matter from the neuro-psychological side of things, U&P is interested in
correlating those areas of the brain that have been empirically determined to be involved in
procedural memory with those for which FOXP2 expression, or associate neuroimaging in
affected KE family members, have been ascertained. U&P is at pains to demonstrate how
SLI-style impairments are associated with dysfunctions of the basal ganglia, in particular
the caudate nucleus, and frontal cortex, especially Broca’s area. Crucially for the argument,
frontal/basal-ganglia circuits play a core role in procedural memory. The paper is also
interested in proposing a positive way of resolving the tension arising within SLI analysis as to
whether the basis of the syndrome –given the observable behavior– is purely linguistic,
motor, or something else. The brain structures that constitute the procedural system have been shown
to be implicated not only in motor and cognitive skills, but also in grammar, lexical retrieval,
dynamic mental imagery, working memory, and rapid temporal processing. Still,
this is a specific system, and the memory it encodes differs from so-called declarative
memory, both in the brain areas the latter involves and the sort of memory it serves.
The conclusion in U&P that is most relevant to our purposes boils down to this:
‘[T]he [Declarative /Procedural] DP model predicts important associations between lexical
and declarative memory, and between aspects of grammar and procedural memory, as well
as dissociations between lexicon and grammar that parallel dissociations between the two
memory systems’ (p.11). We can adapt this statement very concretely to our own needs
and previous claims in this paper. In our view declarative memory should be required to
represent non-terminal categories in the lexicon (thus be crucial for the functioning of
context-free systems that depend on lexical types), whereas procedural memory should be
involved in the proper representation of grammatical tokens of those types, together with
the derivational context where they occur31.
It is in this way that we plan to relate the role of FOXP2 in grammars, via its
association to procedural memory, to the crucial tokenization discussion in the previous
section. To anticipate: our expectation is for anything short of FOXP2’s present normal
workings (either in the FoxP2 variant or in later ‘anomalous’ mutations as in recently
studied pathologies) to result in shortages in procedural memory that will affect linguistic
behaviors demanding precise tokenization of linguistic types. The logic of the idea is that
both pre-sapiens users of proto-language and (some) present-day SLI patients would be in
the relevant state, alternative to a full-blown capacity for modern language.32 Of course,
dysphasic patients can hopefully compensate with treatment and help from society, but that
is a different issue, irrelevant to our point. And certainly more complex genetic interactions
(especially if FOXP2’s role is merely ‘permissive’, as it seems from our present
perspective) surely must play a role, but that too doesn’t affect the overall thesis.
4. Towards a Minimalist Analysis of FLN Evolution
To summarize matters up to this point, Enard et al. have provided us with a boundary
condition, but still up for grabs is precisely what evolved at that juncture. What seems clear
is that, prior to the last mutation in FOXP2, we must have had individuals with the
capabilities of some variety of dysphasia. However, Enard et al.’s claim that the
evolutionary cause for the rapid genetic diffusion of the useful FOXP2 allele is standard
adaptation for communication need not be granted without numerous qualifications, nor of
course that the role FOXP2 plays in language is essentially motor. In any case, for us to
solve the admittedly real puzzle posed by the genetic sweep (we use the neutral term not to
presuppose selection) that Enard et al. have unearthed, we must assume that odd behaviors
witnessed in the affected KE family members are either linguistic or parasitically so. To
put it in blunter terms associated with the U&P conjecture: (full blown?) procedural memory
is associated to the FOXP2 mutation, and thus specifically human, moreover specifically
grammatical; this of course doesn’t preclude other uses that go beyond grammar 33.
Our strategy can then be summarized as follows: (A) we admit the overall
validity of the FOXP2 finding regarding whichever impairments the KE family has,
questioning superficial analysis of its role in the relevant behavioral output; (B) we
concede the basic argumentation behind the genetic sweep (not its adaptive basis) argued
for by Enard et al. regarding the very late mutation in FOXP2; and (C) we entertain and
extend the U&P conjecture regarding procedural memory and how this may be associated
to the proper workings of whatever FOXP2 regulates. The central issue is then to relate
(A), (B) and (C) so as to provide an understanding for the evolution of a FLN with the use
of recursion, and specifically the properties in (1).
With all of that explicitly said, in this section we want to achieve two results.
First we illustrate how a purely syntactic device sanctions a complex thought process of the
semantic sort, which arguably goes beyond standard syntactic (even linguistic)
considerations. Next, we want to admit that, even if one accepts that on-line derivational
memory is procedural in the U&P sense, and that this is fundamentally involved in context-
sensitivity, we must still address two concerns: (i) how or why uninterpretable morphology
is implicated in this syntactic phenomenon, and (ii) how the genetic sweep that Enard et al.
argued for was actually possible if, in the terms now being entertained, a proto-language
without usable derivational memory and a modern language with it are such different
entities –in terms of behavioral output, and hence putative ‘functions’– that a smooth
transition between them, precipitated by the adaptive forces of evolution putting ‘pressure’
on linguistic function, seems beside the point. (i) and (ii) should actually be intimately
related, not just for methodological reasons of elegance in the account; this is where we
think that SMT (the Strong Minimalist Thesis) can help us.
4.1. Context-sensitivity and the Conservativity Thesis
Implicit in our discussion in the previous section is the following, rather standard, claim:
(4) Syntactically phrasal systems correlate with (strict) semantic compositionality.
Consider this in relation to a quantificational expression as in (5a), whose semantic
denotation can be proven to be expressible as in (5b):
(5) a. Gregor could burn most books.
b. Most books are such that Gregor could burn them.
A more technical way of saying (5b) is that when uttering (5a) we conceive of two sets or
pluralities: the books and the things Gregor could burn. The intersection of those two sets
contains more than half of the books.
assuming strict compositionality as in (4) (‘only syntactic sisters compose semantically’),
could we compute such a thought from (5a)? In strictly compositional terms, the plurality
of books that the computation ranges over is easy to compute from the plural books that the
determiner most introduces. But just how do we compute the set of things that Gregor
could burn? And how does that combine with most books in strictly compositional terms?
Present wisdom suggests that, at a level of grammatical representation that is dubbed
Logical Form (LF), we obtain a structure similar to the paraphrase of (5a) in (5b):
(6) [Most books [Gregor could burn __]]
An abstract or virtual ‘topicalization’, customarily called Quantifier Raising (QR), is used
to solve the puzzle posed by quantification. In (6) the expression most books is a sister to
[Gregor could burn __]. The latter contains a ‘gap’ in object position, which is interpreted
as a logical variable in the semantics; technically, that bracketed object turns into the
semantic expression ‘Gregor could burn x’, where x is like the unknown in an equation. This
unknown is what the quantifier (here, most books) is said to bind, thereby forcing the
interpretation of the things that Gregor could burn as books. Crucially, (6) satisfies strict
compositionality, but only after the rather dramatic context-sensitive gambit presupposed
in the structure. This dependence is key in providing a value for the relevant variable:
(7) Semantic variable binding involves a context-sensitive dependency.
(7) is taken to be part of a very important proposal in semantics, called the Conservativity
Thesis, which basically demands that the two arguments that a quantifier relates (in (5) ‘the
books’ and ‘the things that Gregor can burn’) be strictly ordered: the first argument or
restriction is mapped by way of context-free dependency with the quantifier, whereas the
second argument or scope is mapped by way of context-sensitive dependency.
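Purely for illustration, the formal content of (7) and of the Conservativity Thesis can be rendered as a minimal computational sketch (entirely schematic, with hypothetical names, and not a worked-out proposal of ours or anyone else’s): a binary quantifier such as most relates a restriction set and a scope set, and conservativity amounts to the requirement that only the restriction’s members matter once the scope is consulted, i.e. Q(A, B) holds just in case Q(A, A∩B) does.

# Minimal sketch (purely illustrative) of a binary quantifier and the
# Conservativity Thesis for the example in (5): MOST(A, B) holds iff more than
# half of the restriction A falls inside the scope B, and conservativity says
# that only A-members matter: MOST(A, B) == MOST(A, A & B).

def most(restriction: set, scope: set) -> bool:
    """MOST(A, B): more than half of A is also in B."""
    return len(restriction & scope) > len(restriction) / 2

books = {"b1", "b2", "b3", "b4"}                    # restriction: 'books'
burnable = {"b1", "b2", "b3", "the chair"}          # scope: 'things Gregor could burn'

print(most(books, burnable))                        # True: 3 of the 4 books
print(most(books, burnable) == most(books, books & burnable))   # True: conservativity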
To put that in perspective: We have just shown at least one sort of semantic
object, namely (binary) quantification, that is not expressible in merely context-free
syntactic terms if we assume the strict compositionality thesis. Simply put, one can’t do
binary quantification without context-sensitivity, any more than one can do phrasal
composition without context-free objects (thesis (4)).34 It may seem as if these are exotic
constructions, peripheral to ‘normal’ language; however, Herburger (2000) has argued in
great detail that they are central to every human proposition, by way of the phenomenon of
focalization (which in her terms involves the use of binary existential quantifiers over
events that satisfy, as all human quantifiers apparently do, the Conservativity Thesis).
For our evolutionary purposes here, we may simply ask ourselves just what
complex tool-dependent craft, gadget-specific task, measure-involved analysis (of living
dwellings, trade relations, collective hunting, warfare, etc.) of the sort presumed in our
species during the Paleolithic, doesn’t use some form of subtle restricted quantification?
Again, we don’t know for sure, but as amateur handymen and sailors, or parents having
shared with our children our knowledge of elaborate games, constructions, or social
manners, we can’t even imagine how to engage in what seem to us relevant processes
without quantificational tools, and hence context-sensitivity. Are these linguistic
behaviors? Who knows. Are they context-sensitive, in the computational sense? If they
involve ordered binary quantification, we can be sure they are. Are they parasitic on context-
sensitive dependencies in grammar? We would bet on that.
Otherwise, we would have to assume one of two problematic theses. Possibility
number one has actually been advocated by Jackendoff over the years (see e.g. his (2002)):
there are two (or more) competences in our mind, each having evolved separately, one
purely syntactic, the other purely semantic; in fact, a third (not logically necessary)
‘correspondence’ factor must also have evolved. Unfortunately, a detailed analysis is
missing, in such terms, of how complex relations assumed by the entire field of semantics
are independent of complex relations of the sort assumed by the entire field of syntax (the
prize would be to work with existing relations and show how they cannot be dependent as
per theses along the lines in (7)). Possibility number two is what U&P explicitly assumes:
notions like ‘procedural memory’ underlie grammar and other ‘nonlinguistic’ tasks as
well. Logically, that may be, but for that view to be convincing, one would have to show
that, for instance, a procedural-memory process as complex as the one presupposed in the
analysis in the previous paragraphs is present in non-human species. That would win the
argument. This is in fact what Fitch and Hauser (2004) carefully sought to achieve with
monkeys –and failed (actually with much simpler tasks, involving mere context-freeness).
4.2. The Immune Syntax
As to questions (i) and (ii) at the beginning of this section, we sketched a line of reasoning
in Piattelli-Palmarini and Uriagereka (2005). Here we merely summarize our main points,
with an eye on emphasizing aspects that need further work. We assume that the conception
of language offered by the Minimalist program is essentially correct, and more specifically
SMT. We will not go into details because we take theses of the sort in (1) above, which are
specifically relevant to our proposal, to be agreed-upon results from generative grammar at
large –or largely so. Again, where SMT helps, we think, is in understanding how to relate
those facts, in such a way that an evolutionary scenario becomes viable. Of course, if
someone is not interested in those facts, they won’t be interested in a proposal like ours
that admittedly attempts to lay the foundations for their possible evolution.
In addition, we assume a current version of the evolutionary paradigm that turns
out to be very much in synch with SMT, and which goes beyond the older, Neo-Darwinian
synthesis and focuses on what may be thought of as ‘chromosomally based’ inheritance,
including epigenetic inheritance. The latter addition is equivalent to acknowledging that
individuals inherit a structured genome, with regulatory modulations, not just a bare DNA
sequence. Without even attempting to be exhaustive, we will just say the following. Well-
known ‘exceptions’ exist to the canonical Mendelian paradigm of inheritance:
(8) a. Linkage (adjacent genes on the same chromosome may be inherited together,
regardless of their function) and recombination (occasional breaks in
chromosomes between gene loci that may result in exchange of segments)35.
b. Gene conversion (DNA stretches replaced non-reciprocally by stretches
transferred from another region).
c. Translocation (fragment from a chromosome that breaks and attaches to a non-
homologous chromosome) and mobile genetic elements (MGE), including
transposons and insertion sequences containing promoters and terminators that
increase or decrease the expression of downstream genes.
d. Other inheritable alterations: mutations, reversions, deletions, frameshifts,
duplications, inversions, etc.
e. The transmission of local biochemical modifications of histones (methylations,
acetylations, etc.) that carry regulatory effects on gene expression and have been
ascertained to have repercussions up to at least two generations down the line.
None of these recently ascertained mechanisms amounts to a rejection of Mendelian
genetics, which remains globally valid; they are additional processes that involve
chromosome dynamics36. This is not the place to discuss these matters in any detail, but it
is important that we see all of this as entirely acceptable within current evolutionary
thinking in molecular biology. And of course we see no reason to exclude the possibility
that any or even all of these processes may have been involved in language evolution.
To repeat, we are interested in the mechanisms succinctly listed in (8) because
we take the evolution of FLN to be an evolutionarily very recent and rare event with
extremely interesting structural optimality characteristics (the SMT view). Surely it is not
the only rare event in evolutionary history; the evolution of the eukaryotic cell, the
endoskeleton, the adaptive immune system, or the locomotive structure of tetrapods are all
unique happenings that furthermore manifest decidedly optimal characteristics of the
purely structural sort. Then again, those systems evolved billions of years ago (in the case
of eukaryotes) or at any rate hundreds of millions of years ago, which allows for scores of
speciation sub-events
within their evolutionary horizon, unlike in the case of FLN, which apparently first
appeared on the evolutionary picture less than 200,000 years B.P. But rare, structurally
organizing evolutionary events are well-documented, and their role in domains such as
body plans, physiological re-arrangements, or behavioral landscapes, is well established.
Thus, for instance, the adaptive immune system allowed for a radically new relation
between organisms and their environments, starting in embryogenesis; and the tetrapod
locomotive organization opened up the way for articulated terrestrial displacement
behaviors, from walking and running (eventually bipedal locomotion) to flying and
controlled gliding. It would be pointless to analyze such evolutionary landmarks with the
same narrow adaptive eye one may study the variations in the shape or strengths of
finches’ beaks or other text-book instances of adaptation. Rather than environment-driven
properties, modified through the slow interplay between context and fitness, these are
environment-changing events, the force of whose emergence is still poorly understood.
We take the adaptive immune system (AIS) to be a paradigmatic instance on
which to model the emergence of language. Current wisdom is that the AIS emerged
within the last three hundred million years, in the evolutionary saga that we share with jaw-
endowed fish. Up to that point, species only had the innate immune system, whose
flexibility and effectiveness are quite limited. This system has been conserved, however, in
all species that also possess AIS, including our own. Then something new, sudden,
additional and physiologically momentous happened, involving features, it seems, of the
sort discussed in (8), in particular transposable elements (TE). These are pervasive in the
genomes of bacteria, plants and animals. They replicate fast, efficiently and ubiquitously,
and it is common to find innumerable copies of them in a single genome (TE-derived
sequences make up at least 50% of the human genome). Myopic positive selective pressure at the basic DNA level is probably the normal
TE scenario37, but in recent years selective pressure also at the host level has been inferred.
This involves the stable insertion of TEs that evolve new coding and/or regulatory
functions, with potentially dramatic evolutionary consequences.
In addition to the normal mode of vertical transmission from parent to offspring
within a species, and often preceding it in evolutionary time, TEs can sometimes move
laterally between species, a phenomenon known as horizontal transfer. Once a rare
horizontal transfer of genetic material has successfully taken place, ordinary (vertical)
transmission perpetuates the new genome. Agrawal et al. (1998) and Hiom et al. (1998)
have argued that the adaptive immune system originated from processes mediating
sequence-specific DNA recognition of well-defined recombination signals and DNA
cleavage next to these signals (for a review see Roth and Craig 1998). Since many kinds of
TEs also code for their own transcription enzymes, conversions from TE to virus and vice-
versa are possible. Under these conditions, such a mechanism becomes biochemically
straightforward38. Once a momentous event of novel genetic insertion takes place, adaptive
selection pressure must be invoked to explain the fixation of the new trait (something
transparent in the AIS example). However, even when a successful insertion does take
place, the ‘older’ system (innate immunity, in this case) remains fully active (very rapid
and error-proof detection of environmental pathogens), and primes the more sophisticated
adaptive responses against them. We return to this important reflection below.
4.3. A TE ‘Signature’ in FOXP2?
Let’s recall that what we need in the linguistic instance is both to justify a rapidly
spreading genetic insertion and, in our view, to provide a role for morphological checking
in the process.
presence of TE activity in both the emergence and the subsequent spread and stabilization
of the genetic determinant(s) of FLN. Of course, the detailed knowledge we now possess
about the FOXP2 mutation makes it desirable that our IS hypothesis be at least compatible
with relevant findings. As it turns out, it is not crucial that specifically FOXP2 be involved
in our hypothesis, but since this is the only gene we actually know for sure to be implicated
in the language system, largely for concreteness we will articulate the proposal around it,
and in particular the putative ‘permissive’ role of FOXP2 in procedural memory.
Our specific claim is not difficult to test: the human version of FOXP2 should
differ significantly in its ‘TE profile’ from the chimp’s. We know that the two differ in a
region (exon 7) of the gene that expresses the protein, normally what is submitted to
sequencing and then to an analysis of cross-species homology (as Enard et al. did). But the
search for a ‘TE profile’ has to be broader, and include also the non-coding regions of the
gene (the introns). In order to examine the situation, let’s consider the main FOXP2 facts.
[Table I: FOXP2 facts; see the PDF version]
Obviously, the bottom half of Table I is blank in the protein and function regions. This is
because it refers to introns, which are not expressed into proteins and, hence, do not have
canonical functional correlates. Nonetheless, such stretches in other genes are known to
have subtle regulatory functions and they are especially significant to our concerns, for it is
there that we could in principle recognize the ‘signature’ of TEs (repeated sequences
possessing certain well known characteristics). A qualitative inspection of FOXP2 that we
ourselves conducted (with the aid of Donata Vercelli and Walter Klimecki of the
University of Arizona) unmistakably revealed numerous repeated sequences, compatible
with TE insertions into the gene. If more detailed, quantitative analysis confirms such TE
insertions –in themselves obviously not a peculiarity of FOXP2 (see Kidwell 1994;
Kidwell and Lisch 2000)– it would be important to see how this TE activity differs from
that revealed in the corresponding gene in chimpanzees. We already know that length
variations, disregarded in Enard et al. 2002, do exist for all species in their study, when it
comes to the two adjacent polyglutamine tracts in the gene, and furthermore that members
of the KE family, affected and non-affected ones, all show polymorphisms in one of those
tracts, although these do not appear to co-segregate with SLI. At any rate, since the
expressed part of FOXP2 is reported to be monomorphic (no differences between different
human populations), it would be crucial to verify whether this is also the case for the non-
coding regions (the coding regions represent only 2.3 % of the total)39.
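Although the inspection just mentioned was qualitative, the kind of quantitative follow-up we have in mind can be indicated schematically. The following toy sketch is purely illustrative (the fragment shown is invented, not real FOXP2 sequence, and real TE annotation relies on curated repeat libraries rather than raw k-mer counts): it simply flags short subsequences that recur suspiciously often in a stretch of non-coding DNA.

# Toy repeat scan (purely illustrative; not the analysis reported above). It
# counts k-mers that recur in a sequence, a crude proxy for a 'TE-like
# signature'; the input fragment below is invented, not real FOXP2 sequence.

from collections import Counter

def repeated_kmers(seq: str, k: int = 10, min_copies: int = 3) -> dict:
    """Return the k-mers occurring at least `min_copies` times in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_copies}

fragment = "ATTTGCAGGA" * 4 + "CCGTATTACGGAT" + "ATTTGCAGGA" * 2
print(repeated_kmers(fragment))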
In terms of sheer speed of diffusion, vertical scenarios depend on standard sexual
transmission for inheritable new variants to settle down; in contrast, horizontal scenarios of
the TE sort allow for the possibility of epidemic transmission, which immediately boosts
the numbers of affected individuals within a given generation. In effect, although the a
priori likelihood of the TE scenario is admittedly minuscule, the likelihood that it entails a
genetic sweep –without the need for adaptations that, in this instance, seem miraculous– is
greatly enhanced. At the same time, it could also be that unrelated effects of FOXP2 (as
such a dominant gene, setting aside the recessive pattern that the language-related
phenotype implies) on brain development in nonlinguistic terms somehow eventually
improved linguistic communication, as a later side effect. As we saw in (8),
there are several other mechanisms to explore. In short, a gene such as FOXP2, belonging
to a ubiquitous and large family that is primarily associated with DNA replication,
that has developmental effects on several vital organs, may have been shaped by any one of
a variety of plausible adaptive pressures, and possibly by more than one, over time40.
It may appear extravagant that the adaptive immune system should be a
byproduct of the insertion of genes having primary jurisdiction over DNA transcription
into RNA, or that FOXP2 should belong to a family of genes having primary jurisdiction
over DNA fork-replication41. Genetically speaking, however, once these complex
dependencies come to be understood over evolutionary times, we witness straightforward
mechanical chains of events. Any feeling of extravagance that these well-ascertained facts
may still engender is only due to a lingering residue of traditional conceptions of biological
evolution, as exclusively driven by an alleged gradualistic cumulative adaptation,
transparently steered by the function one happens to be interested in.
5. Internal Reasons for the IS Proposal
Aside from the fact that it would allow us to enhance the sheer numbers of the relevant
FOXP2 mutant (or at any rate a pre-condition for it, or a possible consequence of it) in possibly
epidemic terms, there is one other reason why we think that it is well worth exploring the
IS proposal involving a TE scenario. This has to do with the abstract form of context-
sensitive linguistic facts, and how much, we think, they resemble the workings of the
immune system, again at the relevant level of abstraction. For the sake of exposition, let’s
start with a purely formal analogy, possibly nothing more than a detailed metaphor. We
will then proceed to argue that it can in fact be more than a metaphor. At this stage,
however, we are only in a position to stress that the analogy is interesting, as it is
heuristically productive and certainly far from obvious.
5.1. The Immune Syntax and Adaptive Immunity: Formal Analogies
Let’s have a closer look at some central properties of natural language. We know that
languages vary in their morphological details (overt agreement paradigms, gender, number,
Case specifications, etc.). Part of this morphological richness is redundant and certainly
semantically uninterpretable. One of the far-reaching hypotheses at the core of Chomsky’s
particular take on SMT is that all uninterpretable morphological features must be
eliminated before the syntactic derivation reaches the interpretive interface. Indeed, they
have to be erased as soon as the computational system detects them. In other words, the
narrow (mechanical and local) system of syntactic derivation cannot face an
uninterpretable feature and do nothing about it. These features, in addition, are said to
trigger long-distance agreement and (in some circumstances) movement, under locality
constraints. Stepwise, mandatory, successive derivations proceed ‘cyclically’, a term that is
self-explanatory. Let’s see a simple paradigmatic example, and then make it explicit why it
resembles, formally at least, the immune system’s typical rapid reaction consisting in the
detection, recognition, processing and elimination of an antigen (in this order).
As we saw earlier, one of the central properties of natural language is movement
(or ‘displacement’): i.e. the mental computation responsible for the fact that certain
elements in a sentence can be interpreted as if they effectively were not in the position in
which they manifestly appear, but elsewhere. Formalized languages and computer
languages have no equivalent of movement in this sense; it is a peculiarity of natural
languages, and the reasons why it exists had been left unexplained until the Minimalist
program. Aiming at an explanation of movement as a conceptually necessary consequence
of deeper and more compact design features of the (narrow) syntactic derivations,
Chomsky (1995) suggests that movement is the mandatory (deductively explicable)
consequence of the system’s need to implement a kind of ‘immunization’ against
uninterpretable morphology. The idea is to motivate transformational applications, so that
they never apply idly. Thus, movement is triggered by the need to eliminate (technically, in
the Minimalist terminology, ‘check’) uninterpretable features, without delay.
Consider a transformation executing a context-sensitive process of displacement
of the sort witnessed in (9), stemming from a source structure as in (10):
(9) Jack seems [t to be the leader of this group]
(10) * seems [Jack to be the leader of this group]
In this instance the crucial feature in the target (of movement) is agreement in Tense (T),
and the source of the movement is Jack, which can appropriately check those
uninterpretable features in terms of its own interpretable ones. In the process, the source
element becomes accessible to the computation by way of Case valuation, which the target
renders. But this process is only half the story. The other half pertains to why Uriagereka
(1998) termed these features ‘viral’. In 1995, Chomsky implemented cyclicity effects by
way of stipulating that a process along the lines in (9) takes place immediately after the
computational system detects the presence of an uninterpretable (*) feature of the sort in
(10). In other words, Chomsky disallowed the possibility of facing a structure like (11) and
not doing anything to eliminate the uninterpretable feature in T until later in the derivation,
when the corresponding TP is embedded under some other element.
(11) [ [*]Tense-agr seem [ [Jack] [to be ...]]]
          TARGET              SOURCE
[See the PDF version for arrows and boldface]
If it is this sort of morphology, originally termed ‘strong’ by Chomsky, that lies at the core
of uninterpretability and its transformational elimination, then it stands to reason that overt
displacement would be a side-effect of strong morphology.
We find it appealing to develop further this analogy with immune reactions, first
of all because one can liken the discussed immediacy to the sort of response the immune
system has upon the recognition of an environmental pathogen (in particular a virus).
Basically put, the computational system, in this view now commonly referred to as the
Virus Theory, detects an alien element (the uninterpretable feature) and puts its resources
into play in order to eliminate that element42.
But we can zoom into the analogy even further. The linguistic analog of the
evolutionarily older innate immunity system is the detection, recognition and signaling
process. A scanning device can be programmed to detect any one of a finite set of manifest
uninterpretable morphemes and send out some alert signal upon recognition. A creature
endowed with a context-independent proto-language, with minimal memory requirements,
can succeed in doing that. The checking and elimination process that naturally ensues in
human languages is, however, of a different nature. It is context-sensitive and constrained
by optimal locality (cyclical) requirements (immediate execution, bottom-up search,
arguably no back-tracking), and perhaps even other family-related constraints. These
equally ‘mechanical’, but irreducibly higher level, computations are the ones, we suggest,
that ought to result from the viral insertion, if appropriately integrated into the system.
5.2. A Way to Go Beyond the Metaphor
The fundamental question for us, at this juncture, is whether the Virus Theory should be
so-called in merely metaphorical terms, or whether the biological foundations of language
and their evolution may share something substantial with (show a causal link to) the AIS.
As we just saw, the transformational process (context-sensitivity in general) does share
formal properties with adaptive immunity. Witness:
(12) a. Into an organized elementary syntactic structure
b. … a foreign element (a virus) is introduced;
c. the system detects it and signals its presence;
d. the host (well-behaved syntactic structure) cannot integrate the virus (is at a loss
with uninterpretable morphological features).
e. Immune response: the virus has to be eliminated
f. … matching its category (syntactic antigen)
g. and the host must keep a valuation (operational memory) of the process.
We are trying to be totally mechanistic here, analyzing the matter from a purely formal
perspective. In transformational syntax, first we need an organized elementary syntactic
structure, at least of the context-free sort; that is analogous to a well-functioning entity
(12a). It is this structure that faces a foreign element (a virus) (12b); of course, we will
need to say something about what that virus is, in the case of syntax –but let’s put that
aside for now. The system detects the virus and signals its presence (12c); the host syntactic
structure for some reason cannot integrate it, being at a loss with the uninterpretable
morphology (12d). This triggers the immune response: the virus must go (12e). How? By
way of appropriately matching it to a category of the relevant type (the antigen) (12f), and
then proceeding to the checking or elimination of the viral feature.
In addition, the host obtains a new valuation in the process (12g), which can be
seen as a memory of what went on (this is what determines further representational
consequences in the case of grammar). Detection, recognition and priming are the
properties of the innate immunity system, while operational memory, matching with
context-sensitivity (specialized cells wrapping themselves around the antigen, coating it
and presenting complementary molecular groups to the system) and elimination, are the
standard properties of the more refined AIS.
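To keep the two tiers of the analogy apart, they can be stated as two schematic procedures (again purely illustrative, with hypothetical names): a bare detector that merely signals alien items, against a checker that additionally matches each alien item against a repertoire, eliminates it, and retains a record of the event –the valuation held in operational memory.

# Schematic contrast (purely illustrative) between the two tiers discussed
# around (12): bare detection/signaling (the innate-immunity analogue) versus
# matching, elimination and a retained record of the event (the adaptive
# analogue, i.e. the valuation kept in operational memory).

def detect(items, alien="*"):
    """Innate-style tier: just signal which items are alien (uninterpretable)."""
    return [item for item in items if item.startswith(alien)]

def check_and_remember(items, repertoire, alien="*"):
    """Adaptive-style tier: match each alien item, eliminate it, log a valuation."""
    memory, surviving = [], []
    for position, item in enumerate(items):
        if item.startswith(alien):
            matcher = repertoire.get(item.lstrip(alien))
            if matcher is None:
                raise ValueError("no match for " + item + ": derivation crashes")
            memory.append((position, item, matcher))     # the retained valuation
        else:
            surviving.append(item)
    return surviving, memory

derivation = ["*agr", "seem", "Jack", "to", "be", "leader"]
print(detect(derivation))                                 # ['*agr']
print(check_and_remember(derivation, {"agr": "Jack"}))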
Here’s the rub, then –an intriguing possibility that we are the first to admit needs
further justification. Couldn’t the linguistic system simply have co-opted these tricks from
something the organism had already at its disposal?
Before we proceed any further in the exploration of such a hypothesis, two
elementary caveats are in order. First, every hypothesis of an adaptive origin of language
we are aware of (indeed any hypothesis of the adaptive origins of cognition) avails itself
quite liberally of co-optative mechanisms. The appeal of the sharing and generalizing of
perceptual, mnemonic or computational resources, or of some ‘transfer’ of successful
strategies across cognitive domains, or of the application or conversion of neural processes
to new utilizations, is strong and ubiquitous. Progressivist-adaptationists surely cannot
object to this component of our own story. Second, as to causal interactions linking states
of mind and the immune defenses (a link having obvious, albeit indirect, relevance to our
hypothesis), there is no longer any doubt in the immunological profession that some
modulation of the speed and/or intensity of some immune reactions by brain states (maybe
moods, dispositions, attitudes towards the self and disease) does indeed occur, notably
when those reactions are mediated by hormones. This whole domain of inquiry
(psychoimmunology) is still awaiting rigorous scientific exploration, but suffice it to say
here that a functional link between mind-brain states and immune reactions can be
considered a fact. The kind of evolutionary interaction we are suggesting is a quite
different one, but causal links between these two systems have been sufficiently
ascertained to make it, at least prima facie, plausible.
Concentrating now on our story, if usable (in fact sharable, across individuals)
context-sensitivity in the mind, implemented along the lines sketched in (12), is inaugurated
with the anatomically modern human species, then we have to (a) take present syntactic
understanding of the process very seriously, and (b) be aware of what else there is in
nature, and more particularly within brains, that the linguistic system could possibly be
using to satisfy its needs. Then the analogy between, on the one side, innate immunity vs.
adaptive immunity, and, on the other, proto-language vs. full language (with FLN),
suggests the idea of an evolutionary computational bottleneck. A bottleneck may never be
overcome by any given species, unless a genetic change occurs producing the right new
combination to get some members of the species out of the ‘trap’ that got the species into
the bottleneck to start with. Just as the new components of the AIS interfaced with the pre-
existing innate immunity, so too (we suggest) the new computational powers given by FLN
somehow interfaced with the pre-existing context-independent computations allowed by a
conceptual system not too dissimilar from that of a chimp; comparable considerations
apply to the corresponding articulatory-perceptual system (also arguably not too dissimilar
from the chimp’s, at least in power, although of course it must have had a different vocal
specialization, given the extremely limited vocal repertoire of present-day chimps).
In a nutshell, the conceptual-interpretive system had an issue with the systematic
(uninterpretable) morphological redundancy offered by the articulatory-perceptual one.
(Note: tacitly, this is our virus.) The newly inserted system, in its turn, managed to co-opt
from the hands of Nature a strategy that the organism had available, transferring to the
domain of computations what it already ‘knew’ how to do with molecules and cells:
detection, recognition, matching, memorizing, checking and eliminating. Context-
sensitivity was, thus, systematically exploited within the interfacing system. The resulting
computational procedure was bound to be as mechanical as the one employed by the
immune system –a claim that generative grammar had been making all along, and that
SMT has now given more substance to.
5.3. From Proto-language to Language
The fast spreading and stabilization of the, in effect, insertion of FLN (or at any rate, some
crucial prerequisite for it) may have had relatively little to do with language as such, or the
improvement of communication, or for that matter anything adaptive in and of itself. It is
enough that whatever was involved was not maladaptive, and it spread throughout the
human population, just as it could have (and maybe something similar has, for all we
know) over a population of parakeets. What happened next is what is interesting, especially
given that the sapiens species had, before that point, already achieved a proto-language that,
for the logic of our story to work, would have had context-free capabilities in competence,
albeit unusable ones in fact, as outlined in section 3. As we said earlier, when discussing U&P, it
would have been useful for these creatures to gain on-line, operational memory. But again,
being useful and taking place, especially rapidly by way of a genetic sweep, are entirely
different things. In our view, a further step (at least) was necessary: something that resulted
in the relevant, possibly TE-induced, mutation in a suitable gene with a ‘permissive’ role in
operational memory (let’s say that it was FOXP2, just for concreteness).
Compatible with our approach are actually two different possibilities that we will
not be exploring here and now: (i) that this may have been a purely biochemical process
directly affecting operational memory, which somehow resulted in morphology/tokenization
(call that the Memory First scenario); or (ii) that instead, again
somehow, the independent emergence of the tokenization mechanism that morphology
allows is actually the culprit for the indirect emergence and viable use of operational
memory (call that the Morphology First scenario). Each of these scenarios –aside from
specifying non-obvious intermediate steps– relates differently to the role of the relevant
gene (or associated factor) in each instance (be it FOXP2 or some other gene).
For (i) the crucial mutation and associated biological consequences predate any
relevant linguistic results; for (ii) the opposite is the case: the trigger manages to be
linguistic, and although the full-blown linguistic consequence could only be witnessed after
the ensuing (FOXP2?) mutation, we could in principle also find previous proto-language
features of the relevant sort (specifically, in morphology). Future research should help us
decide among these competing scenarios.
At any rate, whether a cause or a consequence, tokenization is central to our
picture. Even if one’s psychology allows one to distinguish types and tokens (as implied in
any context-free grammar), that doesn’t mean that one can communicate one’s tokens, for
the simple, yet crucial, reason that although types may be universal to a species (a matter of
knowledge) a fortiori tokens of those types have to be specific to the individuals that use
them, and indeed to the context of use. Thus even if two individuals share the same types as
per evolutionary convergence, they won’t understand each other’s token uses of those
types unless they have a further mechanism to express that. Uninterpretable morphology as
understood within SMT is one such mechanism, which ties specific types to phrasal
contexts. If A can pin down B’s context-free direct objects, determiner complements, etc.,
A has some way of beginning to share B’s more complex intentional (quantificational)
mechanisms. Thus the evolution of uninterpretable morphology should help elucidate how
the successful tokenization of types in language use took place43.
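A last schematic illustration (again purely for exposition; the two-case system shown is deliberately minimal and hypothetical) may help fix the type/token point: two speakers who share the same lexical types still cannot align their tokens unless something like a case tag anchors each token to its phrasal role.

# Schematic sketch (purely illustrative) of the type/token point: shared lexical
# types alone do not let a hearer recover which token plays which role; a
# case-like tag tied to phrasal context does. The 'NOM'/'ACC' system is a toy.

from typing import Optional
from dataclasses import dataclass

@dataclass
class Token:
    lexical_type: str            # shared across speakers ('dog', 'man', ...)
    case: Optional[str] = None   # uninterpretable tag anchoring the token

def align(utterance):
    """Recover who-did-what from case tags alone (word order deliberately ignored)."""
    roles = {}
    for tok in utterance:
        if tok.case is None:
            return None          # proto-language: tokens cannot be shared
        roles["subject" if tok.case == "NOM" else "object"] = tok.lexical_type
    return roles

print(align([Token("dog"), Token("man")]))                   # None
print(align([Token("man", "ACC"), Token("dog", "NOM")]))     # {'object': 'man', 'subject': 'dog'}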
But where ‘immune system dynamics’ –via the elimination of the morphological
tokens upon recognizing them– really seem to have gotten into new dimensions of thought
is as the originator of complex ‘warped’ context-sensitive structures that serve as the other
element central to quantificational thought. On one hand, as do other instances of what we
may broadly think of as ‘Thematic structure’, one part of the quantification (the ‘restriction’)
only exploits semantic types. For this we don’t require context-sensitivity; all we need to
do is map the types that context-freeness allows to corresponding notions. But on the other
hand, a quantificational argument of the scope sort is literally created as a syntactic
derivation unfolds. This second-order syntactic process is inexpressible in the mere
hierarchical terms of the simpler, context-free, system of types that the restriction exploits.
We have argued that without tokenization mechanisms we could still conceive
relevant syntactic types in our inner thoughts; however, we simply couldn’t even think the
scope aspect of a quantificational relation, solipsistically, without a tokenization process to
pin down context-sensitive relations. Another way of saying this is that the Language of
Thought doesn’t have (binary) quantificational properties, which are unnecessary for
compositionality proper or for inferential matters; those appear to be unique properties of
human LFs, based on modern language. It is when the language faculty reaches this new
milestone that we see the properties in (1) in their full-blown interaction, including the
restrictions that the system imposes on context-sensitive dependencies (1c) and the
correlation with (checking) morphology and corresponding structural variation across
languages (1d). Then we have FLN proper, which the SMT helps us understand.
That point is central to our project. So far as we know, no researcher has even
attempted to model the evolution of the universal Conservativity Thesis in semantics, let
alone relate it to such syntactic context-sensitive requirements as morphological checking.
In our approach, all of these are side-effects of the IS dynamics. Utterly dull mechanical
procedures –concretely the elimination of the morphological ‘virus’– ‘liberate’ context-
sensitive dependencies, and hence the possibility of expressing quantificational scope. The
fact that ensuing syntactic dependencies are limited by standard locality and similar
constraints is, again, expected if the dependency in question is not driven by effability
considerations, but rather by mechanical procedures of the sort that AIS uncontroversially
shows44. This justifies those aspects of the SMT that involve structural economy: the
ensuing system is as elegant as the corresponding AIS is, for whatever reason involving
the presently not well-understood physico-chemical channel in either instance.
Crucially, also, because of the paucity, universality and systematicity of the
relevant across-context pairings involving operational memory, such a system satisfies
both standard learnability requirements and the broad plausibility demands of
naturalistic Minimalist considerations. Any alternative analysis of these matters, if one
indeed becomes available in the future, should be measured against our own in terms of
how appropriately these basic demands are met in each instance.
We see our future research as going in two directions, implicit already. On the
one hand, of course, the nuts and bolts of the IS scenario have to be developed –this was a
sketch to illustrate a conceptual point regarding evolutionary theories that could model the
emergence of the sorts of facts in (1). On the other, our proposal should and can be
falsified. For example, a better understanding of the genetic bases of language, and in
particular (until other equally credible and equally specific candidates show up) the
specific role FOXP2 plays (if it is indeed in operational memory, ascertaining how this is
achieved), may allow us to decide whether a ‘horizontal’ scenario of the sort we sketched
is the answer to the ‘time puzzle’ that the apparent recent mutation of FOXP2 poses.
Indeed, as we saw in (8), there are a variety of ‘horizontal’ mechanisms to
explore; we have just concentrated on one that allows us to make formal comparisons with
the sorts of analysis that SMT provides; bio-linguistic research should help clarify this matter.
On the other hand, new studies on human origins from the broad perspective we have
advocated, examining the inferable behavior and sheer genetics of Neanderthals and other
closely related sapiens (or other presently existing primates) should help us determine
whether, concretely, such species could (or can) communicate token-wise, or whether they were
confined to type-analysis beyond their own internal thoughts, as we expect. Both of these
lines of research ought to help us decide, also, between the ‘Memory First’ and
‘Morphology First’ scenarios, which are both compatible with our IS proposal and more
broadly the SMT, but which presuppose vastly different characterizations of our species
prior to the last relevant mutation.
6. Conclusions
The present paper is based on three boundary conditions: (I) the definitive results in
Syntax/Semantics as sketched in (1) above; (II) the need to make HCF’s claim about absent
recursion in proto-language compatible with what is sensibly presumed about the
Language of Thought in all sapiens; (III) the inferred rapid evolution of syntax, given the
best available genetic and archeological evidence. We have found U&P’s conjecture about
FOXP2 being involved in operational memory to be useful in relating (I), (II) and (III).
The formal basis for our approach to the interrelation of the properties in (I) is
the minimalist assumption that the emergence of Uninterpretable Morphology triggered (in
the context of natural Laws of Form) observable Conditions on Transformations, which
pushed the grammatical system into higher syntactic and semantic dimensions. Prior to
this, sapiens had a proto-language that was merely context-free in intrinsic power, and in
fact manifested itself in an even poorer Markovian way, hopeless for complex
communication. The latter claim pertains to our resolution of the quasi-paradox implicit in
(II): only when a reliable tokenization mechanism has evolved can a system make social
use of its inner recursive capabilities. Uninterpretable Morphology (e.g. as in case features)
constitutes a great, first-order way to code different tokens of identical types. Thus we are
relating (I) and (II) via a single formal mechanism. In turn, given the U&P conjecture
about FOXP2, the way to relate Uninterpretable Morphology to tokenization must be via
operational memory. Although we haven’t resolved the matter of how that can happen, if it
did it should have a rather direct bearing on the evolutionary rapidity alluded to in (III).
FOXP2 (initially assumed to be special to speech and named SPCH1) turned out
to be a regulatory gene for DNA replication, controlling different organs and functions, a
member of a family of similar genes present in innumerable species. This is to be expected.
The overall driving force in evolution is to be found in the requirements of DNA
replication. Selective pressures in some remote ancestor acting on lung function, or
any one of the other functions presently under the control of FOXP2, may have been at the
origin of the FOXP2 lineage as we know it; their effects in shaping some language
components in our species, in the fullness of relevant time, may have been indirect and
mediated. We do not have, so far, any definitive reconstruction of the reasons why there
has been a significant acceleration of mutations between chimps and us. Setting aside the
(at best) inconclusive explanation based on communicative efficacy, we have explored a
different possibility45. We may be wrong about the way we related (I) through (III), but we
do not think we are wrong in the kind of argument we have presented. In our view, if things
have not been as subtle and complex as we have outlined, they will be in that league.
At any rate, we are satisfied even with ascertaining the boundary conditions just
mentioned, especially (I), for which we know of no alternative evolutionary account. In
turn, if FOXP2 is implicated the way it seems to be in FLN –a factual consideration– and
its crucial mutation took place when the best available science tells us it did, then given
everything we know about the archeological record of anatomically modern humans, there
isn’t enough time to slowly evolve syntax, our boundary condition (III). Finally our
boundary condition (II), and our specific appeal to tokenization, is the only way we have to
reconcile what we take to be two reasonable ideas on the inferable properties of proto-
language: it doesn’t seem to have been recursive in its social manifestation, but its speakers
must have had inferring minds, which presuppose some recursion in their thought. We feel
that our appeal to the competence/performance distinction resolves the puzzle, and
furthermore provides us with a way of linking assumptions about recursion in language and
the FOXP2 mutation –if the way to interpret it is in terms of operational memory (via
tokenization through Uninterpretable Morphology).
In the end, we have attempted to show that, given the SMT, a system of the sort
implied by the results in (1) need not be miraculous (see the well attested case of AIS), or
unrelated to other well understood mechanisms of gene evolution in nature. In fact, if our
IS scenario (or one much like it) is anywhere near right, precisely the opposite is the case:
the language faculty is as rich, intricate and ultimately interesting as the adaptive immune
system. If we were evaluating putative scenarios for evolving (1) in the sorts of terms we
have discussed above –indeed, any terms– we would be glad to set our skepticism aside.
References
Agrawal, A., Q. M. Eastman, & D. G. Schatz. (1998). Implications of transposition
mediated by V(D)J-recombination proteins RAG1 and RAG2 for origins of antigen-
specific immunity. Nature 394, pp. 744-751.
Boncinelli, E. (1999). Il cervello, la mente e l'anima: Le straordinarie scoperte
sull'intelligenza umana. Milano: Arnoldo Mondadori Editore.
Boncinelli, E. (2000). Le forme della vita: L'evoluzione e l'origine dell'uomo. Torino:
Einaudi.
Boscovic, Z., & H. Lasnik. (2003). On the distribution of null complementizers. Linguistic
Inquiry 34(4), pp. 527-546.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: The MIT Press.
Clahsen, H. (1989). The grammatical characterization of developmental dysphasia.
Linguistics 27, pp. 897-920.
Coolidge, F. L. & Wynn, T. (in press). Working memory, its executive functions, and the
evolution of cognition. To appear in Cambridge Archaeological Journal.
Di Sciullo, A. M., & E. Williams. (1988). On the Definition of Word. Cambridge, MA: The
MIT Press.
Doolittle, W. F. (1982). Selfish DNA after fourteen months. In Dover, G. A. & R. B. Flavell
(Eds.), Genome Evolution. London: Academic Press.
Doolittle, W. F., & C. Sapienza. (1980). Selfish genes, the phenotype paradigm and
genome evolution. Nature 284, pp. 601-603.
Dover, G. (2001). Dear Mr. Darwin: Letters on the Evolution of Life and Human Nature.
London: Orion Publishing Group.
Drury, J. (2005). Alternative Directions for Minimalist Inquiry: Expanding and Contracting
Phases of Derivation. University of Maryland doctoral thesis.
Enard, W., M. Przeworski, S. E. Fisher, C. S. L. Lai, V. Wiebe, T. Kitano, A. P. Monaco,
& S. Paabo. (2002). Molecular evolution of FOXP2, a gene involved in speech and
language. Nature 418(22 Aug), pp. 869-872.
Fiez, J. A., S. E. Petersen, M. K. Cheney, & M. E. Raichle. (1992). Impaired nonmotor
learning and error detection associated with cerebellar damage: a single case study.
Brain 115, pp. 155-178.
Fisher, S. E., C. S. L. Lai, & A. P. Monaco. (2003). Deciphering the genetic basis of
speech and language disorders. Annual Review of Neuroscience 26, pp. 57-80.
Fitch, W. T., & M. D. Hauser. (2004). Computational Constraints on Syntactic Processing
in a Nonhuman Primate. Science 303(5656), pp. 377-380.
Fodor, J. A. (1983). The Modularity of Mind: An Essay on Faculty Psychology.
Cambridge, MA.: Bradford Books/The MIT Press.
Fodor, J. A. (2000) The Mind Doesn’t Work That Way: The Scope and Limits of
Computational Psychology. Cambridge, MA.: MIT Press.
Frazier, L., & D. J. Fodor. (1978). The sausage machine: A new two stage parsing model.
Cognition 6, pp. 291-325.
Gebhart, A. L., S. E. Petersen, & W. T. Thach. (2002). Role of the posterolateral
cerebellum in language. Annals of the New York Academy of Sciences 978, pp. 318-333.
Gibbs, W. W. (2003). The unseen genome: beyond DNA. Scientific American (December),
pp. 107-113.
Gopnik, M. (1990). Feature-blind grammar and dysphasia. Nature 344, pp. 715.
Gopnik, M. (1994). Impairments of tense in a familial language disorder. Journal of
Neurolinguistics 8(2), pp. 109-133.
Gopnik, M., & M. Crago. (1991). Familial aggregation of a developmental language
disorder. Cognition 39, pp. 1-50.
Grasshoff, M., & M. Gudo. (2002). The Origin of Metazoa and the main evolutionary
Lineages of the Animal Kingdom: The Gallertoid Hypothesis in the Light of Modern
Research. Senckenbergiana lethaea 82(1), pp. 295-314.
Grodzinsky, Y. (2000). The neurology of syntax: Language use without Broca's area.
Behavioral and Brain Sciences 23, pp. 1-71.
Haesler, S., K. Wada, A. Nshdejan, E. E. Morrisay, T. Lints, E. D. Jarvis, & C. Scharff.
(2004). FoxP2 expression in avian vocal learners and non-learners. The Journal of
Neuroscience 24 (13), pp. 3164-3175.
Haigh, J., & J. Maynard Smith. (1972). Population size and protein variation in man. Gen.
Res. Camb. 19, pp. 73-89.
Hale, K., & S. J. Keyser. (2002). Prolegomenon to a Theory of Argument Structure.
Cambridge, MA: The MIT Press.
Halle, M., & A. Marantz. (1993). Distributed morphology and the pieces of inflection. In
Hale, K. & S. J. Keyser (Eds.), The View from Building 20 (pp. 111-176). Cambridge,
MA: The MIT Press.
Hauser, M. D., N. Chomsky, & W. T. Fitch. (2002). The faculty of language: What it is,
who has it, and how did it evolve? Science 298 (22 November), pp. 1569-1579.
Henshilwood, C. S., F. d’Errico, R. Yates, Z. Jacobs, C. Tribolo, G. A. T. Duller, M. N., J.
C. Sealy, H. Valladas, I. Watts, & A. G. Wintle. (2002). Emergence of Modern Human
Behaviour: Middle Stone Age engravings from South Africa. Science 295, pp. 1278-1280.
Herburger, E. (2000). What Counts: Focus and Quantification. Cambridge, MA: The MIT
Press.
Hiom, K., M. Melek, & M. Gellert. (1998). DNA transposition by the RAG1 and RAG2
proteins: a possible source of oncogenic translocations. Cell 94 (August 21), pp. 463-
470.
Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution.
Oxford: Oxford University Press.
Kayne, R. (1994). The Antisymmetry of Syntax. Cambridge, MA: The MIT Press.
Kidwell, M. G. (1994). The evolutionary history of the P family of transposable elements.
Journal of Heredity 85, pp. 339-346.
Kidwell, M. G., & D. R. Lisch. (2000). Transposable elements and host genome evolution.
Trends in Ecology and Evolution 15 (3 (March)), pp. 95-99.
Klein, R. G. (1999). The Human Career. Chicago, ILL: University of Chicago Press.
Klein, R. G. (2003). Paleoanthropology: Whither the Neanderthals? Science 299 (5612),
pp. 1525-1527.
Lai, C., S. Fisher, J. Hurst, F. Vargha-Khadem, & A. Monaco. (2001). A forkhead-domain
gene is mutated in a severe speech and language disorder. Nature 413, pp. 519-523.
Lai, C. S. L., D. Gerrelli, A. P. Monaco, S. E. Fisher, & A. J. Copp. (2003). FOXP2
expression during brain development coincides with adult sites of pathology in a severe
speech and language disorder. Brain 126, pp. 2433-2462.
Lasnik, H., J. Uriagereka, & C. Boeckx. (2005). A Course In Minimalist Syntax:
Foundations and Prospects (Generative Syntax). Oxford UK: Blackwell.
Liegeois, F., T. Baldeweg, A. Connelly, D. G. Gadian, M. Mishkin, & F. Vargha-Khadem.
(2003). Language fMRI abnormalities associated with FOXP2 gene mutation. Nature
Neuroscience 6 (1 November), pp. 1230-1237.
Manley, G. A., A. Popper, & R. R. Fay (Eds.). (2004). Evolution of the Vertebrate Auditory
System. New York: Springer Verlag.
McBrearty, S., & A. Brooks. (2000). The revolution that wasn’t: A new interpretation of
the origin of modern human behavior. Journal of Human Evolution 39, pp. 453-563.
Mount, H. (1989). KnotEd: A program for studying knot theory, unpublished ms. Hewlett
Packard, Cupertino, CA. [Available at http://mzlabs.com/JohnMount/]
Mueller, R.-A. (in press). Genes, language disorders, and developmental archaeology:
What role can neuroimaging play? In Rice, M. L. & S. F. Warren (Eds.),
Developmental Language Disorders: From Phenotypes to Etiologies: Erlbaum.
Packard, M. G., & B. J. Knowlton. (2002). Learning and memory functions of the basal
ganglia. Annual Review of Neuroscience 25, pp. 563-593.
Piattelli-Palmarini, M. (1989). Evolution, selection and cognition: from ‘learning’ to
parameter setting in biology and in the study of language. Cognition 31, pp. 1-44.
Piattelli-Palmarini, M. (1998). Foreword. In Uriagereka, J. (Ed.), Rhyme and Reason: An
Introduction to Minimalist Syntax (pp. xxi-xxxvi). Cambridge, MA: The MIT Press.
Piattelli-Palmarini, M., & J. Uriagereka. (2005 (in press)). The Immune Syntax: The
Evolution of the Language Virus. In Jenkins, L. (Ed.), (Chapter 4 of) Variations and
Universals in Biolinguistics (pp. 341-377). Amsterdam: Mouton - De Gruyters.
Pinker, S. (1994). The Language Instinct. New York: William Morrow and Company, Inc.
Pinker, S. (1996). Rules of Language. In Geirsson, H. & M. Losonsky (Eds.), Readings in
language and mind (pp. 558-569). Cambridge: Blackwell Publishers.
Pinker, S., & M. T. Ullman. (2002). The past-tense debate: The past and future of the past
tense. Trends in Cognitive Sciences 6 (11), pp. 456-463.
Pray, L. A. (2004). Epigenetics: Genome, meet your environment. The Scientist 18 (13
(July 5)), pp. 1-10.
Rakyan, V. K., S. Chong, M. E. Champ, P. C. Cuthbert, H. D. Morgan, K. V. K. Luu, & E.
Whitelaw. (2003). Transgenerational inheritance of epigenetic states at the murine
Axin/Fu allele occurs after maternal and paternal transmission. Proc Natl Acad Sci U S
A 100 (March 4 (n.5)), pp. 2538-2543.
Rice, M., K. Wexler, & P. Cleave. (1995). Specific language impairment as a period of
Extended Optional Infinitive. Journal of Speech and Hearing Research 38, pp. 850-
863.
Roth, D. B., & N. L. Craig. (1998). VDJ Recombination: A transposase goes to work. Cell
94(August 21), pp. 411-414.
Scharff, C., & S. A. White. (2004). Genetic components of vocal learning. Annals of the
New York Academy of Sciences 1016, pp. 325-347.
Searls, D. B. (2002). The Language of Genes. Nature 420, pp. 211-217.
Simeone, A. (1998). Otx1 and Otx2 in the development and evolution of the mammalian
brain. The EMBO Journal 17 (23), pp. 6790-6798.
Simeone, A., D. Acampora, M. Gulisano, A. Stornaiuolo, & E. Boncinelli. (1992). Nested
expression domains of four homeobox genes in developing rostral brain. Nature 358,
pp. 687-690.
Teramitsu, I., L. C. Kudo, S. E. London, D. H. Geschwind, & S. A. White. (2004). Parallel
FoxP1 and Foxp2 expression in human and songbird brain predicts functional
interaction. Journal of Neuroscience 24, pp. 3152-3163.
Ullman, M. T. and E. I. Pierpont. (forthcoming). Specific Language Impairment is not
Specific to Language: The Procedural Deficit Hypothesis. In D. Bishop, M. Eckert and
C. Leonard, The Neurobiology of Developmental Disorders, special issue of Cortex.
Ullman, M. T., & M. Gopnik. (1999). Inflectional morphology in a family with inherited
specific language impairment. Applied Psycholinguistics 20, pp. 51-117.
Uriagereka, J. (1998). Rhyme and Reason: An Introduction to Minimalist Syntax.
Cambridge, MA: The MIT Press.
Uriagereka, J. (2002). Formal and Substantive Elegance in the Minimalist Program,
chapter 8 in Derivations: Exploring the Dynamics of Syntax. London: Routledge.
Uriagereka, J. (2005). Syntactic Anchors. Ms. University of Maryland, submitted to
Cambridge University Press.
Van der Lely, H. K. J. (1997). Language and Cognitive development in a Grammatical SLI
boy: Modularity and Innateness. Journal of Neurolinguistics 10, pp. 75-107.
Van der Lely, H. K. J. (2005 (in press)). Domain-specific cognitive systems: insight from
Grammatical-SLI. Trends in Cognitive Science.
Van der Lely, H. K. J., & L. Stollwerck. (1996). A grammatical specific language
impairment in children: An Autosomal Dominant Inheritance? Brain and Language 52,
pp. 484-504.
Watkins, K. E., N. F. Dronkers, & F. Vargha-Khadem. (2002). Behavioral analysis of an
inherited speech and language disorder: Comparison with acquired aphasia. Brain 125,
pp. 452-464.
Watkins, K. E., F. Vargha-Khadem, J. Ashburner, R. E. Passingham, A. Connelly, K. J.
Friston, R. S. J. Frackowiak, M. Mishkin, & D. G. Gadian. (2002). MRI analysis of an
inherited speech and language disorder: structural brain abnormalities. Brain 125, pp.
465-478.
Zeigler, P. H., & P. Marler (Eds.). (2004). Behavioral Neurobiology of Birdsong (Special
issue) Vol. 1016, Annals of the New York Academy of Sciences. New York.
Notes
1 (1a) is agreed upon by all generative models (differences only arise in whether context-
sensitivity is coded through so-called transformations or some other mechanism). So is
(1b) in its locality aspect (although no consensus has arisen as to how to account for
such conditions); tacitly, conditions on uniformity and last-resort are also generally
accepted within transformational grammar at least, essentially to prevent movements
deemed ‘improper’ or ‘vacuous’. (1c) constitutes the major finding in formal semantics,
and no worked-out theory denies some form of systematicity, which is usually
strengthened to strict compositionality; the argument ordering that conservativity
imposes is universally accepted for determiners; conditions on full-interpretation are
tacitly accepted by every theorist. Differences emerge between theories only in terms of
pursuing the consequences of these principles on the syntax-semantics mapping. Finally,
(1d) is the most controversial within generative grammar at large (although it is central to
the Principles and Parameters model developed by Chomsky). The most debatable issue
within this learnability consideration is to what extent the basis for such an open
program is to be correlated with morphological differences across languages.
2 In a later development, Fitch and Hauser (2004) set out to find the most unambiguous
effects of FLN in other primates. After carefully testing the dispositions of their
experimental subjects to learn, on the one hand, quite simple structure-independent
transformations on strings of symbols and, on the other, equally simple (in the abstract)
structure-dependent transformations, these authors conclude that simian capacity for
syntax is at best Markovian. These non-human primates cannot process structure-
dependent sequences (equivalents of constituents in human language), even when given
all imaginable chances. They end up learning structure-independent rules, not structure-
dependent ones. Since this difference is the only one between otherwise identical
learning protocols, Fitch and Hauser conclude that FLN is missing in other primates.
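As a purely illustrative aside (our own sketch, not the actual Fitch and Hauser stimuli or protocol), the contrast at stake can be caricatured in a few lines of Python: the finite-state pattern (AB)^n can be produced and recognized with no memory of earlier material, whereas the phrase-structure pattern A^nB^n requires keeping count across the whole string.

# Illustrative sketch only: the two string classes contrasted in this literature.

def finite_state_string(n):
    # (AB)^n: each B depends only on the immediately preceding A, so a
    # memoryless (Markovian) device suffices to produce or recognize it.
    return "AB" * n

def phrase_structure_string(n):
    # A^n B^n: every A must be matched by a later B, which requires an
    # unbounded counter (in effect, a stack), i.e. structure dependence.
    return "A" * n + "B" * n

def accepts_AnBn(s):
    # Recognizer for A^n B^n using a single counter (a degenerate stack).
    count, seen_b = 0, False
    for symbol in s:
        if symbol == "A":
            if seen_b:
                return False
            count += 1
        elif symbol == "B":
            seen_b = True
            count -= 1
            if count < 0:
                return False
        else:
            return False
    return count == 0

print(finite_state_string(3))        # ABABAB
print(phrase_structure_string(3))    # AAABBB
print(accepts_AnBn("AAABBB"))        # True
print(accepts_AnBn("ABABAB"))        # False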
3 In this abnormality, segments from the ends of two chromosomes swap around.
4 Lungs, intestinal and cardiovascular systems, interneurons of the spinal cord; the gene
is also involved in the development of the inner intermediate zone of the neopallial cortex.
5 It is actually hard to determine whether Gopnik had ultimately exaggerated her case,
among other things because there turns out to be only one nonverbal task that is
responsible for the average lower nonverbal IQ of the affected KE members: the arbitrary
pairing of symbols with digits. When that task is removed –and we see below that this is
probably justified– their average nonverbal IQ becomes essentially normal.
6 We ourselves are attempting to model computationally a variety of relevant scenarios,
which mount in complexity and uncertainty as assumptions about life during the
Paleolithic (e.g. its relative, clan-based, insularity) are taken more earnestly.
7 To wit: ‘Impaired phonological analysis resulting from poor subvocal rehearsal of
incoming speech could interfere with the ability to draw analogies between words with
articulation patterns in common and, particularly in a developmental context, to learn
implicitly the rules of syntax’ (Watkins, Dronkers & Vargha-Khadem 2002:463)
8 And yet another quote: ‘The homologous pattern of expression of FOXP2 in human
and mouse argues for a role for this gene in development of motor-related circuits
throughout mammalian species … This study provides support for the hypothesis that
impairments in sequencing of movement and procedural learning might be central to the
FOXP2-related speech and language disorder’ (Lai et al. 2003: 2455).
9 So-called 'mental movement', which the posterolateral cerebellum is involved in, is
associated with lexical pairs like bike/ride, but not with something like the pair
moon/glow (Gebhart, Petersen & Thach 2002).
10 Relevant tasks included: antonym generation (good/bad, heavy/light, etc.), category
member generation (mattress, pillow, blanket ...), verb selection (horse ride, shine …),
word/non-word and possible/impossible word identification. Notice that internally
thinking of the right choice was central to the task (actual production in speech would
have mobilized motor areas anyway, for uninteresting reasons).
11 Feature blindness and lack of feature checking (Gopnik and Crago 1991), missing
agreement (Clahsen 1989), extended optional infinitive (Rice, Wexler & Cleave 1995),
representational deficit for syntactic dependencies (Van der Lely 1997, Van der Lely and
Stollwerck 1996, Van der Lely 2005), dysfunctional implicit learning and use of
grammatical rules (Ullman and Gopnik 1999). In a review of the relevant literature,
Ullman and Pierpont (forthcoming) also report limitations of other, arguably 'non-
linguistic’ sorts that can be summarized as ‘sequential’.
12 The inferior olive-Purkinje cell pathway, the optic tectum and the striatum.
13 Mount raises his point with regard to the computational modeling of knots, which
requires formal associations that go beyond context-free dependencies. Of course, this
formal point may or may not apply to minds, just as Chomsky's formal points about
grammars may not, in the end. See also Searls 2002, which models as pseudo-knots (also
context-sensitive dependencies) various structures that clearly appear in biological
domains (RNA secondary structures, folded proteins).
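For concreteness (our own toy illustration, not Mount's or Searls's formalism), crossing dependencies of the pseudo-knot sort can be caricatured with the pattern a^n b^m c^n d^m, in which the a–c and b–d pairings interleave; the language of such strings is a textbook example of one that no context-free grammar generates.

import re

def is_pseudoknot_like(s):
    # Membership check for {a^n b^m c^n d^m}: the a's pair with the c's and
    # the b's with the d's, and the two pairings cross each other, much as
    # base-pairings do in an RNA pseudoknot.
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    return a == c and b == d

print(is_pseudoknot_like("aabbbccddd"))  # True: 2 a's/c's, 3 b's/d's
print(is_pseudoknot_like("aabbccd"))     # False: counts do not match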
14 Lai et al. (2001) identified in affected members a point mutation –a G-to-A SNP, or
single nucleotide polymorphism, differing in a single base pair, in exon 14– which alters
an invariant amino-acid residue in the forkhead domain, in a region that controls the
gene's binding to the DNA and is invariant in all genes within the FOX family.
15 Both in exon 7 of the gene: T-to-A at position 303 and A-to-T at position 325, which
creates a potential target site for phosphorylation by protein kinase (phosphorylation of
forkhead transcription factors may mediate transcriptional regulation).
16 Given the basic differences pointed out in the two previous footnotes (different axons
are at issue in each instance), one cannot reach the conclusion just alluded to without any
qualifications. That said, it is the most reasonable conclusion given the available evidence.
17 For perspective, it is estimated that the sort of time-frame it takes for skin to
darken or lighten within the, in effect, bottleneck conditions that a given 'race' can
provide (as a result of well-orchestrated contextual pressures involving melanin and its
relation to sunlight on the one hand and the processing of Vitamin D on the other) is
roughly of the order we are now talking about for the fixation of the relevant FOXP2
allele. By contrast, the evolution of such complex 'conglomerates' as the circulatory
system in chordates (Grasshoff and Gudo 2002) or the auditory system in vertebrates
(Manley, Popper & Fay 2004) is calculated on a scale of millions of years.
18 Organized hunting of major game, control of fire, caring for the injured and elderly,
ritual burial of the dead, and many others.
19 Fodor would assume some variant of this system even for other mammals, but we
need not commit here to that: to make our point it is enough to assume that at least our
sapiens ancestors, presumably carriers of FLB, did possess a compositional system of the
relevant sort, and that it was in fact precisely FLB.
20 The same point holds if context-free structures are implemented in a bottom-up
procedure, of the sort coded in the Minimalist terms of so-called Merge.
21 The word 'representational' is not meant in any 'symbolic' sense. A non-terminal
symbol has no intentional referent other than what it organizes in computational terms (a
set of terminals), and a mental correspondence if the notion is psychologically real.
22 In particular, any recursive procedure implies using the same non-terminal X at least
twice in a given derivational horizon. The ultimate issue in a situation like this is how to
distinguish, in actual language use, one token use of X from the next.
24 Needless to say, a set cannot have the same member more than once.
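The following trivial Python sketch (ours, purely for illustration) makes the point concrete: if syntactic objects are bare sets, two occurrences of the same item collapse into one, so some extra token-marking, of the sort that case morphology could in principle supply, is needed to keep them apart.

# Two occurrences of the same type collapse if objects are bare sets:
the = "the"
bare = frozenset({the, the})
print(len(bare))      # 1: the two tokens cannot be told apart

# Marking each occurrence (e.g. with a case-like label) restores distinctness:
marked = frozenset({("the", "nominative"), ("the", "accusative")})
print(len(marked))    # 2: the tokens are now identifiable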
25 Personal heuristics could be of various sorts, including the mere order in which
separate token thoughts happen to be generated in one’s mind. This of course is useless
in communicating one’s thought: there is no way, barring a worked-out linguistic
tokenization mechanism or sheer telepathy, for an observer to be able to track the
particular order in which someone other than that very observer has thought something
up. Surely the observer will be subjected to whichever order the speaker uses to project
their thought ‘out’, but there is nothing trivial in the observer’s reconstruction, from the
speaker’s organized speech, of the intended thought –as Kayne’s (1994) monograph on
‘linear correspondence’ (between articulated phrasal structure and the ordering of their
terminal correspondences) aptly emphasized. Indeed, Kayne’s important idea, especially
if reanalyzed as in Uriagereka (2002: chapter 3) in terms of a dynamic Spell-out system,
can be seen as either the tokenization process we are after in the text, or at any rate a
procedure that is co-extensive with it, hence in some sense crucially related. To put this
differently: we do not expect proto-language to have a linearization procedure.
26 The i and j indices in the_i vs. the_j above could merely be, say, the_nominative vs.
the_accusative; see Uriagereka (2002: chapter 8), (2005), and Drury (2005) on this.
27 To insist, it is not enough for two creatures to have mental categories, or even for them
to tacitly agree on them (even as a result of natural law); in addition, unless they are
telepathic, they must find a way of translating the agreed-upon types into identifiable
tokens of the types. One might think in types, but one doesn't utter them, which is why
they are standardly expressed as nonterminal meta-linguistic objects in our grammars.
28 Of course the fact that, say, one 'the' is marked nominative and the other one is marked
accusative doesn’t automatically entail that the system should then be happy. As we see
in section 5, though, the grammar of at least modern language has managed to evolve
ways of dealing with the ‘intruder’, which results in very interesting context-sensitive
structures. It is arguably those, after the extraneous morphological elements have been
eliminated, that allow the clever tokenization mechanism to survive the derivation.
29 Notice that the overall reasoning is analogous to the explanation of the well-known
difficulty in center embedding (as in the cat the dog the cow gored chased ran away).
Structures of the relevant sort are known to be generable competence-wise but, for
reasons having to do with short-term memory limitations, they cannot be parsed –in
modern language. If modern humans were to evolve a mechanism to allow them to use
center-embedding, they would almost certainly use it. But without that sort of memory,
we are stuck in our evolutionary niche. It would be great to be able to use center
embedding, but this doesn’t, at least in the present conditions that humans experience,
appear to entail any ‘selective pressure’ in any direction.
30 Technically, a Markovian system of order n needs no memory larger than n; its
memory is thus, in effect, equivalent to the computational procedure itself.
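A minimal sketch (ours) of what a memory bound of n amounts to: an order-n Markov generator conditions each new symbol only on the previous n symbols, so the whole of its 'memory' is just the current n-gram window.

import random
from collections import defaultdict

def train_markov(sequence, n):
    # Order-n Markov model: the transition table is keyed by the last n
    # symbols only; nothing earlier in the string is ever consulted.
    table = defaultdict(list)
    for i in range(len(sequence) - n):
        table[tuple(sequence[i:i + n])].append(sequence[i + n])
    return table

def generate(table, seed, length):
    # The only state carried forward is the current n-gram window.
    window, output = list(seed), list(seed)
    for _ in range(length):
        choices = table.get(tuple(window))
        if not choices:
            break
        symbol = random.choice(choices)
        output.append(symbol)
        window = window[1:] + [symbol]   # memory never exceeds size n
    return "".join(output)

table = train_markov(list("ABABABAB"), 2)
print(generate(table, ["A", "B"], 6))    # ABABABAB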
31 U&P observe: “Like other ‘dual-system’ views of language (Chomsky, 1995; Pinker,
1994; Pinker and Ullman, 2002), the DP model critically assumes a categorical
distinction between lexicon and grammar. […] The lexicon is the repository of all
arbitrary word-specific knowledge [and] other distinctive information, such as bound
morphemes (e.g., the –ed or –ness suffixes, as in walked and happiness) and
representations of complex linguistic structures whose meanings cannot be transparently
derived from their parts (e.g., idiomatic phrases, such as kick the bucket) (Di Sciullo and
Williams, 1988; Halle and Marantz, 1993). In contrast, the rules of the mental grammar
underlie the regularities of language. The rules constrain how lexical forms can combine
sequentially and hierarchically to make complex representations. Such rule-governed
behavior is found in various language domains, including in phrases and sentences
(syntax), and in complex words such as walked (morphology). The rules are a form of
mental knowledge in that they underlie our individual capacity to produce and
comprehend complex forms. The learning and use of the rules and operations of
grammar are generally implicit (non-conscious).” (p. 10).
32 A recent converging thesis is presented by Coolidge and Wynn (forthcoming), who
after carefully studying the Neanderthal fossil record argue that this closely related
species actually lacked the working memory capacity that we exhibit. Coolidge and
Wynn do not go into formal grammatical considerations, but we suspect they would find
our proposal congenial to their equally interdisciplinary approach.
33 Our main disagreement with the U&P position comes from their contention that ‘[t]he
PDH predicts that SLI is associated with non-linguistic functions that are subserved by
the same brain system that also underlies grammar; this outcome is clearly not expected
by the view that domain-specific modules subserve distinct aspects of grammar
(Chomsky, 1995; Fodor, 1983; Frazier and Fodor, 1978; Grodzinsky, 2000)’ (p. 42). In
our view, one cannot determine a priori what is linguistic. We can analyze the formal
properties of observables, or the brain regions that light up when executing them. It is true
that Chomsky has advocated ‘domain specificity’ for language, but he hasn’t said (nor
can anyone say) what enters that domain simply because that can only be answered
through scientific inquiry. Given the evidence available, nothing goes wrong by assuming
that apparently ‘non-linguistic’ behaviors have a linguistic base to them.
34 This is assuming a semantics that is merely interpretive, not generative. Of course
one could pack context-sensitivity into the semantic objects themselves, for instance
having a binary quantifier undergo 'type-lifting' and allowing it to seek its scope
arbitrarily far up the phrase-marker. However, formally (in terms of requiring a
contextual dependency) and from an evolutionary perspective (in terms of
understanding how that state of affairs came to be), pushing things from syntax to
semantics won't make any real difference to our overall concerns.
35 These mechanisms are actually still part of classical genetics.
36 See Dover 2001 and Rakyan et al. 2003 for basic reviews of epigenetics and an
essential bibliography, as well as Pray 2004, Gibbs 2003 and the special issue of
ScienceWeek, Vol. 7 Number 35A, http://www.scienceweek.com/2003/sw030829.htm.
37 This justified the controversial term ‘junk DNA’ when it comes to nucleic material of
this origin (Doolittle 1982; Doolittle and Sapienza 1980).
38 Even though, admittedly, the probability that a positive alteration of the host’s genetic
functions may ensue remains exceedingly small.
39 Simple induction from almost all genes of comparable size, and in particular those in
the immune system at large (Donata Vercelli and Walter Klimecky, personal
communication), suggests that it would be extremely surprising to discover no individual
or group variation in such a large gene.
40 An analogous suggestion has been put forth by Edoardo Boncinelli and collaborators
for a different class of regulatory genes, highly conserved throughout the mammalian
world and having close analogs all the way down to the fruitfly, that have
simultaneously under their control the development of the forebrain, the testes and the
kidneys (see Boncinelli 1999, 2000 for a summary and relevant references; for an early
review see Simeone 1998; Simeone et al. 1992). In this instance, the plausible
hypothesis is that strong adaptive pressures (truly a matter of life or death) acting on the
testes and the kidneys may have brought about variants of those regulatory genes, in the
lineage ultimately leading to humans, that also conferred enhanced cortical capacities on
the brain. We report this case merely to stress that various instances of evolution via
serendipitous (pleiotropic) mechanisms have been successfully investigated in general,
and that they should be explored also in the domain of language evolution.
41 Analogously, the vertebrate Otx1, Otx2 and Crx genes, having primary jurisdiction
over embryological development, including induction, specification and regionalization of
the brain, are homologs of insect genes coding for signal molecules and transcription
factors (Simeone 1998).
42 Chomsky's reasons for this approach were deep. He wanted (i) to account for a
derivational cycle (see Boscovic and Lasnik 2003), (ii) to correlate uninterpretable
morphology and transformations, making the previous sense of 'ad hocness' of
movement go away, and (iii) to explain core variation in the syntactic system.
43 Hale and Keyser (2002) argue that context-free syntax is responsible for the
characteristic (and limited) correspondence of thematic roles with relevant semantic
notions. If this is correct, one might surmise that the tokenization which, in our view, is
associated with FOXP2 should be at right angles to such a broad conception about types
of structures. However, to repeat, it is the expression/recognition of such theta-relations,
in language use, that we think a viable tokenization mechanism should sanction.
44 This does not imply that present-day quantificational relations must make use of
morphological checking. What we are saying, rather, is that the evolution of
morphological checking ‘liberated’ the relevant context-sensitive relations, which later on
got co-opted for richer thought processes. On the shape of quantificational relations
within SMT see Lasnik, Uriagereka & Boeckx 2005: chapter 6.
45 It turns out that orangutans carry a further (third) point mutation in FOXP2 with
respect to us, and one with respect to chimps, for unknown reasons. Heuristically, this
lateral mutation may turn out to be important, because of the possibility that selection
for one trait may have ended up also producing another trait, which then developed and
evolved further, possibly under selective pressures of its own. Strict adaptationism
would prevent us from even considering any such indirect evolutionary pathway.