Embodied language processing; Emergentist approaches to language; Sociocultural theories of language; Usage-based linguistics.
Modern Theories of Language
Vera Kempe
and Patricia J Brooks
Abertay University, Dundee, UK
City University of New York, New York, NY,
Modern theories of language represent efforts to
account for the evolution, acquisition, and
processing of language within an integrated
framework. Such efforts acknowledge the rela-
tionship of language to sensorimotor experience,
social interaction, and general cognitive con-
straints on information processing.
The study of language is a diverse field that
engages the applied interests of educators and
speech pathologists as well as the academic interests
of researchers working in a wide range of disciplines
including linguistics, speech and hearing
sciences, psychology, biology, computer science,
philosophy, sociology, and anthropology.
Scholars in these different disciplines invariably
conceptualize language in different ways, viewing
it, e.g., as a biological attribute, a cultural trait, a
set of communication skills, or a normative sys-
tem of signs. This entry reviews modern theories
of language that adopt the view of human lan-
guage as a complex form of human behavior.
The review is organized in accordance with
three perspectives considered to be complemen-
tary, as opposed to mutually exclusive: an evolu-
tionary perspective that focuses on the origins of
what are seemingly unique features of human
languages; a developmental perspective that high-
lights the social-interactional contexts in which
children acquire language as well as relationships
between perception, cognition, and action in
development; and a cognitive perspective that
examines the processing mechanisms that deter-
mine how language use unfolds in real time.
Theories of Language Evolution and Change
Human languages seem to display many discon-
tinuities when contrasted with the communication
systems of other species. These include duality of
patterning, symbolic signs, vast vocabularies,
syntactic rules, and propositional structure, all of
which allow for unlimited productivity in the creation
of communicative signals whose meaning
transcends the here and now. Modern theories of
language evolution attempt to understand precursors
to these discontinuous traits and what scenarios
may have given rise to their emergence.
© Springer International Publishing AG 2016
T.K. Shackelford, V.A. Weekes-Shackelford (eds.), Encyclopedia of Evolutionary Psychological Science,
DOI 10.1007/978-3-319-16999-6_3321-1
The study of the evolution of language presents
extraordinary methodological challenges. The
comparative method involves the study of homol-
ogous traits to uncover potential common ances-
try as well as the study of analogous traits that
have emerged across different lineages to under-
stand common selection pressures. This method
has revealed strikingly few parallels between the
vocal communicative repertoires of humans and
their nearest relatives, the great apes, although
there may be greater similarities in nonverbal
forms of communication such as gesture. At the
same time, the comparative method has revealed
some striking similarities between humans and
songbirds with respect to the imitative behaviors
of juveniles and the role of social feedback in
shaping immature vocalizations.
The comparative method is fraught with difficulties
when trying to establish which exact traits
constitute analogues or homologues in species
that do not possess the faculty of language.
Paleo-anthropological methods are of limited use
since the fossil record does not contain clear traces
of anatomical prerequisites (i.e., vocal tract struc-
ture) or of neural adaptations associated with lan-
guage. Similarly, archeological methods rely on
inferences about the link between the notoriously
incomplete record of artifacts and the cognitive
abilities involved in their production and use.
Finally, the recent applications of molecular biol-
ogy that try to trace the emergence of language
depend on the growing, but currently still limited
knowledge about the genetic underpinnings of
this unique human ability. Thus, in the absence
of "hard" evidence, the extant theories of language
evolution and change apply conjecture or
reverse engineering, or use computational modeling
and experimental studies of specific selection
pressures that operate during biological evolution
and cultural transmission to provide proof of con-
cept for basic principles that may underlie the
emergence of language.
Evolutionary theories of language often differ
with respect to what is regarded as the
phenomenon to be explained. Drawing on de
Saussure's distinction between language as a system
of signs ("langue") and language as the product
of the application of knowledge about this
system ("langage" or "parole"), linguistic theory
has upheld the conceptual distinction between
competence and performance. The concept of lan-
guage performance describes the behaviors asso-
ciated with language use such as the
comprehension, production, and learning of lin-
guistic signals. These behaviors and their anatom-
ical, neural, and cognitive underpinnings,
including domain-general cognitive mechanisms
such as sensorimotor processing, working mem-
ory, planning, cognitive control, conceptual rep-
resentations, and intentionality, have recently
been termed the "Faculty of Language in the
Broad Sense" (Hauser et al. 2002), and are the
subject of study of the discipline of psycholinguis-
tics with its vast arsenal of experimental and neu-
rophysiological paradigms.
Theories of the evolution of the underpinnings
of human language have been concerned with
abilities like control over vocalization and ges-
ture, vocal and gestural learning (including imita-
tion), sharing of conceptual representations, and
intention reading, which is required for the recognition
of conspecifics' behaviors as communicative
signals. While there is debate as to whether
language is homologous with animal vocal sig-
naling systems like bird song or whether precur-
sors of language may have initially arisen in the
gestural modality, there is consensus that many of
these abilities constitute adaptations to a variety of
selection pressures that may not be related to
language, but were likely to be related to social
organization, mate choice, and tool use, and are
the product of a gradual and continuous process of
evolution by natural selection.
In contrast, language competence has been
viewed as a cognitive capacity encompassing
knowledge of a finite set of rules that can be
used to generate an infinite number of utterances.
It is this set of rules, called a Generative
Grammar, which is seen as being at the heart of
the human faculty for language. According to the
theory of Universal Grammar, for a child to
acquire the Generative Grammar of a specific
language, they must have innate knowledge of
universal constraints that limit the types of struc-
tures that occur in human languages. Studying the
psychological reality of Universal Grammar (i.e.,
the innate substrate for acquiring a Generative
Grammar) is complicated by the fact that formal
descriptions of what constitutes knowledge of
grammar have changed since the 1960s: The Standard
Theory encompassed the notions of a deep
structure describing the underlying logical relationships
between the parts of a sentence, and a
surface structure describing the specific manifestations
of how those parts are assembled based on
a set of transformation rules, the specification of
which underwent major revisions in subsequent
editions of the theory throughout the 1970s. In the
1980s, the Principles-and-Parameters Framework
(aka Government and Binding Theory) viewed
Universal Grammar as an innate set of principles
comprising phrase-structure rules that specify
hierarchical relationships (Government) and rela-
tionships of coreference (Binding) between
words, and a set of parameters, i.e., values that
define variability in language-specific manifestations
of these principles, the specific values of
which are set upon receiving input during the
process of acquisition. Finally, in the 1990s, the
Minimalist Program narrowed the human faculty
for language down to a basic combinatorial oper-
ation, Merge, which hierarchically combines pairs
of elements, consisting of a head and its comple-
ment, into a superordinate unit that inherits the
properties of the head (e.g., in the phrase "big dog",
the adjective "big" serves as the dependent of "dog",
and thus modifies its meaning). Crucially, the
product of Merge can subsequently combine
with another element (either as dependent or
head) to create a new unit (e.g., "the big dog"),
thereby implementing the fundamental property
of recursion considered at the core of the human
"Faculty for Language in the Narrow Sense"
(Hauser et al. 2002). As theories of the evolution
of language predate the proposal of the concept of
the Faculty for Language in the Narrow Sense,
this entry retains the term Universal Grammar
when talking about the evolution of language.
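The recursive character of Merge described above can be illustrated with a minimal sketch. The class names, category labels, and the simplification that dependents are pronounced before their heads are assumptions made for this illustration; they are not part of the Minimalist Program's formal apparatus. Treating "big dog" as the head when combining with "the" is likewise just one of the two options the text allows.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Word:
    form: str
    category: str  # illustrative labels such as "N", "Adj", "Det"


@dataclass(frozen=True)
class Phrase:
    head: "Node"
    dependent: "Node"

    @property
    def category(self) -> str:
        # The merged unit inherits the properties of its head.
        return self.head.category


Node = Union[Word, Phrase]


def merge(head: Node, dependent: Node) -> Phrase:
    """Combine two elements into one superordinate unit."""
    return Phrase(head, dependent)


def linearize(node: Node) -> str:
    """Spell out a structure; assumes dependents precede heads."""
    if isinstance(node, Word):
        return node.form
    return f"{linearize(node.dependent)} {linearize(node.head)}"


def depth(node: Node) -> int:
    """Number of Merge applications along the deepest branch."""
    if isinstance(node, Word):
        return 0
    return 1 + max(depth(node.head), depth(node.dependent))


big_dog = merge(head=Word("dog", "N"), dependent=Word("big", "Adj"))
# The output of Merge is itself a valid input to Merge (recursion).
the_big_dog = merge(head=big_dog, dependent=Word("the", "Det"))
print(linearize(the_big_dog), the_big_dog.category, depth(the_big_dog))
```

Because `merge` accepts its own output as input, structures of arbitrary depth can be built from a single operation, which is the property the text identifies as recursion.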
Language as Product of Biological Adaptation
Theories that explain the evolution of Universal
Grammar in analogy with the evolution of biolog-
ical traits can be distinguished with respect to
whether evolution is assumed to have taken
place continuously or in discontinuous jumps.
Adaptationist theories propose that Universal
Grammar evolved gradually following principles
of Darwinian selection because it confers reproductive
benefits by supporting language acquisition.
In analogy to the evolution of the visual
system, the central argument is that of complexity
of design: In order for a cognitive entity as com-
plex as Universal Grammar to evolve it must have
been fine-tuned to fit its purpose through a lengthy
process of continuous adaptation. As a result,
humans are equipped with a set of innate, a priori
constraints that enable them to solve the logical
problem of language acquisition by allowing chil-
dren to derive correct structural generalizations
from limited input. Nonadaptationist theories
argue that Universal Grammar has emerged fairly
recently (~90–50 K years ago) as the result of a chance
mutation. This view draws on two lines of evi-
dence: (a) archeological evidence for a dramatic
rise in technological advancement attributable to
artifacts that can be dated from 50 K years
onward, and (b) evidence for a mutation in the
FOXP2 gene that occurred within the last 100 K
years. However, given the exceedingly small like-
lihood of a complex set of constraints such as
Universal Grammar arising through a chance
mutation, proponents of the nonadaptationist
view have suggested that the mutation affected
only a minimal, core element of
grammar, namely recursion (Hauser et al. 2002).
Whether the ability to apply Merge recursively
provides a sufficient advantage for the language-learning
child to derive generalizations about
grammatical structure continues to be debated,
with some theorists providing evidence that
Merge itself could be learned from the input,
rather than existing as innate knowledge (Ninio
Language as Product of Cultural Transmission
Critics of the idea that Universal Grammar is a
biological adaptation argue that this view is
fraught with a number of fallacies: Given human
dispersion and linguistic variability, it is unclear
how an arbitrary set of universal rules could have
evolved as an adaptation to different specific linguistic
environments. Secondly, it is unclear why
only innate representations of highly abstract
grammatical properties should have evolved,
rather than predispositions to learn other specific
aspects of an ambient language such as its sound
structure or vocabulary. Finally, it is unlikely that
modifications in the genotype could have caught
up with the rapidly "moving target" of language
which is liable to change over the course of even
just a few generations.
Instead, recent theorizing about language evo-
lution views language as a cultural trait that has
evolved through cultural transmission via social
interaction. Thus, it is not the human brain that has
adapted to language but languages have adapted
to be learnable and usable by humans
(Christiansen and Chater 2016). Language is
seen as a product of cumulative cultural, rather
than biological, evolution, shaped by require-
ments to be transmissible across generations and
expressive in communication. As a result, lan-
guage is suited to the capacities of the human
perceptual-motor system; the architecture of the
cognitive mechanisms that underlie learning,
memory, and information processing; the nature
of mental representations and thought; and the
pragmatic constraints that govern human
Evidence for this view comes from agent-
based computational simulations and laboratory
experiments that manipulate various aspects of
cultural transmission of language-like systems to
determine how linguistic structure emerges at var-
ious levels. These lines of research suggest that
combinatorial structure emerges in response to the
need to transmit signals through noisy channels
while compositional structure, i.e., the consistent
linking of form-based features to dimensions of
meaning, emerges from the combined pressure of
transmitting signals through limited capacity
memory systems coupled with the need for users
to be communicatively efficient.
Coevolution of Genes and Language
Gene-culture coevolution theories explore the
feedback between genetic and cultural inheritance
mechanisms. These theories draw on ideas from
domains in which the interplay between genetic
and cultural evolution in certain populations had
been demonstrated, such as evolution of lactose
tolerance as an adaptation to cattle farming, or of
light pigmentation as an adaptation to colder cli-
mates. The central idea is that culture creates new
environments, which then exert specific selection
pressures by influencing mechanisms that enable
learning of ambient cultural traits. This view rec-
onciles the traditional juxtaposition of innateness
and learning. With respect to language, it implies
that although the genotype determines the avail-
able learning mechanisms, their selection is
governed by the structure of language that arises
from the interaction of many individuals over
time. This selection, in turn, leads to a weakening
of the role of innate biases that shape learning
(Kirby et al. 2007). Indeed, recent modeling
work has demonstrated that strong universality
in behavior can emerge from weak biases through
the process of cultural transmission. Thus,
although the process of cultural evolution causes
languages to adapt to the human cognitive system,
the resultant structural properties of language
select for cognitive abilities that are ever better
suited to learning and processing of language.
Following similar principles, the human larynx
and the associated neural and cognitive control
of sound vocal abilities may also have evolved
in response to the pressure to produce more dis-
tinct sounds and words (Lieberman 2012). How-
ever, gene-culture coevolution theories that
postulate genetic adaptions to language at the
level of individual populations need to reconcile
this view with the current lack of evidence for
individual predispositions for acquiring specific
languages, as would be evident if children
adopted into different cultural and ethnic backgrounds
exhibited specific difficulties in acquiring
features of the languages of their adopted families.
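The claim above that strong regularities can emerge from weak biases through cultural transmission can be made concrete with a toy calculation. The sketch below is only in the spirit of the iterated learning models cited and does not reproduce the model of Kirby et al. (2007). Everything in it is an assumption chosen for illustration: two candidate languages A and B, learners who adopt the most probable language given a few noisy utterances, and the parameter values.

```python
from math import comb, log


def p_learner_picks_a(teacher_is_a: bool, prior_a: float, noise: float, n: int) -> float:
    """Exact probability that a learner adopts language A after hearing n
    utterances, assuming the learner picks the most probable language."""
    p_a_utterance = (1 - noise) if teacher_is_a else noise
    total = 0.0
    for k in range(n + 1):  # k = number of A-type utterances heard
        log_odds = (log(prior_a / (1 - prior_a))
                    + (2 * k - n) * log((1 - noise) / noise))
        if log_odds > 0:  # posterior favors A
            total += comb(n, k) * p_a_utterance**k * (1 - p_a_utterance)**(n - k)
    return total


def long_run_share_of_a(prior_a: float, noise: float, n: int) -> float:
    """Stationary share of A-speakers in a chain where each learner becomes
    the next teacher (a two-state Markov chain)."""
    b_to_a = p_learner_picks_a(False, prior_a, noise, n)
    a_to_b = 1 - p_learner_picks_a(True, prior_a, noise, n)
    return b_to_a / (a_to_b + b_to_a)


PRIOR_A = 0.6  # weak innate bias toward language A (assumed value)
share = long_run_share_of_a(PRIOR_A, noise=0.3, n=2)
print(f"prior for A: {PRIOR_A:.2f}, long-run share of A: {share:.2f}")
```

With these assumed values, a prior preference of 0.60 yields a long-run share of about 0.85 for language A, so the transmission chain amplifies the bias rather than merely reflecting it. The narrow transmission bottleneck (only two utterances per learner) is what lets the prior decide ambiguous cases.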
How Is Language Learned?
Acknowledging the shift away from the nativist
stance that regards language acquisition as a mat-
urational process built on an innate blueprint for a
formal generative grammar, modern theories of
language development view first and second language
learning as a process of skill acquisition
(Christiansen and Chater 2016; Ninio 2006),
wherein learners develop fluency in processing
language in real time, i.e., converting input
consisting of sequences of sounds (or signs) into
mental representations of speakers' communicative
intentions and generating sequences of
sounds or signs to express their own communica-
tive intentions. Under this view, the fundamental
context for language acquisition involves joint
activities where speakers and listeners share com-
mon ground and can make reasonable inferences
about their partners' communicative goals (Clark
1996; Tomasello 1999). The relationship between
language learning and the social environment is
reciprocal: the developing ability to use language
facilitates social interaction, and social interaction
constrains and guides the process of language
Social Shaping
In the context of social interaction, language
learning involves sustained engagement with par-
ents, siblings, and other caregivers. Many theories
include social learning as an integral component
of language acquisition and consider how feed-
back from caregivers supports its development.
From early infancy, caregivers and infants coor-
dinate the timing of their communicative bids
when engaged in face-to-face (dyadic) social
interaction, using eye gaze as well as vocalization
to modulate the amount and intensity of social
stimulation. Such coordination of communicative
effort fosters the development of a sense of con-
nection between caregiver and infant while build-
ing a mutual awareness of doing something
together. Recent experimental work demonstrates
how contingent social feedback serves to promote
more advanced speech-like vocalizations (i.e.,
canonical babbling) in prelinguistic infants
(Goldstein et al. 2003). In this and similar studies,
caregivers were prompted via an earpiece to
affirm their infants' communicative attempts by
smiling, touching them, or imitating their vocalizations,
with half of the infants receiving contingent
feedback (i.e., the prompts were timed to
immediately follow the infant's vocalizations),
while the other half received noncontingent feed-
back (i.e., the prompts were yoked to the timing of
the vocalizations of a different infant from a pre-
viously recorded session). Across studies, the pro-
vision of contingent feedback was found to
increase both the quantity and quality of infant
vocalizations, such that infants who received
prompt, contingent feedback made a greater num-
ber of communicative attempts to engage their
caregivers and produced more mature, canonical
forms of babbling relative to the yoked controls.
Similarly, in longitudinal designs, individual dif-
ferences in maternal responsiveness to their
infants' communicative bids have been shown to
predict the timing of subsequent language milestones,
including the emergence of first words,
50 words in expressive language, combinatorial
speech, and talk about past events (Tamis-
LeMonda et al. 2001), with maternal repetition
or imitation of the child's speech proving to be
the most impactful form of feedback for later
developmental milestones.
Embodied Cognition as Expressed in Gesture and Speech
For infants, one of the major challenges involved
in learning a language is to discern the referents of
the words heard in the ambient language, a process
referred to as word-to-world mapping. Caregivers
play an important role in facilitating this
process by talking about what the infant already
has in mind; for instance, by naming objects that
the infant is pointing at or holding. Most early
verbs and other relational terms like "up" or "off" are
acquired in contexts where the infant is involved
in purposeful activity, which allow, for example,
the infant to map a word such as "up" onto their
efforts to be picked up. Modern theories of lan-
guage have come to recognize that the semantic
representations associated with linguistic forms
are multimodal in nature and reflect the sensorimotor
experiences of learners. Multimodal
representations are one feature of embodied cognition
wherein conceptual processing involves
simulation or mental reenactment of previously
experienced situations (Barsalou 2009). In chil-
dren, as well as adults, accessing verb meanings is
associated with neural activity in motor areas
(frontal cortex), which varies as a function of the
type of verb, e.g., "hand" verbs such as "clap" or
"throw" vs. "leg" verbs such as "run" or "chase", with
self-generated action seeming to play a key role in
the development of these neural signatures of
sensorimotor activity during verb acquisition
(James and Swain 2011).
Notably, infants are learning the meanings of
words at the same time as they are acquiring a
wide range of communicative gestures, many of
which may be viewed as forms of simulated
action, as when the child puts their hands over
their head as a request to be picked up (Tomasello
1999). Young children's gestures seem to convey
their thoughts when they are at the cusp of acquir-
ing a new skill (Goldin-Meadow 2009). These
gestures provide crucial information to caregivers
about what their infant has in mind (i.e., what
the infant's communicative intentions are), which
facilitates caregivers' provision of relevant input
to aid word-to-world mapping. Infants often
acquire corresponding gestures prior to learning
verbs like "wave", "sleep", or "drink", with the word
subsequently mapped onto the gesture as a component
of its meaning. Gestures also seem to play
a key role in young children's acquisition of combinatorial
speech, as infants first combine single
words with familiar gestures (e.g., pointing at a
cookie while saying "more") prior to combining two
words into a single utterance (e.g., saying "more
cookie"). Deictic gestures (e.g., pointing at something),
in particular, may be important for the
child's acquisition of a general combinatorial
operation like Merge, by serving as variable
expressions (i.e., with meanings corresponding
to pronouns like "it" or "that") that change in meaning
depending on the context of use. Thus, learning to
point as a form of request at a multitude of objects
(e.g., cookies, milk, out-of-reach toys) paves the
way for the child to acquire corresponding phrases
such as "get it", which exemplify the situational
meanings and flexible utility of linguistic
Bootstrapping in Complex Dynamical
In order to process incoming information lan-
guage learners need to rapidly incorporate this
information into existing representations.
A number of theories provide accounts for how
representations of the hierarchical structure of
language can be built up incrementally from
the input. The evidence suggests that from the
very first moments of contact with language in
utero, infants become attuned to the statistical
regularities associated with the duration of vowels
and consonants and the resulting rhythm of the
language (Mehler et al. 1988). During infancy,
sensitivity to prosodic characteristics, i.e. intona-
tional and rhythmical patterns, provides a rich
source of information that facilitates discovery
of the underlying grammatical structure of a lan-
guage because prosodic and syntactic structures
tend to be aligned to a considerable degree
(Morgan and Demuth 1996). The process of dis-
covering the sound patterns and structures of the
ambient language is aided by the fact that child-
directed speech often exaggerates phonetic and
prosodic characteristics in ways that support the
discriminability of phonemes (i.e., meaning-distinguishing
speech sounds) and discovery of higher
order units, such as words, recurrent morphemes,
and phrasal constituents. Thus, the rich input sup-
ports online bootstrapping of linguistic structure
at multiple levels. More generally, the notion of
bootstrapping implies that as children acquire
aspects of language, such as the meanings of
concrete nouns that can be learned ostensively
through a process of word-to-world mapping
(e.g., pointing at an object to elicit its name),
they can make use of familiar forms (i.e., what
they have already learned) to aid them in learning
unfamiliar and more abstract words and structures
(Gleitman et al. 2005).
The notion of bootstrapping and learning as
being supported by affordances of the environ-
ment extends beyond language input: Language
development is grounded in the infant's sensorimotor
experience, which undergoes abrupt
transitions as infants develop motor skills over the
first year of life. Recent work has explored how
the transition from crawling to walking provides
infants with new ways of sharing their interest in
objects with others, which in turn may alter the
communicative dynamics of caregiver-child inter-
action. In a study focusing on 13-month-olds,
where half of the infants were already walking
and half were still crawling (Karasik et al. 2011),
walkers more often carried objects to their
mothers, shared their interest in objects from a
distance, and communicated their interest in shar-
ing an object while in motion. The observed dif-
ferences in how walkers and crawlers shared
objects with others had unexpected consequences
for communicative development by influencing
how the mothers perceived their infants' communicative
intentions. Mothers responded differentially
to the moving bids (favored by the walkers)
in comparison to the stationary bids (favored by
the crawlers) by producing more advanced action
directives, such as "open it," when the child
offered or showed them the object while in transit.
These findings suggest that developmental
changes in one domain (i.e., motor development)
can yield cascading effects on development in a
seemingly unrelated domain (i.e., language devel-
opment) by altering parent-child conversational
patterns, which, in turn, enriches the input to the
language-learning child.
Usage-Based Learning
In the face of decay, incoming linguistic informa-
tion needs to be processed rapidly by the learner.
As a result, learning is based on local contingen-
cies processed in small chunks rather than on
surveying and generalizing over large corpora of
previously encountered input. This assumption is
built into a number of theoretical frameworks that
focus on the dynamics of language usage, chunk
and pattern extraction, categorization and gener-
alization, and change over time; these include
connectionist, emergentist, and usage-based
approaches to language and development.
According to these approaches, infants and tod-
dlers learn language in a piecemeal fashion,
starting out by acquiring words and phrasal pat-
terns that are relevant to their daily routines, such
as mealtime or book reading, using mutually
understood activities and predictable communica-
tive formats to make sense of the accompanying
language. Through a process of cross-situational
statistical learning, young children home in on the
meanings of words and at the same time become
sensitive to the unique ways that individual words
co-occur with other words, including their fre-
quency of occurrence in different syntactic and
situational contexts. Acquisition of item-specific
co-occurrence statistics supports priming,
wherein the processing of linguistic structure
becomes more fluent and efficient as a function
of item and pattern familiarity. Storage of
precompiled units or chunks of language incurs
further benefits by allowing the child to make
incremental predictions about what comes next
when processing language in real time. By
emphasizing statistical learning of distributional
information, which includes monitoring how
often different lexical items appear in different
grammatical constructions, these approaches pro-
vide dynamic accounts of how children go from
being fairly conservative learners to making the
sorts of constrained generalizations that allow
sophisticated language users to achieve the unlim-
ited expressive potential of human language.
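The idea of cross-situational statistical learning mentioned above can be sketched as simple co-occurrence counting: no single situation disambiguates a word, but the referent that most reliably accompanies it across situations wins out. The data, the function names, and the plain counting scheme are assumptions made for illustration and are not a model proposed in this entry.

```python
from collections import Counter, defaultdict


def count_cooccurrences(situations):
    """Tally how often each word is heard while each referent is in view."""
    counts = defaultdict(Counter)
    for words, referents in situations:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return counts


def best_referent(counts, word):
    """Return the referent that co-occurred most often with the word."""
    return counts[word].most_common(1)[0][0]


# Hypothetical learning episodes: words heard, objects present (in capitals).
# Each episode on its own is ambiguous between two word-object pairings.
situations = [
    (["ball", "dog"], ["BALL", "DOG"]),
    (["ball", "cup"], ["BALL", "CUP"]),
    (["dog", "cup"], ["DOG", "CUP"]),
]

counts = count_cooccurrences(situations)
lexicon = {word: best_referent(counts, word) for word in counts}
print(lexicon)
```

The same tallies could be extended to word-word co-occurrences, which is the kind of item-specific distributional information the text describes as supporting priming and prediction.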
How Is Language Used?
Language is transient. Humans are able to com-
prehend language at a rate of about 20 phonemes
per second despite the fact that the ability to
differentiate nonlinguistic sounds is limited to
about 1.5 per second. This implies that current
input is rapidly overwritten by new input, urging
immediate processing, a phenomenon referred to
as the Now-or-Never bottleneck (Christiansen and
Chater 2016). Similarly, the planning of action
sequences, such as those required for speech pro-
duction, is temporally constrained as planning of
long sequences far in advance is hampered by
interference and forgetting. These constraints are
reflected in a number of basic features of
processing systems, which are discussed below.
These basic features of the architecture of the
language processing system (viewed as
consisting of a series of subprocesses that operate
in a cascaded manner, utilizing different types of
information in an interactive manner in order to
interpret and even predict upcoming input) are
thought to govern both language comprehension
and language production, although controversy
persists about the extent to which the two systems
share parts of their architecture and underlying
neural substrate (Pickering and Garrod 2013).
Crucially, the principles that underlie language
processing and communicative interaction may
be shared with processing architectures in a num-
ber of other domains such as visual perception,
action planning, or social cognition, rather than
being unique to language.
Levels of Processing
Theories of language processing postulate com-
ponents or stages that deal with different types of
information in the signal, such as phonological,
prosodic, lexical, morphological, syntactic,
semantic, and pragmatic information. Early theories
were heavily influenced by the idea of modularity
of processing of different types of
information (Fodor 1983). As a consequence, lan-
guage processing was assumed to handle different
types of information in several stages assembled
in a strictly sequential and independent way, such
that information processed at later stages could
not influence processing at earlier stages. For
example, the process of utterance or sentence
comprehension was conceived of as parsing the
input to uncover the underlying syntactic struc-
ture, formalized as a phrase-structure grammar.
Parsing was viewed as incorporating the input
into a syntactic tree in incremental fashion follow-
ing a set of heuristics that minimized memory load
by limiting the number of higher-order syntactic
units that could be assembled at any given point in
time. Semantic processing was postulated to be
subsequent to the identification of syntactic structure,
and semantic information could not be
admitted for resolving processing uncertainties at
earlier stages. The idea that processing proceeds
from lower to higher levels in strictly sequential
fashion was also echoed in early theories of lan-
guage production where information was propa-
gated sequentially in the reverse direction.
Incremental Cascaded Processing
More recent theories acknowledge that the rapidly
fading signal necessitates efficient conversion of
sensorimotor input into higher-level representations.
This constraint is reflected in the feature of
incremental processing, whereby small chunks of
input cycle through the various processing stages
so that earlier chunks are already integrated with
higher-level information while lower-level infor-
mation is still being processed to create later
chunks. For example, studies of speech
shadowing have shown that speakers routinely
correct errors in the speech stream, which sug-
gests that spoken word recognition proceeds
extremely rapidly based on partial input allowing
higher-level lexical information to be used to
assemble articulatory commands while lower-
level phonological information is still being
processed. For larger segments of input, like utter-
ances and sentences, the idea of incremental
processing was first introduced by the two-stage
"sausage machine" model (Frazier and Fodor 1978),
which proposed that incoming words first get
chunked into syntactic phrases, which are then
incrementally incorporated into a syntactic repre-
sentation. Modern theories assume a massively
cascaded architecture according to which the pro-
cessor chunks the incoming input and immedi-
ately passes the information on to higher levels
of information processing while at the same time
continuing to process lower-level information in
the subsequent input (Christiansen and Chater 2016).
An incremental cascaded architecture has also
been proposed in theories of speech production,
which draw on empirical findings from picture-word
interference tasks to show that higher-level
structure, e.g., the semantic and syntactic structure
of an utterance, is constructed incrementally and
on the fly. Recent evidence from syntactic priming
suggests that speech planning involves a syntactic
planning phase that is independent from semantic
content and phonological form and that tends to
reuse the structure of recently encountered
input (e.g. Pickering and Branigan 1998). The
idea of purely syntactic representations derives
from earlier theories of speech production,
which postulated separate syntactic
representations of words called lemmas
(in contrast to lexemes carrying phonological
information). In the production of continuous
speech, syntactic planning spans somewhat lon-
ger time windows than planning at lower levels,
e.g. planning of prosodic structure, which, in turn,
is planned further ahead than syllabic and phono-
logical structure, attesting to the cascaded nature
of the process. Controversy exists with respect to
the size of the preplanned components, and the
extent to which preparation of multiple compo-
nents involves parallel processing.
Interactive Processing
In comprehension, incremental chunk-by-chunk
processing runs into problems when correct inter-
pretation of current input depends on information
that only becomes available in the yet-to-be-
processed input. At the level of identifying indi-
vidual words, ambiguities can arise from distor-
tions in the acoustic signal (i.e., noisy input), from
a lack of one-to-one correspondence between the
signal and the intended word form (i.e., polysemy
and homonymy), or from competition from pho-
nologically related words with similar onsets (i.e.,
words in a cohort such as carbon, carton, carpenter,
etc.). Often, such uncertainties in spoken word
recognition cannot be alleviated through bottom-
up processing, thus requiring the listener to make
inferences about the intended word based on prob-
abilistic contextual cues. Empirical evidence
shows that the lexical status of the ambiguous
word as well as the coarticulatory properties of
the surrounding sounds can affect the sensitivity
of early sensory processing of speech sounds,
suggesting that the processing system is interac-
tive. In many instances of noisy input, interactive
processing includes the use of crossmodal sensory
integration (e.g., lip reading) to facilitate rapid
restoration of missing/distorted phonemes
(Massaro 1987). Moreover, higher-level information
can influence processing retrospectively: accessing
lexical knowledge about a given word can influence
interpretation of ambiguous phonological information
encountered within that word. For example, a sound
with a voice onset time between /d/ and /t/ is perceived
as /t/ at the end of /pi/ because /pit/ is a word but
/pid/ is not. Such effects have successfully been
implemented in connectionist word recognition
models such as TRACE (McClelland and Elman 1986).
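The competition among phonologically related words with similar onsets, described earlier in this subsection, can be illustrated with a minimal sketch. It assumes a hypothetical five-word lexicon and uses spelling as a stand-in for phonemes; the function names cohort and recognition_point are invented for this illustration and do not correspond to any published model such as TRACE.

```python
# Hypothetical mini-lexicon; letters stand in for phonemes (an assumption).
LEXICON = {"carbon", "carton", "carpenter", "cat", "dog"}

def cohort(prefix, lexicon=LEXICON):
    """Return all words still compatible with the input heard so far."""
    return sorted(w for w in lexicon if w.startswith(prefix))

def recognition_point(word, lexicon=LEXICON):
    """Number of segments after which `word` is the only remaining candidate.

    Returns None if the word never becomes unique (e.g., it is a prefix
    of another word in the lexicon).
    """
    for i in range(1, len(word) + 1):
        if cohort(word[:i], lexicon) == [word]:
            return i
    return None

candidates_after_car = cohort("car")          # three competitors remain
unique_after = recognition_point("carton")    # unique once "cart" is heard
```

In this toy setting, bottom-up input alone leaves three candidates active after car, which is the kind of uncertainty that contextual cues are thought to help resolve.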
Ambiguities can also arise in sentence
processing. For example, in so-called garden-
path sentences, such as The horse raced past the
barn fell or Put the frog on the napkin in the box,
listeners may be led down the garden path to form
an incorrect interpretation of the sentence. This
occurs when information early in the sentence,
e.g., the past participle raced, is misunderstood,
e.g., taken to be the main verb in the sentence as
opposed to a modifier of horse (part of a reduced
relative clause). Early sequential theories postu-
lated that the processing system commits itself
deterministically to just one interpretation using
a set of processing heuristics. According to these
theories, listeners will subsequently revise their
interpretations of sentences only if incompatible
information is encountered, with reanalysis
manifesting itself in processing cost measurable
by error rates, reaction times or electrophysiolog-
ical markers indicating processing of unexpected
information. Crucially, the process of reanalysis is
thought to be triggered by various types of infor-
mation that are independent of syntactic structure,
such as semantics, discourse structure, pragmatic
inference, and/or real-world knowledge.
Reanalysis may not be fully available to young
children who show minimal evidence of revising
incorrect interpretations of temporarily ambigu-
ous sentences, like the examples given above
(Trueswell et al. 1999).
In contrast, interactive constraint-satisfaction
theories (MacDonald et al. 1994) permit the
processing system to maintain several alternative
interpretations of a sentence in parallel, with via-
ble interpretations continuously constrained by
multiple information sources (see list above for
sources of information guiding the process of
reanalysis), thereby limiting the need for second-
pass revisions under most circumstances. Thus, a
phrase such as The witness interrogated... is
much less likely to be interpreted as a noun +
main verb than the similarly structured phrase
The horse raced... because a variety of factors,
including frequency of use, encourage the listener
to interpret the noun phrase The witness to be the
object of the verb interrogated. Such probabilistic
information may be obtained through statistical
learning of co-occurrence patterns in the input
(see subsection above on Usage-Based Learning),
although other information may reside in the context
of the specific communicative episode. There
remains controversy with respect to the amount of
information that can be retained in a memory
buffer before an ambiguity must be resolved and
the types of information that can be considered at
various processing stages. It has been suggested
that listeners often do not consult all available
sources of information in processing a sentence,
but rather engage in satisficing to generate “good-enough”
representations that minimize effort and
processing cost (Ferreira et al. 2002).
Top-Down Processing
As indicated, higher-level information can be
used to resolve uncertainty at lower levels of
processing either by aiding in the selection of the
correct interpretation among the various alterna-
tives activated by bottom-up processing, or by
selectively preactivating just one compatible
interpretation to the exclusion of others, even
though such top-down processing could lead to
misinterpretation of upcoming input. Controversy
has arisen about whether context-based facilitation
of bottom-up processing reflects top-down
processing, or whether it just arises from priming,
i.e. from the spreading of lingering, yet rapidly
decaying, activation of already-processed input at
lower levels of representation. This controversy is
of greater relevance in sentence processing than in
word processing, where top-down processing has
been demonstrated by predictive eye movements
in the visual-world paradigm, for example, when
participants perform eye movements toward
objects predicted by the previous context
(Altmann and Kamide 1999). In contrast, in stud-
ies of sentence processing there is less agreement
on whether high-level information can influence
bottom-up processing of lower-level information
directly, or whether it merely aids in selection of
different alternatives constructed by lower-level
processing. One reason for the difference between
word and sentence processing lies in the length of
the time windows necessary for processing relevant
information. Whereas top-down influences
on word recognition occur within a relatively
short time window (not overly taxing on memory
and processing resources), top-down influences
on sentence processing require a much longer
time window for morphological, syntactic, and
semantic processing. However, recent neurophys-
iological evidence supports the idea of top-down
effects in sentence processing: differences in
low-frequency oscillatory neural activity indicate
reciprocal entrainment of cortical networks associated
with lower- and higher-level information
processing prior to the point at which disambiguating
input is encountered (Lewis and Bastiaansen 2015).
Predictive Processing
By definition, introducing top-down influences
into a cascaded processing system makes
processing predictive. Prediction safeguards the
system against loss of information in the input
due to noise in the transmission channel and in
neural processing. The exact nature of how pre-
diction operates is currently the subject of consid-
erable debate. In its minimal sense, prediction
implies that contextual effects from other types
of information can influence the state of the
processing system before further bottom-up
processing has taken place. Theories differ with
respect to whether prediction operates in a deter-
ministic or probabilistic manner. As described
above, early theories were deterministic in that
they proposed top-down processing to favor one
possible interpretation, typically the strongest
contender of all possible interpretations, which
was either the simplest in terms of memory load
or the most frequent. In cases of mismatch
between the predicted and incoming input (as in
the case of garden-path sentences), the processing
system had to reanalyze the input to identify the
alternative with the greatest plausibility. Recent
evidence, however, favors graded prediction
effects, which are determined by the surprisal of
a given continuation, i.e. by the amount of new
information that would potentially be gained by
processing this particular alternative: the higher
the surprisal, the higher the processing cost. Thus,
recent theories propose that several alternatives,
each weighted by a certain strength of belief, can
predictively be computed and remain active in the
processing system until a resolution has been
achieved. The process of resolution is proposed
to follow an “ideal” or “rational observer” model,
which engages in incremental belief updating
using Bayesian inference to change an existing
prior distribution of probabilities for the various
interpretation alternatives to a new posterior prob-
ability distribution, which, in turn, becomes the
prior distribution for the next processing cycle.
Modern theories are trying to illuminate how people
trade off the benefits and costs associated with
predictive preactivation in different domains of
language processing.
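The belief-updating cycle described in this subsection can be made concrete with a minimal sketch. It assumes just two candidate interpretations of The horse raced..., with prior and likelihood values invented purely for illustration; the function name update_beliefs and the labels main_verb and reduced_relative are likewise hypothetical. Surprisal is computed here as the negative log probability of the new input given current beliefs, which is one common formalization rather than the only possible one.

```python
import math

def update_beliefs(prior, likelihood):
    """One cycle of Bayesian belief updating over candidate interpretations.

    prior:      {interpretation: P(interpretation | input so far)}
    likelihood: {interpretation: P(new input | interpretation)}
    Returns the posterior distribution and the surprisal of the new input (bits).
    """
    joint = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    evidence = sum(joint.values())  # P(new input | input so far)
    if evidence <= 0.0:
        raise ValueError("new input is impossible under every interpretation")
    posterior = {h: p / evidence for h, p in joint.items()}
    surprisal = -math.log2(evidence)
    return posterior, surprisal

# Illustrative numbers only; they are not estimated from any corpus.
prior = {"main_verb": 0.9, "reduced_relative": 0.1}              # after "The horse raced"
likelihood_fell = {"main_verb": 0.01, "reduced_relative": 0.5}   # P("fell" | interpretation)

posterior, bits = update_beliefs(prior, likelihood_fell)
next_prior = posterior  # the posterior serves as the prior for the next cycle
```

With these made-up values, the disambiguating word shifts most of the probability mass to the reduced-relative reading and carries high surprisal, in line with the processing cost reported for garden-path sentences.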
Communicative Interaction
Traditional theories of language processing were
developed to account for empirical findings of
laboratory studies involving single speakers or
listeners subjected to sophisticated manipulations
of language input in the absence of a conversa-
tional partner. Recent theorizing has increasingly
shifted the focus toward understanding how
humans use language to engage cooperatively
with others (Clark 1996) and how the production
and comprehension systems of interlocutors inter-
act when taking turns in conversation (Pickering
and Garrod 2013). Modern theories of referential
communication attempt to apply principles of
interactive processing and Bayesian prediction to
explore how speaker intent can be encoded and
recovered efficiently in the face of pragmatic
constraints on the amount of information conveyed in
speech (Frank and Goodman 2012). Processing
principles derived from comprehension and pro-
duction research can also be applied to model the
communicative interaction itself by demonstrat-
ing how mechanisms like priming, inference, and
monitoring of output lead to interactive alignment
of linguistic signals at various levels of information
processing, with the ultimate goal of achieving an
alignment between interlocutors' internal
representations. Here, controversies persist
around the question of “audience design,” that is,
under what conditions humans engage
in strategic and volitional attempts to align the
form of their output and adjust its informativeness
in relation to the expectations, needs, and assumed
mental models of different conversational part-
ners. Furthermore, the question as to how lan-
guage is embedded in, and governed by,
communicative behavior in general has led to pro-
posals about the evolutionary primacy of turn-
taking behavior and its ontological independence
from language (Levinson 2016).
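The Bayesian approach to referential communication mentioned above (Frank and Goodman 2012) can be sketched in simplified form. The example below assumes a hypothetical three-object context and a speaker who prefers words in proportion to how few objects they apply to; the object names, the uniform prior, and the function names are assumptions made for illustration and should not be read as a full implementation of the published model.

```python
from fractions import Fraction

# Hypothetical reference game: each object is listed with the words true of it.
OBJECTS = {
    "blue_square": {"blue", "square"},
    "blue_circle": {"blue", "circle"},
    "green_square": {"green", "square"},
}

def extension_size(word):
    """Number of objects in the context that the word is true of."""
    return sum(word in words for words in OBJECTS.values())

def speaker(word, obj):
    """P(word | obj): applicable words are weighted by their specificity."""
    if word not in OBJECTS[obj]:
        return Fraction(0)
    total = sum(Fraction(1, extension_size(w)) for w in OBJECTS[obj])
    return Fraction(1, extension_size(word)) / total

def listener(word, prior=None):
    """P(obj | word) via Bayes' rule; the prior defaults to uniform."""
    if prior is None:
        prior = {o: Fraction(1, len(OBJECTS)) for o in OBJECTS}
    scores = {o: speaker(word, o) * prior[o] for o in OBJECTS}
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("word applies to no object in the context")
    return {o: s / norm for o, s in scores.items()}

beliefs = listener("blue")
```

Under these assumptions, a listener hearing blue favors the blue square over the blue circle, on the reasoning that a speaker intending the circle could have used the more specific word circle.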
Conclusion
Modern theories of language try to understand
how language use arises from domain-general,
embodied learning and processing mechanisms
operating in service of communication that takes
place in richly structured social environments.
Research has begun to move away from the
study of language as the behavior of individuals
to consider how it arises as a cooperative enter-
prise within social groups ranging in size from
dyads to global communities. Contemporary the-
ories of language are informed by powerful math-
ematical and computational models, big-data
approaches to the analysis of natural language
corpora, neurophysiological methods, and tradi-
tional laboratory experiments, designed to illumi-
nate social as well as cognitive mechanisms
underpinning language learning and processing.
These complementary methodologies have the
potential to offer new insights for understanding
language as an emergent property of the biologi-
cal substrate of individuals engaged in complex,
hierarchical social interactions.
Cross-References
Communication and Developmental
Early Theories of Language
Language Acquisition
Language Acquisition in Infants and Toddlers
Language Development
Language Instinct, The
Linguistic Evolution
Modeling Language Transmission
Noam Chomsky and Linguistics
Phonemes and Symbols
Physiology of Language
Pinker's (1994) The Language Instinct
Steven Pinker and Language Development
Words and Rules
References
Altmann, G. T., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264.
Barsalou, L. W. (2009). Simulation, situated conceptualization, and prediction. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 364(1521), 1281–1289.
Christiansen, M. H., & Chater, N. (2016). Creating language: Integrating evolution, acquisition, and processing. Cambridge, MA: The MIT Press.
Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press.
Ferreira, F., Bailey, K. G., & Ferraro, V. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11(1),
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: The MIT Press.
Frank, M. C., & Goodman, N. D. (2012). Predicting pragmatic reasoning in language games. Science, 336(6084), 998.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6(4),
Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., & Trueswell, J. C. (2005). Hard words. Language Learning and Development, 1(1), 23–64.
Goldin-Meadow, S. (2009). How gesture promotes learning throughout childhood. Child Development Perspectives, 3(2), 106–111.
Goldstein, M. H., King, A. P., & West, M. J. (2003). Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, 100(13), 8030–8035.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569–1579.
James, K. H., & Swain, S. N. (2011). Only self-generated actions create sensori-motor systems in the developing brain. Developmental Science, 14(4), 673–678.
Karasik, L. B., Tamis-LeMonda, C. S., & Adolph, K. E. (2011). Transition from crawling to walking and infants' actions with objects and people. Child Development, 82(4), 1199–1209.
Kirby, S., Dowman, M., & Griffiths, T. L. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12),
Levinson, S. C. (2016). Turn-taking in human communication: Origins and implications for language processing. Trends in Cognitive Sciences, 20(1), 6–14.
Lewis, A. G., & Bastiaansen, M. (2015). A predictive coding framework for rapid neural dynamics during sentence-level language comprehension. Cortex, 68,
Lieberman, P. (2012). Vocal tract anatomy and the neural bases of talking. Journal of Phonetics, 40(4), 608–622.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676–703.
Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Mahwah:
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86.
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29(2), 143–178.
Morgan, J. L., & Demuth, K. (Eds.). (1996). Signal to syntax: Bootstrapping from speech to grammar in early acquisition. Mahwah: Erlbaum.
Ninio, A. (2006). Language and the learning curve: A new theory of syntactic development. Oxford: Oxford University Press.
Ninio, A. (2014). Learning a generative syntax from transparent syntactic atoms in the linguistic input. Journal of Child Language, 41, 1249–1275.
Pickering, M. J., & Branigan, H. P. (1998). The representation of verbs: Evidence from syntactic priming in language production. Journal of Memory and Language, 39(4), 633–651.
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36(4), 329–347.
Tamis-LeMonda, C. S., Bornstein, M. H., & Baumwell, L. (2001). Maternal responsiveness and children's achievement of language milestones. Child Development, 72(3), 748–767.
Tomasello, M. (1999). Cultural origins of human cognition. Cambridge, MA: Harvard University Press.
Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73(2), 89–134.