BEHAVIORAL AND BRAIN SCIENCES (2004) 27, 491–541
Printed in the United States of America
©2005 Cambridge University Press 0140-525X/04 $12.50 491
Prelinguistic evolution in early hominins: Whence motherese?
Dean Falk
Department of Anthropology, Florida State University, Tallahassee, FL 32306-7772
dfalk@fsu.edu http://www.anthro.fsu.edu/people/faculty/falk.html
Abstract: In order to formulate hypotheses about the evolutionary underpinnings that preceded the first glimmerings of language,
mother-infant gestural and vocal interactions are compared in chimpanzees and humans and used to model those of early hominins.
These data, along with paleoanthropological evidence, suggest that prelinguistic vocal substrates for protolanguage that had prosodic fea-
tures similar to contemporary motherese evolved as the trend for enlarging brains in late australopithecines/early Homo progressively in-
creased the difficulty of parturition, thus causing a selective shift toward females that gave birth to relatively undeveloped neonates. It
is hypothesized that hominin mothers adopted new foraging strategies that entailed maternal silencing, reassuring, and controlling of the
behaviors of physically removed infants (i.e., that shared human babies’ inability to cling to their mothers’ bodies). As mothers increas-
ingly used prosodic and gestural markings to encourage juveniles to behave and to follow, the meanings of certain utterances (words) be-
came conventionalized. This hypothesis is based on the premises that hominin mothers that attended vigilantly to infants were strongly
selected for, and that such mothers had genetically based potentials for consciously modifying vocalizations and gestures to control in-
fants, both of which receive support from the literature.
Keywords: bipedalism; brain size; chimpanzees; foraging; gestures; hominins; infant riding; motherese; prosody; protolanguage
Dean Falk is the Hale G. Smith Professor of Anthro-
pology at Florida State University and Honorary Pro-
fessor of Human Biology at the University of Vienna.
She is the author of over 100 publications that focus on
early hominids, brain evolution, comparative neuro-
anatomy, primate behavior, and cognitive evolution. Re-
search on cranial blood flow and australopithecine en-
docasts led Falk to develop the “radiator hypothesis” of
brain evolution and to question the conventional inter-
pretations of certain fossils. Falk is the author of Brain-
dance: Revised and Expanded Edition (2004). In 2003,
she received the Austrian Cross of Honor for Science
and Art, 1st class.
1. Introduction
One of the most fascinating puzzles to confront evolution-
ary biologists has to do with Homo sapiens’ ability for
speech. Why are we the only animals that talk? How and
when did our ancestors begin to formulate and spew forth
segmented bits of air into meaningful sequences, and what
behaviors led to the earliest language (protolanguage)? In
order to formulate hypotheses about the evolutionary un-
derpinnings that preceded the first glimmerings of speech
in early hominins, this article synthesizes findings from in-
fant and child development, psychology, primatology, and
anthropology.
It is widely recognized that acquisition of vocal language
is scaffolded onto the special sing-song way in which par-
ents vocalize to their infants, known as “baby talk” or moth-
erese (Dooling 1974; Ferguson 1977; Hirsh-Pasek &
Golinkoff 1996; Hirsh-Pasek et al. 1987; Karmiloff &
Karmiloff-Smith 2001; Monnot 1999; Snow 1972; 1998;
2002). As detailed below, the worldwide practice of direct-
ing musical speech toward human babies provides a tem-
porary framework or scaffold that, among other functions,
facilitates their eventual comprehension and production of
speech. Nevertheless, one school of thought views the main
feature that distinguishes motherese from adult-directed
(AD) speech, namely tone of voice or prosody, as a compo-
nent of a primate gesture-call system that is totally separate
from language. Burling (1993), for example, notes that
“tone of voice amounts to an invasion of language by some-
thing that is fundamentally different” (p. 30). However, be-
cause motherese is the medium in which infants around the
world initially perceive and eventually process their re-
spective languages, an analysis of its features may elucidate
the prelinguistic foundations of the protolanguage(s)
evolved by early hominins. Instead of separating prosody
from language, then, the view developed below is that
parental prosody is not only an integral component for
propagating language today, it also formed an important
substrate for the natural selection of protolanguage in early
Homo. In addition to focusing on infant-directed (ID) com-
munications of parents, clues for modeling the evolution of
prelinguistic behaviors are also gleaned from examining the
processes by which infants acquire languages.
Although there is a robust literature on the vocal aspects
of motherese, few workers have appreciated the important
parallel roles of mother-infant interactions in visual, ges-
tural, and tactile domains. For example, infant-directed
communications from mothers of 3- to 4-month-old infants
are frequently accompanied by exaggerated facial expres-
sions that have precursors in other primates and that signal
affiliation and invitation for contact (e.g., raise eyebrows, eye-
brow flash, smile, nod, bob head backward) (Dissanayake
2000). With this in mind, mother-infant interactions that en-
compass visual, vocal, gestural, and tactile communication
are compared below in chimpanzees and humans in order to
identify the probable nature of the mother-infant interac-
tions that characterized early hominins.¹
Hominins are believed to have spent much of their pre-
history in fission-fusion communities that foraged daily for
food, which entailed mothers traveling in the company of
dependent offspring and a small number of other individu-
als (Nishida 1968; Stanford 1998). Around the time of the
australopithecine/early Homo transition, maternal pelves
that had been modified to accommodate bipedalism be-
came subject to an emerging trend for increasingly large
brains (Falk 1998; Falk et al. 2000), which eventually
caused a selective shift toward females that gave birth to
relatively helpless infants (Small 1998). Consequently, the
ability of babies to cling actively to their mothers was lost in
hominins (Ross 2001). Similar to some anthropoid mothers
that live under difficult foraging circumstances (Fuentes &
Tenaza 1995; Lyons et al. 1998), these mothers are hypoth-
esized to have adopted postnatal foraging-related changes
in maternal care, which included periodically putting their
infants down beside them in order to obtain and process
food. As a result, the incidence of distal mother-infant ges-
tural communications increased (Tomasello & Camaioni
1997) and prosodic (affective) vocalizations became ubiq-
uitous to compensate for the reduction in sustained
mother-infant physical contact.
The “putting the baby down” hypothesis focuses on
events that preceded the emergence of protospeech, and is
in keeping with the continuity hypothesis that the biologi-
cal capacity for language evolved incrementally within the
hominin line (Armstrong et al. 1994; King 1996): “differ-
ences between human language and nonhuman primate
communication are only quantitative and... these differ-
ences may be accounted for by gradual shifts in abilities due
to changing selection pressures – perhaps in the ability to
create... communicative utterances (Gibson 1990) or to
donate information to others” (King 1996, p. 193).
According to the discontinuity hypothesis, on the other
hand, language appeared suddenly, without phylogenetic
links to earlier communication systems (Burling 1993). This
latter hypothesis views “language backward through the
lens of contemporary linguistic theory rather than in the
context of how evolution operates” (Callaghan 1994,
p. 359). Most evolutionary biologists, however, believe that
reproductive fitness (an individual’s production of viable
offspring) is the driving force behind evolution and that,
whether it proceeds gradually or rapidly, most “evolution-
ary change occurs in the context of what is already in place
as a result of prior selective pressures” (Callaghan 1994,
p. 359). The present paper is grounded squarely on this
premise. Thus, contemporary motherese is viewed as the
result of prior selective pressures, the nature of which is ex-
plored in the following sections. Since language acquisition
today is universally scaffolded onto motherese, it is argued
that selection for vocal language occurred after early ho-
minin mothers began engaging in routine affective vocal-
ization toward their infants, a practice that characterizes
modern women, but not relatively silent chimpanzee moth-
ers. Below, it is shown that human infants are “primed” to
learn their native languages by the particular flavor of
motherese to which they are exposed. In addition, data are
presented that strongly suggest that this universal practice and
its associated ontogenetic unfolding of language acquisition
in human infants are genetically driven. For all of these reasons,
“positing a phylogenetic discontinuity between primate
vocal communication and speech seems to [be] an unnecessarily
complicating assumption in the absence of
more compelling evidence” (Armstrong et al. 1994, p. 358).
2. Mother-infant interactions in chimpanzees and
humans
2.1. Mother-infant communication in chimpanzees and
bonobos
Because common chimpanzees (Pan troglodytes) and the
less-studied bonobos (Pan paniscus) provide the best ref-
erential models for early hominin behavior (Falk 2000;
Moore 1996), this section reviews the literature on their
mother-infant interactions in order to provide background
for examining the evolution of prelinguistic behaviors. As is
the case for humans, the period during which infant and juvenile
chimpanzees are emotionally and physically dependent
upon their mothers is extended compared to that of monkeys.
Indeed, prolongation of the various developmental stages is
thought to be one of the trends that characterized the evo-
lution of higher primates. According to this view, increased
durations of dependency facilitated extended learning as-
sociated with the evolution of bigger-brained, highly intel-
ligent, and longer-living primates (Falk 2000).
Much of what is known about the vocalizations of wild
common chimpanzees has been discovered by Jane Goodall
and her colleagues (Goodall 1986). Many emotional states
of chimpanzees are obviously similar to those of humans,
and are expressed in a variety of easily recognizable facial
expressions (Preuschoft 2000; Preuschoft & van Hooff
1995; Schmidt & Cohn 2001) that, in turn, are frequently
linked with particular vocalizations. Chimpanzees produce
vocalizations by altering the sizes and shapes of their
mouths and resonating cavities, and “facial expressions play
a key role in close-up communication between chim-
panzees” (Goodall 1986, p. 119), which may be related to
the fact that, at about the age of 3 months, infants show “a
sudden intense interest for the mother’s face” (Plooij 1984,
p. 142).
Goodall (1986) notes that vocal communication of chim-
panzees is far more complex than previously appreciated,
and has classified 34 discrete calls along with the emotions
with which they are associated. She also observes that chim-
panzee listeners learn much from the sequences of vocal-
izations that pass back and forth between individuals. (For
example, the screaming of an adult followed by squeaks and
then pant-grunts indicates to a distant chimpanzee that an
aggressive interaction has occurred and that the victim has
relaxed and approached the aggressor.) Chimpanzee calls
are distinguished (with presumably more difficulty for hu-
man than chimpanzee listeners) from an acoustically
graded continuum. Thus, the hoo is an isolated but distinc-
tive part of the whimpering sequence:
The single hoo may be uttered several times in succession, but
each vocalization is made separately; as a hoo sequence starts
to rise and fall in pitch and volume, and when each sound is pro-
duced in temporally rapid succession, it grades into the whim-
per. The hoo is uttered by both an infant and (much less often)
Falk: Prelinguistic evolution in early hominins: Whence motherese?
492 BEHAVIORAL AND BRAIN SCIENCES (2004) 27:4
his mother when they need to reestablish physical contact –
when, for example, the infant wants to ride on his mother’s back
during travel or when she reaches to retrieve him from a situa-
tion she perceives to be dangerous. (Goodall 1986, p. 129)
In addition to hoos, several other calls are used by infants
as well as older chimpanzees, including screams (mothers
recognize those of their infants); whimpering (most com-
monly heard in infants, especially during weaning); and
tantrum screams (which occur in older infants that have
been rejected during weaning). Plooij (1984) discusses sev-
eral additional calls that are emitted by common chim-
panzee infants including effort-grunts, staccatos, and uh-
grunts. Because chimpanzees are unable to cling properly
for the first two months of life, they are as helpless as human
neonates and must be carried and supported on the ventral
side of their mothers’ bodies (Plooij 1984). Significantly, ma-
ternal support for chimpanzee infants varies, is related to
their whimpering, and is crucial for infant survival:
Some mothers supported and carried their babies almost con-
tinuously from shortly after birth whereas others restricted
themselves to the minimum necessary not to lose their baby.
Consequently, during locomotion over greater distances
(travel) babies from the first group were safe; they rarely
whimpered or screamed. Babies in the second group, on the
other hand, whimpered frequently when loosing their grip on
the mother’s hair, dangling from only one or two of their four
limbs.... The maternal support is of vital importance to the
baby. Without it, the baby would surely fall off and may die
(Plooij 1984, p. 45, emphasis mine).
The structure and contextual use of vocalizations of
bonobos have been investigated in the wild (Bermejo &
Omedes 1999, p. 355). Voices of bonobos are higher
pitched than those of common chimpanzees (Kano 1992),
and their utterances appear to be more structured and flex-
ible and to always occur in the context of facial expressions,
gestures, and tactile communication (Bermejo & Omedes
1999). In bonobos, peep sequences are among the most im-
portant vocalizations, and croaks, muffled barks, and pant-
ing laughs are used mainly by young individuals. Peep yelps
and peeps that may escalate into screams are given by in-
fants that are prevented from nursing, accompanied by in-
tense pouts. Bonobos also produce choruses in which indi-
viduals echo each other’s calls, and seem to be trading
information about emotions and intentions during aggres-
sive confrontations that involve vocalizations, which led de
Waal (1997) to suggest that bonobos appear to engage in
more language-like exchanges of information about their
internal states than do common chimpanzees. Although de
Waal did not claim that bonobos talk, they seem, at least, to
have a latent ability to learn names, as shown by a study in
which two human-enculturated bonobos were able to learn
to comprehend English words for novel objects with few ex-
posures to the novel items, an ability that did not require vi-
sual contact with items during acquisition of their names
(Lyn & Savage-Rumbaugh 2000). In this context it is inter-
esting that, although many believe that apes do not imitate
vocally (Fitch 2000), recent spectrographic and statistical
analyses reveal that the well-known bonobo Kanzi produces
distinct vocalizations for “banana,” “grape,” “juice,” and
“yes” (Taglialatela et al. 2003).
2.1.1. Infant-directed vocalizations of common chim-
panzees. Mothers of infant chimpanzees are notoriously
shy (McGrew 1992) and, except for hoos, calls that are
specifically directed by mothers to their infants are rarely
mentioned in the literature. The few other maternal ID
calls noted by Goodall (1986) include replies to screams of
their infants “even if the child is out of sight” (p. 131), and
soft barks or coughs given in mild rebuke to weaning infants
that begin to suckle after throwing temper tantrums
(p. 576). Chimpanzee mothers have also been reported to
emit soft vocalizations while examining their infants (Nicol-
son 1977). Maestripieri and Call (1996) note that, when
they occur, ID vocalizations of chimpanzee mothers, such
as hoos and whimpers, are similar to the vocalizations pro-
duced by their infants. It is significant that one of the few
circumstances under which chimpanzee mothers routinely
produce ID vocalizations is in conjunction with foraging
and travel. For example, hoos are uttered to retrieve infants
for travel, and “soft grunts may be exchanged when... two
or more familiar chimpanzees, especially family members,
are foraging or traveling together. Typically one individual
grunts when he pauses during travel, or when he gets up to
move on.... Thus these grunts function to regulate move-
ment and cohesion” (Goodall 1986, p. 131).
2.1.2. Infant-directed vocalizations of bonobos. Bermejo
and Omedes (1999) note that bonobo mothers in the wild
are very sensitive to screams of their infants and emit barks
or hiccups during alarm situations, which elicit immediate
responses from offspring. Similar to common chimpanzees,
bonobo mothers have also been observed vocalizing in or-
der to retrieve infants for travel:
Nevertheless, the mother often carries her offspring during
travel until it is at least three or four years old. The signal initi-
ating this kind of transportation is the mother’s vocalization.
Then, after walking a short distance, up to 6 m, she will stand
with one foot slightly lifted, the sole facing toward the rear, in
a stationary walking position. There she will stand, waiting for
the juvenile to run after and jump onto her back. (Kano 1992,
p. 164)
Bonobos are thought by some to be more intelligent than
common chimpanzees, partly because of their relatively
greater success at learning nonvocal, humanlike language
(Savage-Rumbaugh 1984; Savage-Rumbaugh et al. 1998).
Compared to common chimpanzees, the human-encultur-
ated bonobo Kanzi accompanies many gestures with spon-
taneous vocalizations that “appear to be voluntary and
used intentionally to draw attention to Kanzi and to what
he wants” (Savage-Rumbaugh 1984, p. 408). Although his
adoptive mother (Matata) anticipated and aided Kanzi’s de-
veloping locomotor activities, there is no indication that she
vocalized during these ID gestures. In sum, although both
bonobos (Kano 1992) and common chimpanzees have rich
vocalization systems, there is little evidence that mothers
engage in a significant amount of ID vocalization, in stark
contrast to the case for humans.
2.1.3. Infant-directed gestures of common chimpanzees.
Although chimpanzees use gestures involuntarily to express
moods, as well as intentionally to call attention to them-
selves or to deliver imperatives, their repertoire of gestures
anticipates but fails to achieve the sophistication of that ac-
quired by a typical 1-year-old human child (Tomasello &
Camaioni 1997). Tomasello and Camaioni point out three
characteristics of natural gesturing in chimpanzees that dif-
fer from gesturing in human infants: (1) Chimpanzee ges-
tures are almost exclusively dyadic (used to attract attention
to oneself) instead of mostly triadic (used to attract attention
to an outside party), (2) their gestures remain largely
imperative without developing declarative or referential el-
ements, and (3) most chimpanzee gestures involve physical
contact between the signaler and recipient (i.e., they are not
distal). Significantly, the two triadic exceptions noted for
chimpanzee gestures by Tomasello and Camaioni (1997)
appear to be similar to request and offer gestures of human
mother-infant pairs (Messinger & Fogel 1998, see sect.
2.2.1), not only physically, but also motivationally (i.e., used
to request food and to seek positive social contact).
For chimpanzees, ID gestural communication appears to
be much richer than ID vocal communication. A newborn
common chimpanzee is licked and groomed by its mother
immediately after birth, and bouts of maternal ID groom-
ing increase in duration during the first year of its life
(Goodall 1986). Plooij (1984, Appendix A) documents a rich
repertoire of ID gestural and kinesic behaviors toward de-
veloping infants in chimpanzee mothers from Gombe re-
lated broadly to carrying, cradling, nursing, weaning, play,
traveling, and acquisition of motor skills. ID gestures have
also been noted for captive mothers (Nicolson 1977), two
of which spent considerable amounts of time examining
their young infants. One cradled her infant and “kissed” it
on the mouth (p. 541). Captive mothers also frequently pat-
ted their infants’ heads and backs. The captive mothers
seemed to test and encourage their infants’ developing motor
skills by giving them “walking lessons” (pp. 541–42).
The first cross-fostered chimpanzee schooled in American
Sign Language for the Deaf (ASL), Washoe, has even been
reported to mold her adopted son Loulis’s hands in the form
of a sign (Fouts et al. 1989). (It is important to note, how-
ever, that gesturing should not be confused with sign lan-
guage because it lacks the complex grammar and arbitrari-
ness found in the latter [Karmiloff & Karmiloff-Smith
2001].)
Some of the most interesting ID gestures of chimpanzee
mothers have been observed in conjunction with feeding.
Mothers begin sharing solid food with infants when they are
about 5 months old, and have been observed snatching
leaves that were not part of their normal diet from their in-
fants’ mouths (Goodall 1986). In addition to teaching in-
fants which foods are palatable, Goodall believes this sort
of ID intervention serves to reinforce traditional food pref-
erences in chimpanzee communities. Along these lines, it is
fascinating that at least some chimpanzee mothers from the
Taï forest, Ivory Coast, have anecdotally been reported to
teach their offspring to use implements such as rocks to
crack open nuts that have been placed on anvils (Boesch
1991; Boesch & Boesch-Achermann 1991).
Play is the hallmark of a young chimpanzee’s life, and its
frequency peaks between the ages of 2 and 4 years (Goodall
1986). Females with infants play more than other adults,
which entails a good deal of ID physical activity:
A chimpanzee infant has his first experience of social play from
his mother as, very gently, she tickles him with her fingers or
with little nibbling, nuzzling movements of her jaws. Initially
these bouts are brief, but by the time the infant is six months
old and begins to respond to her with play face and laughing,
the bouts become longer. Mother-offspring play is common
throughout infancy. (Goodall 1986, pp. 369–70)
Significantly, turn-taking in chimpanzees has been docu-
mented in the context of mother-infant play: “The early bit-
ing triggered the onset of mother-baby play: contingent
upon when bitten, the mother started to tickle the baby and
this biting-tickling grew into an alternating interaction, in
which both mother and baby could take their turns” (Plooij
1984, p. 142).
As the infant matures in the wild, its mother “shapes and
cushions his first interactions with other individuals”
(Goodall 1986, p. 568), primarily by keeping a wary eye on
the infant, which she hurries to remove from potentially
harmful social situations. Although chimpanzee mothers
are extremely lenient, occasionally a mother seizes her in-
fant and drags it away, for example, if it continues to ignore
her obvious signals that it is time for them to move on to a
new location (p. 368). Maternal tolerance decreases during
an infant’s fourth and fifth years as it is weaned and forced
to walk by itself. When juveniles throw temper tantrums,
their mothers often give in by embracing them and allow-
ing them to suckle. For example, after a 4-year-old son who
was being weaned was rejected twice while attempting to
climb onto his mother’s back, he uttered terrified screams
that galvanized her “into instant action, [she] rushed back
and with a wide grin of fear gathered up her child and set
off – carrying him” (p. 582).
2.1.4. Infant-directed gestures of bonobos. ID gestures of
bonobo mothers are similar to those of common chim-
panzees. Infant bonobos and common chimpanzees begin
eating solid food at about the same age, although the two
species differ in how they request solid food from their
mothers (Kano 1992). The most observed pattern in com-
mon chimpanzees is for infants to put their mouths near
their mothers’ mouths. In bonobos, the most prevalent
form of begging is for the offspring to touch their mothers’
mouths. Under these circumstances, bonobo mothers may
look away while shaking their heads as if annoyed, but they
usually give up the food. As Kano (1992) summarizes, “a
kind of food-sharing occurs frequently in which a juvenile
approaches and snatches food from its mother or takes food
directly from her mouth. The mother certainly does not
dole out the food, but she lets her offspring pull and bite at
it” (p. 167).
Bonobo mothers frequently play with their infants using
slow-moving and gentle motions, often while resting in day
nests. During play, mothers tickle with their fingers, play-
bite, and grab their infants. “While lying sprawled looking
up, she will tickle the infant and hold its hands and feet;
hanging high in space, the infant looks very happy and for-
tunate” (Kano 1992, p. 132). Interestingly, bonobo mothers
in this position sometimes appear to be playing “airplane”
with their infants (p. 165).
Based on observations of Kanzi and his mother, Matata,
Savage-Rumbaugh (1984) and Savage-Rumbaugh et al.
(1998) suggest that bonobo mothers foster the emergence
of intentional communication skills in their infants by re-
sponding to their gestures for aid as they move indepen-
dently from place to place. Matata monitored Kanzi’s acro-
batics closely when he was 4 to 11 months old and “would
nearly always raise a foot or arm toward Kanzi and shove
him toward the object he had been trying to reach...
Kanzi, like human infants, began to signal his desired intent
to go to a particular location and to look back and forth be-
tween his locomotor goal and his mother” (Savage-Rum-
baugh 1984, p. 405). Such gestures and visual checking ap-
peared rather suddenly when Kanzi was 10 months old, and
he then began to “ask” his mother to pick him up, and to
help him reach a particular place. At the same age, Kanzi’s
half-brother, Akili, also signaled his desire for help getting
from one place to another to Matata. Shortly after Kanzi be-
gan signaling his intentions, he spontaneously began to
point by touching objects with an extended index finger. Al-
though common chimpanzees may use an extended hand to
refer to things, use of an extended index finger is rare (But-
terworth 1997; Savage-Rumbaugh 1984).
It must be kept in mind, however, that Kanzi is a bonobo
that was enculturated by humans (Savage-Rumbaugh et al.
1998), rather than mother-reared in a more natural setting,
which has important implications for learning to engage in
social interactions that focus attention on a third entity and,
indeed, the development of triadic gestures (Tomasello &
Camaioni 1997; Tomasello et al. 1993). Unlike mother-
reared chimpanzees, enculturated chimpanzees imitatively
learn actions upon objects in a manner similar to young chil-
dren, an ability that appears to be scaffolded onto socialized
attention, which is acquired by interacting with humans
(Tomasello et al. 1993). Tomasello et al. argue that such
“broadly based skills of social cognition are a prerequisite
to the acquisition of language skills” (p. 1702).
By the time the wild bonobo is 6 months old, it starts to
move around the periphery of its mother. If the infant at-
tempts to go far away, however, the mother will bar its way
with her hand and resume carrying it. Mothers continue to
carry their offspring during travel until they are 3 to 4 years
old. Similar to common chimpanzees, when it is time to
move within or from trees, bonobo mothers assume a pos-
ture and wait for their infants to jump on their backs. When
the infant gets close, its mother may extend her hand to-
ward it (Kano 1992).
2.1.5. Chimpanzee and bonobo laughter. According to
Goodall, laughing that somewhat resembles human laugh-
ter is heard during play sessions. Although most laughter re-
sults from physical contact such as tickling, it also occurs
during chasing play. Because they play more frequently, in-
fants laugh more than adults. “Sound spectrograph analysis
shows a change from steady exhaled sound, to chuckle-like
pulsed exhaled sound, to ‘wheezing’ laughter” (Goodall
1986, p. 130). Sonagrams have also been collected of short
series of rhythmic panting laughs in wild bonobos, which
are the only bonobo vocalizations that are clearly associated
with only one context, namely play (Bermejo & Omedes
1999).
Provine (1996; 2000) notes that chimpanzee laughter
has the sound and cadence of a handsaw cutting wood, and
differs from that of humans in the way that sounds are typ-
ically related to the airstream. The vowel-like notes of hu-
man laughter (e.g., “ha”) “are performed by chopping a sin-
gle expiration, whereas chimpanzee laughter is a breathy
panting vocalization that is produced during each brief ex-
piration and inspiration” (Provine 1996, p. 40). Chim-
panzee laughter also lacks the vowel-like notes that typify
human laughter. In other words, unlike the norm for hu-
mans, chimpanzees breathe in and out as they produce a
breathy, panting laughter. (In a personal communication,
however, Phillip Tobias noted that the late Louis Leakey
had a marvelous belly laugh that was vocalized on both the
exhale and the inhale, an anecdote which shows that the
classic human “ha-ha” laugh is a central theme around
which variation occurs.) Provine suggests that chim-
panzee-like laughter was present in the common ancestor
of apes and humans. If so, it would have been an important
component of mother-infant communication in early ho-
minins.
2.2. Motherese in humans
Human infants discover how rhythm organizes their native
languages between birth and 2 months of age (Karmiloff &
Karmiloff-Smith 2001). In most cultures, learning to
process the rhythms of speech is facilitated by the special
way in which infants are addressed, known variously as
motherese, musical speech (Trainor et al. 2000), or infant-
directed (ID) speech. In ID speech, intonation contours
around phrases are exaggerated, as are stress patterns
within words and sentences. Many repetitions and ques-
tions with rising intonations are used. The following exam-
ples provide a feel for the exaggerated stressed syllables (in
capitals) that typify motherese (see also Wheeldon 2000):
Aren’t YOU a nice BAby? Good GIRL, drinking all your MILK.
Look, look, that’s a giRAFFE. Isn’t that a NICE giRAFFE?
DOGgie, there’s the DOGgie. Ooh, did you see the lovely
DOGgie? (Karmiloff & Karmiloff-Smith 2001, p. 47)
Infants’ preference for ID as opposed to adult-directed
(AD) speech increases during the first several months of life
(Cooper et al. 1997), and ID speech is used most intensively
with 3- to 5-month-old infants, although it persists until
around 3 years of age (Stern et al. 1983). Six-month-old
hearing and deaf infants also show greater attention and re-
sponsiveness to ID than to AD Japanese Sign Language
(Masataka 1998).
Despite several “flawed” studies to the contrary, Monnot
(1999) marshals strong support for the hypothesis that ID
speech that is characterized by a simplified vocabulary,
more repetition, exaggerated vowels, higher overall tone,
wider range of tone, and slower tempo is a universal trait
among modern humans. Pitch and rhythmic structure com-
prise two main dimensions of singing and music, as well
(Dissanayake 2000). The singing of lullabies and playsongs
to infants is also universal (Trehub et al. 1993), conveys
meaning that is emotional rather than linguistic, and has
acoustic features that are similar to ID speech: “For both
playsongs and lullabies the tempo was slower, there was rel-
atively more energy at lower frequencies, inter-phrase
pauses were lengthened, and the pitch and jitter factor
were higher” (Trainor et al. 1997, p. 383). From the begin-
ning, then, babies everywhere are predisposed to respond
to certain maternal vocalizations that function as uncondi-
tioned stimuli that alert, please, soothe, and alarm the in-
fant (Fernald 1994). The universalist hypothesis also spec-
ifies that ID speech contributes initially to infant emotional
regulation, then to socialization, and finally to the acquisi-
tion of speech in a sequential, age-appropriate manner
(Monnot 1999; Trainor et al. 2000).
Vocal, gestural, and kinesic social interactions between
parents and infants serve, in part, to reinforce the latter’s at-
tention to, and eventual development of, language. Thus,
parents unconsciously establish eye contact with infants
and then use motherese to maintain joint attention. As par-
ents realize infants are responding to their voices by kick-
ing, jerking, or with coos and gurgles, they begin taking
turns with the infants. Parents speak, pause for the infant
response, then speak again. As Karmiloff and Karmiloff-
Smith note (2001), “These ‘conversations’ that are initially
one-sided linguistically may actually constitute an impor-
tant preparation for taking part in later dialogue when the
toddler will be capable of using language to replace the
primitive kicks and gurgles” (p. 48).
What is particularly important for this discussion is that,
rather than meaning or grammar, it is the melodic and ex-
aggerated prosodic patterns of ID speech that initially in-
terest infants (Karmiloff & Karmiloff-Smith 2001). The
melodies of mothers’ speech are compelling stimuli that are
effective in eliciting emotion in preverbal infants (Fernald
1994; Morton & Trehub 2001; Soken & Pick 1999) and, in
addition to revealing information about mothers’ feelings
and motivational states, may be used instrumentally to in-
fluence infants’ behaviors:
When the mother praises the infant, she uses her voice not only
to express her own positive feelings, but also to reward and en-
courage the child. And whether or not the mother feels anger
when producing a prohibition, she uses a sound well designed
to interrupt and inhibit the child’s behavior.... In this respect,
the use of prosody in human maternal speech is similar to the
use of vocal signals by some nonhuman primates. (Fernald
1994, p. 64, emphasis mine)
As babies mature, motherese plays an important role in their
development of speech. For example, English, Russian,
Swedish, and Japanese mothers hyperarticulate vowels
when addressing their infants (but not other adults), thus
amplifying the phonetic characteristics of vowels and facil-
itating the phonological aspects of their infants’ develop-
ment (Andruski et al. 1999; Burnham et al. 2002; Kuhl et
al. 1997). The fact that hyperarticulation is didactic rather
than merely reflecting high emotional content is illustrated
by a comparative study of pitch (fundamental frequency),
affect (intonation and rhythm), and vowel hyperarticulation
(vowel triangles) of mothers as they spoke to their 6-month-
old infants, their pets (cats or dogs), and other adults:
These results show that infant- and pet-directed speech are
similar and distinctly different from adult-directed speech in
terms of heightened pitch and affect. Interestingly, only infant-
directed speech contains hyperarticulated vowels. Thus, vowel
hyperarticulation does not accompany special registers simply
because they differ from adult speech in pitch and affect.
Rather, it seems to be a didactic device: Mothers exaggerate
their vowels for their infants but not for their pets. (Burnham
et al. 2002, p. 1435)
By around 10 months of age, children begin to babble in
rhythms that are consistent with the prosodic structure of
their language (Levitt 1993). At a fundamental level, the vo-
cal turn-taking that develops between mothers and their
babbling babies (Karmiloff & Karmiloff-Smith 2001) helps
the latter grasp the “rule” that conversationalists take turns.
Such “social syntax” (Snowdon 1990) may enhance infants’
acquisition of other rules that are preliminary to learning
the proper arrangements for elements within sentences
(syntax). Infants appear to learn an important aspect of syn-
tax, namely, the boundaries between linguistic categories
such as words or phrases, through phonological bootstrap-
ping, that is, by attending to the correlations between the
prosodic cues of motherese (phonological features, intonation,
stress, vowel length) and linguistic categories (Burnham
et al. 2002; Gleitman & Wanner 1982; Morgan 1986;
Morgan & Demuth 1996).
By the time infants reach the single word stage (at around
17 months of age), they are becoming sensitive to the way
in which different word orders convey different meanings
in English (Hirsh-Pasek & Golinkoff 1996). But once in-
fants acquire some feel for linguistic categories, how do
they begin to grasp a sentence’s meaning? Pinker (1987;
1994) suggests a likely mechanism is through semantic
bootstrapping – the mapping of sounds onto mental se-
mantic concepts such as transitive and intransitive verbs.
Thus, after an infant has learned the meanings of the rele-
vant nouns, he or she is able to infer the semantic meaning
of syntactical categories from the context in which they are
heard:
Upon hearing “The boy is patting the dog,” for example, the
child needs to know what the words “boy” and “dog” mean be-
fore he can even start a grammatical analysis of the sentence.
Then, upon seeing the accompanying action (boy touching the
dog’s back), the child can use this real-world situation to make
the formal linguistic analysis, mapping “the boy” to the subject
noun phrase, and “patting the dog” to the verb phrase contain-
ing a direct object. In other words, to get syntax under way, the
child initially extracts an appropriate semantic representation
for a verb by mapping the extralinguistic context onto the syn-
tactic string and by inferring what the speaker is trying to con-
vey. In this way, the child is able to learn that “pat” means to
move your hand on something in a certain way, as he can infer
from the extralinguistic context. He can also derive from the
linguistic context that “pat” is a transitive verb that must take a
direct object. (Karmiloff & Karmiloff-Smith 2001, p. 115)
Although Pinker’s hypothesis is difficult to apply to non-
transparent situations, it finds support from research that
shows that most of the utterances addressed to infants in
the early stages of learning a language are simple, active
sentences of the type, “The boy is patting the dog”
(Karmiloff & Karmiloff-Smith 2001). What is particularly
important for the present discussion is that Pinker stresses
the importance of nonsyntactic prosodic cues provided by
motherese for semantic bootstrapping.
Recent work illustrates that motherese is also important
for infants’ acquisition of morphology (Kempe & Brooks
2001). In certain languages in which nouns are classified into
different gender classes in ways that seem arbitrary rather
than systematically form-based, diminutives are used more
frequently when talking to children than adults and serve to
increase the transparency of the gender markings. In these
languages, learners exposed to diminutives are able to gen-
eralize gender from the diminutives’ transparent suffixes to
the nouns that they modify. Diminutives are important for
acquisition of proper gender or case markings in Russian,
Spanish, Finnish, and Lithuanian, and “there is widespread
agreement that the occurrence of diminutives in CDS
[child-directed speech] is primarily motivated by pragmatic
and semantic factors” (Kempe & Brooks 2001, pp. 251–52).
To summarize, the above studies show that motherese
varies between cultures in subtle ways that are tailored to
the specific difficulties inherent in learning particular lan-
guages. Additional studies on speech development in in-
fants document the effects of prosody on syllable omission
(Lewis et al. 1999) and reduction (Snow 1998); the shaping
of monosyllabic utterances (Snow 2002) and words (De-
muth 1996; Fee 1997); auditory memory of speech (Man-
del et al. 1994); and prediction of dialogue structure (Hastie
et al. 2002). As a general rule, infants’ perception of
prosodic cues in association with linguistic categories is im-
portant for their acquisition of knowledge about phonology,
the boundaries between words or phrases in their native
languages, and, eventually, syntax. Prosodic cues also prime
infants’ eventual acquisition of semantics and morphology.
Finally, the fascinating discovery (Burnham et al. 2002) that
infant-directed speech contains separate elements that
serve to express emotions, on the one hand, and function as
didactic devices, on the other, is consistent with the view
that motherese evolved incrementally from largely affective
ancestral vocal communications to its present highly com-
plex form.
2.2.1. Multimodal motherese in humans. Because com-
munication with infants involves tactile and visual as well as
auditory stimuli, interest is growing in multimodal moth-
erese that involves gesture, facial expressions, and touching
of infants in addition to vocal utterances (Dissanayake 2000;
Fogel 1993). For example, studies of American and Italian
mother/infant pairs suggest that ID speech is accompanied
by ID bodily gestures that are relatively simple compared
to gesticulations directed towards adults (Iverson et al.
1999; Shatz 1982). Compared to AD gestures, ID gestures
occur less frequently, are simpler and less abstract, and
function to highlight certain utterances or to attract atten-
tion to particular objects. As Italian infants’ absolute num-
bers of both gestures and words increased between the ages
of 16 and 20 months, their relative use of gestures de-
creased from 42% to 27%, in proportion to the sharp in-
crease in word production (Iverson et al. 1999, p. 65).
Rather than adding information to verbal communications,
most ID gestures serve to reinforce the linguistic message.
ID speech is perceived visually as well as aurally by in-
fants. Facial imitation has been reported for human
neonates (Meltzoff 1988), and 3- to 4-month-old infants im-
itate mouth movements only when auditory and visual rep-
resentations of vowels are temporally coordinated (Leg-
erstee 1990). Four-month-old infants also prefer vowels
that are presented with the visual image of the appropriate
mouth shape (Kuhl & Meltzoff 1988). Similarly, 5-month-
old infants prefer speech sounds that are steadily increased
in amplitude when they are presented with gradually
opened mouths (Mackain et al. 1983). These studies sug-
gest that infants attend to mouth shapes that correlate with
speakers’ utterances (Gogate et al. 2000). Maternal speech
is also tied to facial expressions at other levels (Schmidt &
Cohn 2001): The muscles of facial expression participate in
the mother’s articulation of speech sounds (Massaro 1998)
and contribute information about their meaning (Ekman
1979). Facial expressions on the part of the infant, on the
other hand, provide cues about his or her attentiveness to
the mother’s speech. Interestingly, women appear to be
more sensitive and accurate decoders of facial expressions
than men (Hall 1984; McClure 2000), and infants appear to
vary their facial expressions depending on the sex of the
parent (Forbes et al. 2000).
Gogate et al. (2000) studied multimodal motherese in-
volving vocal, gestural, and tactile stimuli in European,
American, and Hispanic mother/infant pairs representing
three developmental ages: prelexical (5–8 months), early-lexical
(9–17 months), and advanced-lexical (21–30
months) infants. Mothers were asked to teach novel names
for two brightly colored puppets (dubbed chi and gow) and
two verbs (pru meaning leap, and flo meaning shake) to
their infants by any means they would normally use. Nearly
100% of the mothers’ communications were multimodal,
with mothers tailoring their productions to the infants’ lex-
ical development when specifically teaching words. Moth-
ers spoke the target words synchronously with moving the
puppets and touching their infants with them (“auditory-
visual-tactile synchrony”) in decreasing frequencies from
earlier to later developmental stages. This suggests that
mothers’ trimodal coordination “highlights word-referent
relations for infants on the threshold of lexical develop-
ment” (Gogate et al. 2000, p. 890). Mothers of advanced-
lexical infants, on the other hand, were more likely to name
objects and actions when the object remained static or was
held by the infant. Further, “the decrease in maternal use
of temporal synchrony... appears to be well-timed with in-
fants’ at 14 months increased ability to detect word-refer-
ent relations without temporal synchrony on the basis of ob-
ject motion alone.... In addition, mothers’ naming of
objects or actions with static objects seems well adapted to
older infants’ ability to glean word-referent relations on
their own” (p. 891).
Messinger and Fogel’s fascinating 1998 study demon-
strates how vocalizations combined with certain gestures
become increasingly intentional or instrumental rather
than emotionally induced as infants mature, and supports
the opinion that intentional gestures were important dur-
ing language evolution (Corballis 2002; Rizzolatti & Arbib
1998). In Messinger and Fogel’s study, smiling, gazing at
mothers, and manual gestures (with and without accompa-
nying vocalizations) were analyzed in 11 infants between 9
and 15 months of age as they played with their mothers, sev-
eral times a month. Gestures were coded as requests when
either mother or infant extended an arm toward an object,
pointed to it, or made a palm-up gesture in a context that
indicated a desire for the partner to give the object to the
requester; and scored as offers when either gave an object
he or she was holding to the other. When vocalizations ac-
companied gestures, approximately 96% of them did not in-
volve recognizable words, that is, they were nonverbal. In-
terestingly, the proportion of infant requests involving
vocalizations rose with age, showing “that as infants ap-
proach 15 months of age, they use the behavioral precur-
sors of speech instrumentally to communicate their desire
for objects,” and these “infant vocalizations increased the
instrumental tone of infant gestures, particularly because
the vocalizations were not related to either gazing at mother
or infant smiling” (Messinger & Fogel 1998, p. 587). Infant
offers, on the other hand, did not rise significantly with age,
but were more likely to involve smiling and gazing at the
mother. Thus, “in offering objects to mother, infants ap-
peared to share and create positive social contact” (p. 586).
It appears that infants increasingly use vocalizations with
requests to compensate for the fact that they are more am-
biguous than manual offers (p. 584), and that “in so doing,
they may be combining linguistic topics (the object referred
to) with comments (the request gesture) in a manner that
presages more complex language use” (Rome-Flanders &
Cronk 1995). This important study suggests that develop-
ment of intentional manual gestures in infants is accompa-
nied by increased use of vocalizations that precede the pro-
duction of actual words.
The gestures studied by Messinger and Fogel (1998)
were triadic rather than dyadic (see sect. 2.1.3). The re-
quest and offer gestures were also imperative (“Take this!”
or “Give me that!”) rather than declarative (informing an-
other about an outside entity), and carried out at a distance
from the partner (distal). As such, these gestures were rep-
resentative of the earliest intentional gestures of develop-
ing human infants, which are preparatory to acquisition of
referential gesturing:
Developing human infants’ earliest gestures are triadic and dis-
tal, and they produce gestures for declarative purposes soon
thereafter. Soon after that, they produce a totally novel kind of
gesture, the referential gesture, which is clearly learned
through imitation and understood bidirectionally and conven-
tionally from the beginning. (Tomasello & Camaioni 1997,
p. 19)
Although Tomasello and Camaioni emphasize the primacy
of the visual-gestural modality for language evolution,
Messinger and Fogel’s research suggests that vocalization
was the crucial factor that facilitated evolution of the ab-
stract, instrumental aspects of speech. In any event, the dis-
covery that mother-infant multimodal (vocal plus gestural)
communication contains separate elements that serve to
enhance social contact, on the one hand, and to allow in-
fants to communicate their desires instrumentally, on the
other, is concordant with the view that multimodal moth-
erese evolved incrementally from largely affective, multi-
modal ancestral communications to its present, more com-
plex, form.
2.2.2. Mother-infant laughter in humans. Laughter is pre-
dominantly an involuntary behavior that usually occurs in
social situations, is associated with high intensity affect, and
lasts less than two seconds (Nwokah et al. 1999). Provine
(1996, p. 41) underscores the social and emotional aspects
of laughter: “Mutual playfulness, in-group feeling and pos-
itive emotional tone – not comedy – mark the social set-
tings of most naturally occurring laughter.” In adults, most
laughter seems to punctuate speech, for example, by oc-
curring after a spoken phrase. For this reason, speech has
been interpreted as having priority over laughter for ac-
cessing the vocalization channel (Provine 1993).
Bachorowski et al. (2001) propose that laughter influ-
ences listeners through acoustic properties that affect at-
tention, arousal, and emotional responses. A listener’s at-
tention is tweaked by laughter because of learned positive
emotional responses that have been conditioned as a result
of repeated pairings of laughter with positive affect. Al-
though Bachorowski et al.’s (2001) research is on young
adults, their hypothesis is attractive in light of the fact that
infants usually begin to laugh between the ages of 14 and 16
weeks, often during positive interactions with their moth-
ers, and “laughter, smiles and other gestures by the baby re-
inforce the mother’s behavior (tickling, for example) and
regulate the duration and intensity of the interaction”
(Provine 1996, p. 39). Interestingly, women produce signif-
icantly more song-like bouts of laughter than men, who pro-
duce significantly more grunt-like laughs (Bachorowski et
al. 2001).
But what about laughter that is directed toward infants?
Interactions between 13 American mothers and their in-
fants were scored for maternal laughter from videotapes
that were taken periodically as infants grew from 4 weeks
to 2 years of age (Nwokah et al. 1999). Particular attention
was given to co-occurrences of speech with laughter
(speech-laughs) in mothers, which were coded for vowel
elongation, syllabic pulsation, breathiness, and pitch change.
Compared to the near absence of speech-laughs in AD
laughter (Provine 1993), speech co-occurred in approxi-
mately 19% of the total number of ID laughs that were
analyzed, with the figure for individual mothers ranging
from 5% to 50%. In most speech-laughs, speech and laugh
began simultaneously and incorporated prosodic, affec-
tive, repetitive rhythmic features that typify vocal mother-
ese.
Production of speech sounds entails alterations in breath-
ing and manipulation of the respiratory apparatus, which
means that important changes in both the vocal tract and
respiration were required before hominins could begin
speaking (Provine 2000). Because apes and humans both
engage in laughter that is constrained by breathing, com-
parative studies of this behavior provide clues about the na-
ture of those changes. In addition to information about the
anatomical and physiological evolution of respiration in ho-
minins, studies of laughter also illustrate give-and-take turn-
taking (“social syntax”; see Snowdon 1990) between moth-
ers and very young infants. The nearly identical mother-infant
tickling/laughter bouts of chimpanzees, bonobos, and hu-
mans provide some of the best evidence for the continuity
hypothesis with respect to the evolution of mother-infant
communication. Despite the similarities in these bouts,
however, the breathing and vocalizations that they entail dif-
fer fundamentally between apes and humans, and walking
upright appears to have been the critical event in the respi-
ratory/vocal transition that accompanied not only the evolu-
tion of laughter, but also of speech (Provine 2000).
3. Prelinguistic evolution in early hominins
3.1. The role of bipedalism and loss of infant clinging
Two features related to development in chimpanzees and
humans differ in profound ways that are important for for-
mulating hypotheses regarding the prelinguistic substrates
of language. Difference 1: Although infants of both taxa ex-
hibit remarkable similarities in the sequence and timing of
various developmental phenomena (e.g., helplessness at
birth, distress at separation from mother, disappearance of
blind rooting responses, production of social faces, and fear
of strangers [Plooij 1984]), landmarks related to control of
posture and locomotion (pushing off, sitting and standing
without support, creeping on all fours, and walking bipedally
[Plooij 1984]) appear much later in humans than in
chimpanzees. Difference 2: Unlike chimpanzee mothers,
human mothers continually produce affectively positive vo-
calizations to their infants. Below, it is reasoned that this
first difference between humans and chimpanzees is asso-
ciated with the evolution of bipedalism and the subsequent
trend for brain size increase in late australopithecines/early
Homo (Falk et al. 2000), and that the second derived from
an initial evolution of prosodic and instructional vocaliza-
tions in early hominin mothers. Further, it is hypothesized
that these differences are related, that is, that the prelin-
guistic substrates for protolanguage began to evolve from
ID vocalizations similar to those of chimpanzees as brain
size started to increase in bipedal hominins.
But how? To explore this question, one must address the
definitive trait that makes a hominin a hominin, namely,
bipedalism. Many candidates (summarized in Falk 2000)
have been proposed as the main advantage (or selective
pressure) that led to bipedalism including: freeing of the
hands to carry things (food, water, babies) or to make tools;
increased ability to see predators and game over tall grass
or to reach higher to pick food from trees; better stamina in
running after game and hunting; and enhancement of sex-
ual signals (genital displays). An important advantage of
bipedalism is that upright hominins were more efficient at
keeping cool because they had reduced areas of skin ex-
posed to the intense solar radiation (Wheeler 1988) that
would have presented a thermal liability for later australo-
pithecines/early Homo, which dovetails with the radiator
hypothesis of brain evolution (Falk 1990; 1992a; 1992b). Al-
though consensus is lacking about the causes of bipedalism
(and how long it took to become fully achieved), one thing
is certain: Fossil evidence shows that by the time hominins
left Africa to begin colonizing the rest of the world (around
two million years ago), they did so using fully developed
bipedal gaits.
The fossil record also reveals that anatomical changes
that broadened and shortened the pelvis and reshaped the
birth canal began occurring well before this exodus. These
changes, together with the subsequent trend for increas-
ingly large brains that began in late australopithecines/early
Homo (Falk 1998; Falk et al. 2000), would have made par-
turition progressively more difficult. The evolutionary so-
lution to this dilemma is that, today, women give birth
sooner, that is, before infants’ heads are too big to pass
through the birth canal, which results in neonates that are
relatively undeveloped. This is why human babies reach
landmarks related to posture and locomotion later than ape
infants (Difference 1), and it is why they are unable to ride
clinging to their mothers’ bodies. The trend for increasingly
difficult parturition was well underway in Homo by 1.6 mil-
lion years ago, as indicated by the comparatively modern
body proportions, narrow pelvis, and approximately 900 cm³
cranial capacity of the famous Nariokotome skeleton
from Kenya (WT 15000), which suggests that this youth’s
female relatives would have been subject to difficult deliv-
eries of relatively undeveloped neonates (Walker & Leakey
1993).
Unlike the infants of many prosimian species that are fre-
quently parked in nests or trees, unweaned infants of mon-
keys and apes are rarely parked for any length of time but,
instead, ride clinging to the fur on their mothers’ chests or
backs (Ross 2001). In the infrequent reports of infant-park-
ing in lieu of riding in higher primates (e.g., occasional in-
stances in pig-tailed langurs, Mentawai Island langurs,
Hanuman langurs, patas monkeys, and talapoins), mothers
either place their infants on the ground or leave them alone
in tree crowns before moving away (Fuentes & Tenaza
1995). Apparently, these unusual instances of baby parking
in anthropoids occur where there are few natural predators
and free the mother “from the potential energetic cost of
carrying the infant” (Fuentes & Tenaza 1995, p. 173). It is
important to emphasize, however, that infant parking is ex-
tremely rare in anthropoids; riding in which the infant does
the clinging is the norm. For this reason, riding was pre-
sumably present in the ancestor of all anthropoids and, al-
though energetically costly to the mother, may have been
strongly selected for because it prevented exposure of
parked infants to parasites (in nests), predation, and infan-
ticide (Ross 2001). Observations of parking and riding
across the primate order suggest that once riding had
evolved it was “difficult to lose... [and] the only lineage in
which riding has been lost... is that leading to Homo sapi-
ens” (Ross 2001, p. 765).
The occasional reports of anthropoid mothers parking or
putting down their young infants are almost always in the
context of maternal foraging, which is significant because
foraging was a primary means by which early hominins
made their living. Since chimpanzee mothers and contem-
porary women in hunting and gathering societies (who use
baby slings) usually forage for food with their infants at-
tached to their bodies, one might assume that early hominin
mothers did too. In this context, it is relevant to consider
the interaction of maternal foraging and infant-riding in a
higher primate species that, like humans (Leutenegger
1972), produces relatively large infants. Mother squirrel
monkeys (Saimiri sciureus), for example, normally carry in-
fants that are less than 17 weeks old on their shoulders and
backs, after which time the infants, having grown to be-
tween one-third and one-half the mother’s size, move about
on their own (Lyons et al. 1998).
Experimental evidence reveals that squirrel monkey
mothers stop carrying their infants at earlier ages and spend
more time foraging when food is relatively scarce and diffi-
cult to find, although they do not decrease the amount of
time they nurse (Lyons et al. 1998). For their part, infants
living under harsh foraging circumstances make frequent
unsuccessful efforts to ride on their mothers compared to
infants living under more optimal conditions. Under diffi-
cult conditions, mother squirrel monkeys focus their energy
on obtaining enough calories to feed themselves and to
nurse their infants. Thus, “by rescheduling some transitions
in development (carry → self-transport), and not others
(nursing → self-feeding), mothers may have partially protected
infants from the immediate impact of an otherwise
stressful foraging task” (Lyons et al. 1998, p. 290). Similar
postnatal foraging-related changes in maternal care have
been reported for free-ranging gelada baboons (Barrett et
al. 1995), long-tailed macaques (Karssemeijer et al. 1990),
and yellow baboons (Altmann 1980).
Although it is the mothers that bear the burden of their
infant’s weight during infant carrying, it is the infants that
usually do the hanging-on in anthropoids, with the excep-
tion of humans. Thus, because chimpanzee infants develop
motor skills relatively rapidly compared to human babies
(again, Difference 1), they are able to cling to the mother’s
furry belly after 2 months of age (Plooij 1984) and to shift
to her back for travel as they grow heavier. During the first
weeks of life, however, it is the mothers themselves that
support and cling to infants, frequently in response to their
distress whimpers or hoos. Human babies, on the other
hand, are born extremely helpless and never develop the
ability to cling unaided to their mothers’ (unfurry) bellies or
backs. This observation is corroborated by literature that
documents a strong grasping reflex in human neonates
(Halverson 1937a; 1937b; 1937c). For example, the ability
of a young infant to support its weight by clinging with one
hand decreases from monkeys to chimpanzees, and is ap-
parently extremely limited in human infants despite the fact
that they are born with strong vestigial grasping responses
(Halverson 1937a). However, even if human babies had the
ability to cling to their mothers’ bellies, it would be difficult
for mature human infants to ride unaided for extended
lengths of time on backs that are habitually oriented verti-
cally rather than horizontally. Infant carrying is therefore
entirely up to the human mother (or substitute) and, as any
mother will attest, growing babies soon become heavy.
Although contemporary hunters and gatherers do not
provide exact models for our hominin ancestors, groups
such as the Ache, !Kung San, and Efe pygmies offer clues
that may help us to formulate hypotheses about the lives of
Plio-Pleistocene hominins, including how mothers may
have cared for infants (Small 1998). As a general rule, care-
taking of infants in most non-Western cultures is physically
engaging, with demand feeding, close contact with infants
during the day, and sleeping with them at night being the
norm. In order to go about their business with freed hands,
contemporary women from most of the world’s cultures use
slings to secure their babies onto their backs or hips, or onto
the bodies of older siblings (Small 1998). These habits may
seem strange to Westerners who value and nurture inde-
pendence in very young infants, and thus may permit them
to cry for extended periods or to sleep in separate rooms.
The cross-cultural ethnographic evidence pertaining to
baby slings reinforces the suggestion by Zihlman (1981) and
others that baby slings, perhaps made from vegetal matter,
may have been among the first nonlithic tools that were in-
vented.
In what contexts would infant riding have suffered its set-
back in hominins (Ross 2001), and what would have re-
placed it before the invention and general use of baby slings?
Did evolving hominin mothers revert to the prosimian adap-
tation of parking their babies far away for extended periods
of time while they foraged, despite the threats from para-
sites, predators, and (possibly) infanticidal males? Probably
not. For one thing, parking infants would have severely con-
strained travel distances for lactating mothers, since com-
parative primatological and ethnographic data suggest that
infants would have required frequent nursing bouts through-
out the day (Plooij 1984; Small 1998). Instead, as docu-
mented above for a number of anthropoids, early hominin
mothers may have engaged in foraging-related changes in
maternal care. Unlike chimpanzee mothers, by the time
early hominins had evolved into habitual bipeds that bore
relatively helpless young, it would have been adaptive for
them to adopt a “putting the baby down” strategy in which
mothers periodically put their infants down to release their
hands (and energy) for foraging nearby. That way they could
keep their babies within eyesight and, when ready to move
on, simply pick them up and go.
3.1.1. Using vocalizations to “keep in touch.” Infant park-
ing is a rare event in monkeys, apes, and non-Western hu-
man cultures. When it does occur, infants are usually dis-
tressed by the unusual situation of being separated from
their mothers (Ainsworth et al. 1978; Lamb et al. 1985),
which is frequently conveyed by whimpering or crying.
Parked infant pig-tailed langurs, for example, “cry” by emit-
ting high-pitched squeals intermingled with low-pitched
guttural sounds (Fuentes & Tenaza 1995), while infant rhe-
sus monkeys produce a plaintive series of coos when sepa-
rated from their mothers (Small 1998). Infant chimpanzees
whimper and scream loudly if they begin to fall from their
mothers’ chests while traveling (Plooij 1984). Crying is
qualitatively different in human babies, consisting of rhyth-
mic patterns of vocalizations that entail short, breathy expi-
rations alternating with long intakes of air (Frodi 1985).
Human crying makes use of the lungs and vocal apparatus
much as laughter does; and Provine (2000) notes that “al-
though laughter and crying are considered polar opposites
of the emotional spectrum, they are neurologically linked
and share the features of tearing and rhythmic vocalization”
(p. 187). By around 3 months of age, human infants develop
the ability to modulate their cries to express different emo-
tions such as anger, pain, and frustration (Marler et al. 1992;
Small 1998); and, like babbling, crying may be a precursor
to language (Small 1998).
Although crying is universal in human infants, the degree
to which it is manifested varies with culture. In cultures
where babies spend most of their hours in close physical
contact with adult caregivers, infants engage in relatively
little crying; whereas in cultures that encourage infants to
gain independence by leaving them alone for much of the
time (e.g., much of America) babies cry considerably more
(Small 1998). Small believes that crying of infants today is
little changed from when it first evolved in hominins as a
means for communicating infants’ needs. Furthermore,
crying and parental sensitivity to it are adaptive traits be-
cause they:
evolved to serve the infant’s purposes: to assure protection, ad-
equate feeding, and nurturing for an organism that cannot care
for itself. By definition, crying is designed to elicit a response,
to activate emotions, to play on the empathy of another.... The
caretaker has also evolved the sensory mechanism to recognize
that infant cries are a signal of unhappiness, and thus be moti-
vated to do something about it. (Small 1998, p. 156)
It is noteworthy that crying increases the strength of the
grasping reflex in human infants (Halverson 1937a), which
is consistent with experimental research on American in-
fants which suggests that the major reason that infants cry
is to reestablish physical contact with separated caregivers
(Small 1998; Wolff 1969).
Presumably, early hominin babies were no happier at be-
ing separated from their mothers than are anthropoid in-
fants today, and would have been increasingly likely to vo-
calize distress during the period of evolution when active
infant riding was lost and babies were put down periodically
so that mothers could forage. It is also reasonable to assume
that the crying of their infants would have produced aver-
sive stimuli for early hominin mothers, as it does for con-
temporary monkey (Small 1998), chimpanzee (Plooij
1984), and human (Small 1998) mothers.
What could hominin mothers have done to discourage
separated babies from crying? For one thing, they may have
used a strategy commonly employed by contemporary
Western women, that is, inducing infants to fall asleep be-
fore putting them down. One way to do this would have
been to nurse infants because, if they resembled modern
babies, “an infant who is fully fed or fatigued is likely to be
quiet, if not actually sleepy” (Halverson 1937a, p. 381).
Early hominin mothers may also have used other tactile
strategies to soothe babies before putting them down, for
example, cradling and rocking – the latter being a coe-
volved “rhythmic, temporally patterned, jointly main-
tained” interaction between mothers and infants (Dis-
sanayake 2000, p. 390). (Perhaps the human habit of
rocking babies to sleep is effective because it produces a
gentle barrage of stimuli that mimics physical contact with
the mother.) The very act of placing babies in horizontal po-
sitions may also have encouraged them to sleep, as sug-
gested by experiments which show that captive chimpanzee
infants that are left horizontally in cradles most of the day
sleep more than wild infants that are carried semi-upright
by mothers (Plooij 1984). In addition to these tactile strate-
gies, hominin mothers may also have used rhythmic, tem-
porally patterned vocalizations to lull infants to sleep: pre-
cursors of the first lullabies (Dissanayake 2000).
What about instances in which hominin infants refused
to sleep and, instead, fussed and cried when mothers put
them down? Perhaps early hominin mothers then re-
sponded “voice to voice.” Already accustomed to regulating
older infants’ travel with vocalizations as chimpanzee moth-
ers do today, early hominin mothers may have elaborated
calls from their vocal repertoires into affectively positive,
rhythmic melodies as a means, not only to lull them to sleep,
but to reassure them that “mommy is near” when they were
awake (a kind of vocal rocking², or non-tactile way of “keep-
ing in touch”). In a sense, then, prosodic utterances would
have become disembodied extensions of mothers’ cradling
arms. This suggestion is consistent with the fact that singing
to human infants to provide comfort and ease unhappiness
is a derived practice that appears to be cross-culturally uni-
versal (Trainor et al. 1997). It is also consistent with the
finding that a “squealing baby, in fact, can be stopped dead
in its vocal tracks by a sudden stream of baby-talk” (Small
1998, pp. 145–46).
The argument that mother-infant communication shifted
away from being based almost exclusively on direct physi-
cal contact between the signaler and recipient (as baby
clings to mother) to being distal (when baby is regularly put
down) also applies to gestural communication. For exam-
ple, while most chimpanzee gestures involve physical con-
tact between the signaler and the recipient, the earliest ges-
tures of developing humans do not; that is, like vocal
communications, they have become distal (Tomasello &
Camaioni 1997). Facial expressions are believed to have
been important during the evolution of speech (Schmidt &
Cohn 2001), and would have enhanced communication be-
tween hominin mothers and their nearby babies. Putting
infants down may also have had a significant impact on the
development of certain circular and imitative self-teaching
devices (Baldwin 1906; Piaget 1952) that are hypothesized
to have been uniquely associated with the evolution of sym-
bolic communication in higher primates, especially humans
(Gibson 1986; 1990; 2001; Parker 1993; 1996). For exam-
ple, a secondary circular reaction (Piaget’s 3rd stage) occurs
in babies that are 3 to 5 months when they persistently fo-
cus on the contingent behavior between their hands and
inanimate objects (Parker 1993) and “the midline supine
posture . . . focuses the infant’s eyes on both hands” (Parker
1993, p. 318). The fact that the “putting the baby down” hy-
pothesis entails continuity in the evolution of prelinguistic
vocalizations of early hominins from the vocalizations of ape
ancestors does not mean that gestural communication is
not, or was not, an important complement to speech-based
communication (Armstrong et al. 1994; Corballis 1999;
2002; Hewes 1973; King 1996; Rizzolatti & Arbib 1998;
Tomasello 1999; Tomasello & Camaioni 1997).
3.2. The broader evolutionary context
3.2.1. The emergence of protolanguage from prelinguis-
tic behaviors. Just as ID speech of women first expresses
emotions and engenders them in infants, and later becomes
instrumental in socializing and influencing their behaviors
(Fernald 1994; Monnot 1999), the prosodic ID vocaliza-
tions of hominin mothers would have taken on less emo-
tional and more pragmatic aspects as their infants matured.
As is true for human babies toward the end of their first
year, prosodic (and gestural) markings by mothers would
have helped early hominin infants to identify the meanings
of certain utterances within their vocal streams (semantic
bootstrapping, Pinker 1987; 1994). Over time, words would
have emerged in hominins from the prelinguistic melody
(Fernald 1994, p. 65) and become conventionalized. The
prosodic elements of prelinguistic vocalizations would have
contributed not only to hominins’ eventual semantic grasp
of utterances, but also to their acquisition and shaping of
numerous sensitivities (phonology, boundaries between ut-
terances, monosyllabic utterances, syntax, dialogue struc-
ture, and auditory memory for vocal utterances) that, ulti-
mately, became entailed in linguistic evolution.
That said, speculation abounds about the precise nature
of protolanguage. For example, it has been suggested that
the earliest language might have had nouns and verbs, but
lacked affixes, functional categories (Heine & Kuteva
2002), and true syntax (Newmeyer 2002). Whatever the ex-
act configuration of protolanguage, however, certain con-
jectures about its emergence are relevant for the discussion
of prelinguistic evolution. Thus, protolanguage is thought
to have been relatively simple grammatically (Heine &
Kuteva 2002), essentially pragmatic in nature (Givon 1979),
and may have developed in early Homo “directly from the
requirements of group foraging... and instruction of the
young” (Bickerton 2002, p. 209). Although foraging is em-
phasized here as the context in which prelinguistic behav-
iors were initially selected, it is worth noting that the
mother-infant dyad is fundamentally social and that, con-
sistent with Dunbar’s (1993) emphasis on selection of lan-
guage for “vocal grooming”:
As soon as protolanguage had achieved the necessary critical
mass (some dozens or perhaps a few hundred meaningful sym-
bols, whether oral or manual is immaterial to the present argu-
ment) it was undoubtedly co-opted for a variety of social pur-
poses, which in turn contributed to its further expansion.
(Bickerton 2002, p. 209)
Thus, instead of remaining static over time (uniformitari-
anism), once protolanguage appeared, it presumably con-
tinued to evolve in a socially meaningful, dynamic, chang-
ing, and directional manner (Newmeyer 2002).
The “putting the baby down” hypothesis is based on two
fundamental premises. First, hominin mothers that at-
tended vigilantly to their infants would have been strongly
selected for; and, second, those mothers would have had a
genetically based potential for modifying their vocal and
gestural repertoires to shape and consciously control the
behaviors of their offspring. The first premise is widely ac-
knowledged to be the case for a variety of primates (and, in-
deed, other mammals), including monkeys (Small 1998),
chimpanzees (Goodall 1986; Plooij 1984), and people
(Small 1998). Not all primate mothers are equally attentive
to their infants, however, and a “natural experiment” on a
mother-infant chimpanzee pair at Gombe supports the sug-
gestion that selection may have intensely favored early ho-
minin mothers who developed a strategy for monitoring in-
fants that lost the ability to cling to their bodies during
travel, as well as infants that vocalized their distress upon
becoming separated:
Madam Bee had raised two infants successfully when one of her
arms was paralyzed during a presumed polio-epidemic.... The
two infants that were born afterwards died within a few months.
I had the occasion to make observations on the first of these two
infants: Bee-hind. Her body was full of wounds and scratches,
so she must have fallen repeatedly. Whenever her mother
moved about without supporting her, she whimpered and
screamed continuously. (Plooij 1984, pp. 45–46)
Just as there is a good deal of variation in the degree to
which healthy chimpanzee mothers living in the wild sup-
port and carry their infants (Plooij 1984), variation in the at-
tention provided to infants by hominin mothers would have
provided the raw material upon which natural selection op-
erated. As detailed above, humanlike crying and mothers’
sensitivity to it probably evolved in early hominins to assure
protection, adequate feeding, and nurturing for babies that
could not care for themselves. If the hypothesis presented
here is correct, hominin babies were increasingly put down,
in which case maternal visual attention to gesture and facial
expression would also have acquired high selective valence.
As noted by Schmidt and Cohn (2001), the fitness effects of
maternal attention to facial expression of infants “are po-
tentially great, considering the intense social and nutri-
tional needs of the infant, as well as possible risks associated
with lack of maternal attention, including failure to thrive,
physical danger, and at the extreme, death from neglect or
abandonment” (p. 12).
The second premise, that early hominin mothers would
have had a genetically based potential for modifying vocal-
izations and gestures consciously to control infants, is con-
sistent with recent studies that suggest that pitch discrimi-
nation is highly heritable (Drayna et al. 2001), that the
volumes of gray matter in Broca’s and Wernicke’s language
areas of the brain are highly heritable (Thompson et al.
2001), and that the orofacial motor sequencing upon which
speech depends is under strong genetic control (Lai et al.
2001). Thus, in humans, a point mutation in one gene
(FOXP2 on chromosome 7) severely disrupts the ability to
select and sequence fine movements of the mouth and
tongue (a praxic problem) that are necessary for articulate
speech (Lai et al. 2001). Affected individuals tend to garble
pronunciation, put words in the wrong order, and have trou-
ble comprehending grammar and speech sounds, including
sentences. Although the exact function of FOXP2 is un-
known (it may help to regulate embryonic development),
this gene appears to be necessary for the development of
normal spoken language (Lai et al. 2001), and may have
been a target of selection during recent human evolution
(Enard et al. 2002).
Fascinating research on language acquisition in hearing
and deaf subjects strongly suggests that, rather than being
“hard-wired” to process only vocal language, humans are
genetically predisposed to detect aspects of the temporal
and distributional regularities which correspond to prosodic
and syllabic levels of signed or spoken languages (Petitto
2000). Thus, while certain aspects of abstract grammatical
patterning of natural languages may, indeed, be hard-wired
in our species (Donald 1993; Pinker & Bloom 1990), Petitto
offers a persuasive argument that language acquisition is
nevertheless neurologically plastic and biologically flexible
because it can be acquired and expressed easily via the
hands or tongue. (This is not meant to deny the primacy of
vocal over sign languages. All normal people acquire
speech; relatively few learn sign languages.) The dominant
mode in which natural language is expressed is determined
largely by infants’ biological circumstances (e.g., hearing,
deaf) (Petitto 2000), while the particular flavor of language
that they learn (e.g., Chinese, English) is clearly a product
of their cultures.
Just as certain referential calls of vervet monkeys
(Cheney & Seyfarth 1990) and over 30 discrete calls of
chimpanzees from Gombe (Goodall 1986) are produced
and interpreted similarly by members of their respective
social groups, protolinguistic utterances of early hominins
would have become conventionalized across their groups.
But how could the cultural propagation of specific utter-
ances that resulted from a genetically driven propensity to
produce natural protolanguage have happened? Although
a review of the extensive literature on social transmission in
nonhuman primates is beyond the scope of this paper, it is
interesting to consider how protocultural innovations that
arose in foraging contexts were socially transmitted, pri-
marily by mothers and youngsters, in at least one species.
As is well documented for the innovations of sweet potato-
washing and wheat-washing that “were invented” by a fe-
male Japanese macaque named Imo (Kawai 1965), the
process of propagation of new behaviors may have gone
through two stages: In the initial “Period of Individual
Propagation” (Kawai 1965, p. 5), novel behaviors are trans-
mitted between youngsters, and from them to older fe-
males and siblings. After the behaviors become fixed (adult
males being the last to acquire them), a second “Period of
Pre-cultural Propagation” (Kawai 1965, p. 8) ensues in
which infants learn the behaviors from their mothers and
the practices are thus passed to future generations.³ If one
applies this model to early hominins, once bipedal mothers
began using vocalizations to reassure and instruct their in-
fants, processes similar to those documented for Japanese
macaques could have facilitated the use, sharing, and un-
derstanding of utterances between youngsters and from
youngsters to their mothers. As youngsters matured into
adults and these utterances became fixed across all mem-
bers of groups (conventionalized), new generations of in-
fants would begin acquiring the vocalizations from their
mothers. This is one example of how individually developed
“words” could have come to be shared. It is also worth men-
tioning that the calls of different groups of chimpanzees are
now thought to have different cultural dialects (Gibbons
1992; Mitani & Brandt 1994; Mitani et al. 1992), which is
consistent with the possibility that multiple dialects of pro-
tolanguage may have eventually arisen.
3.2.2. What’s in a name? Although the exact nature of pro-
tolanguage is (I believe) unknowable, one may at least
speculate about the referents for the first protolinguistic
words (or, rather, their English equivalents). Many work-
ers assume that naming was the basic protolinguistic vocal
behavior (Harnad 1996a; Horne & Lowe 1996); that a
study of the origin of names is a study of the origin of sym-
bolic categories (Harnad 1996b); and that naming was
eventually transformed into language by “enhancing the
ability of hominids to comment on and think about the re-
lationships between things and events, that is, by enabling
them to articulate and communicate complex thoughts”
(Armstrong et al. 1994, p. 354). But what concrete cate-
gories would the very first names have referred to? Possi-
ble answers include “kinfolk, tribesmen, enemies, foods,
predators, weather conditions, tools, places, discomforts,
[and] dangers” (Harnad 1996b). With respect to the kin-
folk category, recent research on the English word “Mama”
(Goldman 2001; MacNeilage 2000; Tincoff & Jusczyk
1999) is particularly relevant for the “putting the baby
down” hypothesis. According to MacNeilage (1998; 2000),
“Mama” is an example of two successive cycles of a pure
frame (i.e., utterances generated by mandibular oscillation
alone, with the tongue held still), each of which begins with
a consonant and ends with a vowel, which MacNeilage be-
lieves probably typified earliest speech. A study of 75 in-
fants of less than 6 months of age revealed that babies be-
gan producing “Mama” at a modal age of 2 months, usually
as part of a cry (Goldman 2001). The results showed that
some infants uttering “Mama” appeared satisfied if a fa-
vorite caretaker approached and paid attention to them,
whereas others also needed to be picked up. Another study
revealed that, by the time infants are 6 months of age, they
understand that the word “Mama” specifically refers to
“my Mom” (rather than to any woman), which suggests
that they have begun to form a lexicon with sounds that are
linked directly to socially significant people (Tincoff &
Jusczyk 1999). Thus, it does not seem unreasonable to sug-
gest that the equivalent of the English word “Mama” may
well have been one of the first conventional words devel-
oped by early hominins. After all, wouldn’t maturing
prelinguistic infants, then as now, be inclined to put a name
to the face that provided their initial experiences of
warmth, love, and reassuring melody?
4. Concluding thoughts
Motherese has provided a rich source of information for
this discussion, which is appropriate since it is the only
available model for elucidating how humans universally
acquire spoken languages today, and therefore may have
acquired them in the past. The behaviors of primate (in-
cluding human) mothers, of course, are pivotal for perpet-
uating their genes (and their offspring’s) into future gen-
erations. The central thesis regarding motherese is that
bipedal mothers had to put their babies down next to them
periodically in order to go about their business, and that
prosodic vocalizations would have replaced cradling arms
as a means for keeping the little ones content. It is not a
stretch to suggest that such vocalizing (and the elaboration
of distal gestures) would have had strong selective value. It
is reasonable to speculate that by the time individuals
across social groups began to originate and conventionally
share simple instructive utterances, protolanguage was in
the process of emerging from the prelinguistic melody.
Whatever its precise nature, however, protolanguage and
the other languages that eventually evolved would, forever
after, retain some of that melody. Thus, rather than being
totally separate from language (Burling 1993), tone of
voice represents a signature from its very origin that, as
transpired for the cosmic microwave background signature
left over from the Big Bang, should be recognized and in-
vestigated.
It is hoped that readers will consider the ideas developed
in this article as possible alternatives to suggestions that lan-
guage could not have emerged from an earlier primate
communication system (Burling 1993; Hurford 2002); that
it evolved primarily for internal thought and was only ap-
plied secondarily to communication with conspecifics
(Burling 1993); and that the Upper Paleolithic record of
artwork indicates it evolved only recently (Davidson & No-
ble 1989). That said, the precise role of gesture during
prelinguistic evolution and the exact nature of the first lan-
guage are likely to remain academic bones of contention
until we get the time machine. In the final analysis, how-
ever, at least the suggestion that true syntactic language
probably did not evolve until after the emergence of the
genus Homo around 2 million years ago (Corballis 2002;
Rizzolatti & Arbib 1998) rings true to many, if not most,
workers.
ACKNOWLEDGMENTS
I thank Robin Dunbar, Michael Corballis, Bill Parkinson, and nu-
merous referees for helpful comments. Ann Powell is gratefully
acknowledged for library research.
NOTES
1. Although human fathers engage in motherese (or par-
entese), chimpanzee fathers are unrecognized in the wild and
adult males interact relatively little with infants. The parental fo-
cus of the present analysis is therefore on females.
2. At least one familiar Western lullaby reifies the concept of
lullabies as substitutes for physical rocking. Interestingly, it also
refers to infants falling from trees (or off traveling mothers?). One
might therefore suggest (somewhat fancifully) that this lullaby
soothes a primordial fear retained from the time when hominin
mothers still slept with infants in tree nests (cradles of boughs), as
chimpanzee mothers do. Primary stresses are capitalized and un-
derlined; secondary stresses are underlined (modified slightly
from Trainor et al. 1997, p. 388):
ROCK-a- // bye / ba- // by / ON the // tree / top ///
WHEN the // wind / blows // the / CRA- / dle / will / rock ///
WHEN the // bough / breaks // the CRA- / dle /will / fall ///
And // / DOWN / will / come / Ba- // by / CRA- / dle / and / all ///
3. Another invention of female Japanese macaques that applies
to a natural (rather than provisioned) food was not propagated in
this way. Nakamichi et al. (1998) report that 11 free-ranging adult
females pulled grass roots from the ground, carried them to a river,
and washed them (sometimes on flat stones), but that this behav-
ior did not propagate to most of the group. Six of the animals were
from one matriline, and two others were a mother and her adult
daughter. The authors speculate that root-washing did not spread
widely for several reasons: Roots are eaten only during a brief pe-
riod of the year; carrying is not common among macaques in nat-
ural environments; and pulling long roots from the ground would
have been difficult for juveniles. One might therefore conclude
that, to become conventionalized, an invented behavior must be
possible during much of the year, feasible for juveniles as well as
adults, and utilize anatomical substrates that are widely available.
Open Peer Commentary
Prelinguistic evolution and motherese:
A hypothesis on the neural substrates
Francisco Aboitizᵃ and Carolina G. Schröterᵇ
ᵃDepartamento de Psiquiatría y Centro de Investigaciones Médicas, Escuela
de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile;
ᵇFacultad de Medicina, Universidad de Chile, Santiago, Chile.
faboitiz@puc.cl
Abstract: In early hominins, there possibly was high selective pressure for
the development of reciprocal mother and child vocalizations such as pro-
posed by Falk. In this context, temporoparietal-prefrontal networks that
participate in tasks such as working memory and imitation may have been
strongly selected for. These networks may have become the precursors of
the future language areas of the human brain.
Falk proposes a hypothesis on prelinguistic evolution in early hu-
mans that is based on the development of prosodic vocal commu-
nication between mothers and their babies. Vocal communication
was used as a mechanism to compensate for the lack of physical
Commentary/Falk: Prelinguistic evolution in early hominins: Whence motherese?
BEHAVIORAL AND BRAIN SCIENCES (2004) 27:4 503
contact between mother and child during foraging, a consequence
of the acquisition of a bipedal posture. We welcome this attempt
to incorporate evolutionary theory into the question of language
origin, especially in the context of a still-widespread notion that
some aspects of language may not have evolved gradually by a
mechanism of natural selection.
We would like to contribute to Falk’s hypothesis with a more
neurobiological perspective. More specifically, our commentary is
related to the neural substrates for the development of reciprocal,
vocal interaction between mother and child and its relevance to
the elaboration of increasingly complex, learned utterances that
served for communication and eventually became language. Pre-
viously, Aboitiz and García (1997b) proposed a hypothesis for the
origin of the neural substrate for language based on the notice-
able similarity between the neural networks involved in language
and the neural networks required for auditory working memory.
The neural networks involved in language may have evolved as a
specialization of widespread parietotemporal-prefrontal networks
in the cerebral cortex, which were involved in, among other things,
generating a working memory device for processing and learning
complex vocalizations. A critical aspect of language origins is that
language probably developed as part of a communicative system
that was learned by imitation. The process of learning increasingly
complex phonological utterances requires a specialized neural sys-
tem based on the ability to internally rehearse a phonological tem-
plate and to compare the output with the internal representation.
For this, a sophisticated working memory system is needed, which
is provided by parietotemporal-prefrontal networks connecting
higher order auditory and multimodal association areas. The clas-
sical Wernicke’s and Broca’s language areas, which are supposedly
connected via the arcuate fasciculus, may have evolved as a spe-
cialization of these original networks involved in phonological im-
itation.
Interestingly, recent findings (Chaminade et al. 2002; Iacoboni
et al. 1999; Moo et al. 2003) indicate that parieto-frontal networks
are involved in imitative learning of hand movements. As Falk
mentions, mother and child interactions imply a complex, multi-
modal set of signals – phonological, prosodic, and gestural –
which the child imitates, perhaps by using these complex pari-
etotemporal-prefrontal networks. The physical detachment of
mothers and their babies during foraging activities may well have
put emphasis on vocal communication and the development of
relatively complex phonological signals to help mother and child
recognition. Therefore, we postulate that under these conditions,
early human mothers engaged in reciprocal vocal interactions with
their babies in which the parietotemporal-prefrontal circuits of
mother and child became, so to speak, “locked” in such a way that
the child incorporated the exaggerated gestures and vocalizations
of the mother, generated a template and rehearsed these in the
presence of the mother, who then produced new vocalizations or
repeated the same ones.
The acquisition of complex phonological utterances probably
occurs in ontogenetic stages somewhat older than those in which
baby talk normally takes place. However, some of the elements of
baby talk, such as the exaggerated gestures, prosody, and smiling,
possibly serve to generate appropriate internal representations of
social signals in the child and also to establish close contact be-
tween mother and child. These representations will permit the
mother and child’s locking of parietotemporal-prefrontal circuits
that enable both of them to establish reciprocal, conversational in-
teractions by which the child eventually masters complex com-
municational signals. Furthermore, this locking of parietotempo-
ral-prefrontal circuits probably serves as the basis for the
development of more complex reciprocal interactions between
older individuals, which may be the precursors of true conversa-
tions.
The sequence of events we are proposing – an early engage-
ment in baby talk between mother and child, which initiates the
coordinated activation of parietotemporal-prefrontal networks in
both of them and leads to the generation of increasingly complex
communicative networks, which eventually result in conversa-
tional behavior between older individuals – is basically ontoge-
netic. In phylogeny, it may be that initially there was an ontoge-
netic limit to the development of these neural networks by
reciprocal social interactions, but with the increase in brain size
during hominin evolution, these networks may have become more
complex and more plastic, gradually releasing the barriers which
limited the development of reciprocal social, protolinguistic in-
teractions. This opened the possibility of living in a world of con-
versations which transmitted internal, emotional states but also re-
ferred to the surrounding world.
Hemispheric dominance for language may have evolved as a
consequence of the development of slightly different tem-
poroparietal-prefrontal networks in both hemispheres, perhaps
because these networks might not have been fully compatible
within one hemisphere. In this context, the recent finding that the
same stimulus can be processed by the left or the right hemisphere
depending on the task to be performed (Stephan et al. 2003) sug-
gests that hemispheric asymmetry might rely at least in part on dif-
ferences in large-scale neurocognitive networks. It is thus con-
ceivable that the neural network involved in some aspects of
mother-and-child reciprocal vocal interactions was somehow bi-
ased to develop in one side of the brain. (An additional factor pro-
moting brain asymmetry may have been the increasing transmis-
sion delay across the corpus callosum in large brains, which
impaired adequate coordination of both hemispheres for certain
tasks; Aboitiz & Montiel 2003.)
In summary: mother-and-child reciprocal vocalizations, which
in human evolution possibly increased as a consequence of a
“putting the baby down” strategy, were probably based on inter-
locked temporoparietal-prefrontal networks between mother and
child. These networks may have served as templates for the de-
velopment of more sophisticated neural networks, which permit-
ted acquisition of a syntactically based language.
Mothering plus vocalization doesn’t
equal language
Derek Bickerton
Department of Linguistics, University of Hawaii, Honolulu, HI 96822.
bickertond@prodigy.net
Abstract: Falk has much of interest to say on the evolution of mothering,
but she fails to address the core issue of language evolution: how symbol-
ism or structure evolved. Control of infants does not require either, and
Falk provides neither evidence nor arguments supporting referential sym-
bolism as a component of mother-infant interactions.
Falk has put together a very thorough and detailed account of
child-rearing practices among humans and chimpanzees. This ac-
count sheds considerable light on the origins of speech. Unfortu-
nately, it tells us nothing about the origins of language.
All too many researchers – Falk, alas, is one of them – still seem
to regard the terms “speech” and “language” as synonymous and
interchangeable. They are not. Speech, like signing, is a modality.
Language is a system of expression, one that may function by
means of speech, sign, Morse code, talking drums, smoke signals,
naval flags, and, doubtless, modalities not yet conceived; or it may
keep its productions within the individual’s brain, not employing
any modality at all. Until the distinction between speech and lan-
guage is clearly grasped, little progress will be made in our un-
derstanding of language evolution.
It is highly likely that the exigencies of maternal care formed a
significant factor in determining that the preferred modality for ex-
pressing language should be speech, with sign as a fallback for those
with impaired auditory systems. If an infant is running away from
you, a sound will work better than a gesture. It is even possible that
the distinctive prosodic patterns of “motherese” evolved directly
Commentary/Falk: Prelinguistic evolution in early hominins: Whence motherese?
504 BEHAVIORAL AND BRAIN SCIENCES (2004) 27:4
from the maternal calls of our prelinguistic ancestors. However,
these are characteristics of speech, not of language. Had evolution
proceeded by slightly different steps, language might have emerged
with signing as its preferred modality, with speech as a fallback for
those with impaired visual systems. Language has two major (along
with many minor) distinguishing features that have nothing to do
with modality and that persist regardless of the modality chosen:
One, it employs referential symbols, and two, it assembles those
symbols to form structured wholes. How symbols emerged and how
structures emerged are consequently the two most basic questions
in language evolution. Falk’s work, although useful and thorough,
entirely fails to address either of these questions.
Although referential use of symbols is basic even to a struc-
tureless protolanguage, Falk gives us no reason for supposing that
mother-infant exchanges would have needed any referential con-
tent. Reassurance, disapproval, warning, and all the other types of
messages required for such exchanges could have been – indeed,
most probably were – conveyed by prelinguistic means: a quick
grab and/or a quick slap are often preferred even by modern hu-
man mothers to “That berry is poisonous, darling, so don’t eat it,”
and are certainly no less swift and efficient. Senseless, soothing
sounds work far better at calming a fractious baby than the most
persuasive of verbal arguments. Naturally, when language came
along, those soothing sound patterns would have been co-opted to
form part of the very specialized genre employed by mothers and
other caregivers to address infants. There are two developmental
histories here, however, not one; that these may now have become
linked carries no implications whatsoever for their earlier phases.
We are all continuists nowadays – at least in the sense that no-
body worth taking seriously believes that human capacities derive
from anything but generally accepted evolutionary processes. The
people whom I have called continuists in the past would be better
described as genre continuists – they believe, with a certainty
sometimes bordering on the dogmatic, that every human capacity
developed from a similar prehuman capacity (if language com-
municates, it must have come from prehuman communication).
Why don’t they come straight out and deny the existence of exap-
tation? I see no support for genre continuism in the work of Dar-
win or any other leading evolutionist, but Falk seems to accept it
without questioning either its basis or its antecedents.
Falk claims that “vocalizations of hominin mothers would have
taken on less emotional and more pragmatic aspects as their in-
fants matured.” How? Why? No reason is given, no description or
explanation of the process is offered – it is taken on trust. She
claims that “prosodic (and gestural) markings by mothers would
have helped early hominin infants to identify the meanings of cer-
tain utterances.” This totally begs the question. For the infant to
identify those meanings, they first had to be there, in mother’s ut-
terances. But how did she acquire them? Are we in for an infinite
regress? Impossible to say, because the cited remarks in section
3.2.1 constitute virtually the totality of Falk’s proposals about how
meaning got into language.
In her conclusion, Falk expresses the hope that “readers will
consider the ideas developed in this paper as possible alternatives
to suggestions that language could not have emerged from an ear-
lier primate communication system.” However, those ideas can-
not be alternatives because they address only the emergence of
the speech modality and not that of language itself. Those of us
who reject genre-continuist scenarios do so because none of those
scenarios have come anywhere near explaining how symbolic units
evolved or how syntactic structure evolved. Frankly, I cannot see
why the issues involved should be as opaque as they seem to be.
When the obvious becomes invisible, ideological blinkers are
often to blame. Usually it is those who reject genre continuism
who are accused of ideology. For instance, in a recent BBS com-
mentary, Crawford (2002) stated that our species has “a very
strong desire to be special,” a desire that “sometimes hinders our
attempts to understand our nature.” It wouldn’t bother me one
iota if it turned out that humans were as banal as grasshoppers.
However, it remains a fact, not a desire or a conjecture, that hu-
mans do have one or two unique adaptations, which include sym-
bolic, referential units and the ability to link these in rule-gov-
erned (and potentially infinite) structures. Some of us want to
have those adaptations explained – not just explained away. What’s
so special or ideological about that?
Which came first: Infants learning language
or motherese?
Heather Bortfeld
Department of Psychology, Texas A&M University, College Station, TX
77843. bortfeld@psyc.tamu.edu http://people.tamu.edu/~bortfel/
Abstract: Although motherese may facilitate language acquisition, recent
findings indicate that not all aspects of motherese are necessary for word
recognition and speech segmentation, the building blocks of language
learning. Rather, exposure to input that has prosodic, phonological, and
statistical consistencies is sufficient to jump-start the learning process. In
light of this, the infant-directedness of the input might be considered su-
perfluous, at least insofar as language acquisition is concerned.
A topic of much speculation among researchers who study lan-
guage acquisition is the observation that caretakers consistently
address their infants with a unique tone and manner of voice. This
form of speech has come to be known as infant-directed, or moth-
erese. The apparent universality of this phenomenon only serves
to underscore it as (possibly) a key factor in humans’ easy passage
through the early stages of language learning. Theorizing about
the relevance of motherese to language acquisition began years
ago (e.g., Ferguson 1964; Fernald 1984), and efforts to causally
link the two phenomena have been a fixture in the language-learn-
ing literature ever since. In particular, as questions about the un-
derpinnings of infants’ remarkable ability to acquire language
have become increasingly complex, appeals to universally avail-
able “guides” in this process have become increasingly common.
Although there is still no consensus about the relative contribu-
tion of the way caretakers address infants to the total language-
learning toolkit, there is general agreement that this strange way
of talking to infants must serve some evolutionary purpose. In her
article, Falk addresses a paradox that is often neglected: If moth-
erese evolved to help infants acquire language, how did those who
initially produced it learn to speak? The evolution of language it-
self is perhaps the more appropriate link between motherese and
language learning, as suggested by Falk.
Learning to recognize spoken words is a difficult task. Instances
of words vary phonetically and acoustically, depending on the dis-
cursive, syntactic, and phonological contexts in which they occur.
This is in addition to variations introduced by changes in talker
identity, speaker affect, and so forth. At the earliest stages, word
recognition (e.g., grouping tokens according to type) must be
guided by features of the tokens themselves. Although it remains
unclear precisely which aspects of the auditory signal initiate
recognition, acoustic prominence – an important characteristic of
infant-directed speech – is one factor that has been considered in-
fluential in jump-starting this process. Data supporting this view
indicate that infants generally prefer to listen to acoustically
salient speech, where “salient” can mean either affectively or em-
phatically so. However, my own and others’ recent research indi-
cates that when it comes to recognizing a variety of tokens as in-
stances of a particular type (e.g., a specific word), infants are more
sensitive to the acoustic similarity among tokens than to their
acoustic salience. Infants appear to preserve substantial memory
for acoustic detail from one encounter with a word to the next,
which in turn guides early word recognition. If word learning were
predominantly guided by acoustic prominence, then we should
not see this kind of sensitivity to acoustic differences.
Infants face another difficulty when it comes to speech segmen-
tation. In fluent speech, words are not separated by pauses, and the
cues that may serve to signal word boundaries vary from language
to language. Nevertheless, and despite these challenges, normally
developing infants begin to succeed at recognizing words in fluent
speech as early as 7.5 months of age (Jusczyk & Aslin 1995). This
has been attributed, in large part, to caretakers’ tendency to repeat
content words when addressing their infants. In fact, repetition of
the full form of a word is perfectly reasonable – even expected –
in speech directed to infants, and this repetition is quite distinct
from the reduction to pronominal form that occurs across mentions
of content words in adult-directed speech. Although repetition is
often listed as one of many characteristics of speech directed to in-
fants, it is generally viewed as subordinate to the prosodic quality
of such speech. So although repetition appears to be an important
feature in guiding speech segmentation, it is not the aspect of
motherese that is most often referred to as influential in language
learning. However, as Falk points out, naming is a fundamental as-
pect of theories on the origin of symbolic categorizing, and so
should play an important role in theories of language acquisition.
In other work, my colleagues and I have observed that infants’
early recognition of highly familiar words (e.g., their own names)
serves to anchor them in the speech stream for subsequent pars-
ing. Our data indicate that not only will 6-month-old infants listen
longer to a word previously paired with their own name in fluent
speech relative to a word paired with another name, but they also do
not listen reliably longer to the word that followed another name
relative to a nonfamiliarized control word. These findings hold
even when infants’ first names are replaced in fluent speech with
the words “mommy” or “mama” (depending on how an infant’s
caretaker refers to herself when addressing her infant). This is the
youngest age at which speech segmentation has been shown to
take place, indicating that infants’ recognition of their own names
and the names of important others is a basic tool that they can use
to break into the speech stream early on.
Both sets of findings reported here are consistent with Falk’s
analysis. Her observations are an important step toward refocus-
ing the debate about the origins of motherese from one about how
infants learn language (in all its complexity) to one about how lan-
guage itself evolved. Although the relevance of one set of ques-
tions to the other should be apparent, there has been a general un-
willingness on the part of those who study language acquisition to
address the language evolution side of the debate. In avoiding that
question, we have missed the most logical link between motherese
and language learning. Rather than viewing motherese as a bias
on the part of caretakers that has led infants to acquire language,
motherese may instead be considered the egg that begat the
chicken itself. That is to say, without motherese there would per-
haps be no language at all. With this analysis, arguments about the
underpinnings of language acquisition come full circle (e.g., back
to claims about the origins of language). Falk has mapped out an
elegant way of thinking about this problem. One can only hope
that it will influence how researchers think about the influence of
motherese on language learning.
How plausible is the motherese hypothesis?
Paul Bouissac
French Linguistics, University of Toronto-Victoria College, Toronto, Ontario
M5S 1K7, Canada. paul.bouissac@utoronto.ca
http://www.semioticon.com/Bouissac/Home.htm
Abstract: Falk’s hypothesis is attractive and seems to be supported by data
from primatology and language acquisition literature. However, this etio-
logical narrative presents a fairly low degree of plausibility, the result of
two epistemological fallacies: an implicit reliance on a unilinear model of
causality and the explicit belief that ontogeny is homologous to phylogeny.
Although this attempt to retrace the early emergence of prelinguistic ca-
pacities in hominins falls short of producing a compelling argument, it
does call attention to an aspect of linguistic behavior which may indeed
have evolved under the pressure of nurturing constraints.
Dean Falk notes in her conclusion that only the invention of the
time machine could bring to a close the academic controversies
concerning the origin of language. That may be true, but in the
meantime we have no choice but to debate the plausibility of var-
ious narratives and to construct arguments based on indirect data.
The reasoning may appear more or less compelling and the data
more or less relevant. It ensues that theories of the origin of lan-
guage differ by their degree of plausibility within the confines of
commonsense logic or “bounded rationality” (Gigerenzer & Sel-
ten 2001). The linguists who promote the genetic mutation hy-
pothesis develop an argument by default: It is not because there
is yet any direct evidence that this is the case but, rather, because
they cannot explain otherwise the apparent ease with which chil-
dren universally master language in spite of the assumed incom-
pleteness of the input. Those who prefer the gradualist approach,
following Darwin’s view that evolution proceeds through small
changes selected by the environment, look for the steps that may
have led to human language either as an emergent phenomenon
or as a cumulative process. Dean Falk proposes a narrative of the
latter kind. The question is not whether she is right or wrong – be-
cause by her own admission there cannot be any definite answer
– but what is the degree of plausibility of her motherese hypothe-
sis.
Falk’s argument is attractive because it locates the origin of lan-
guage in the mother-child dyad, which is the locus of language
transmission among modern humans, but its degree of plausibil-
ity is not very high due to questionable epistemological assump-
tions. First, the argument follows a unilinear model of causality.
Looking for a single cause as the source of an event that has con-
sequences for our survival is certainly statistically adaptive. Homo
has evolved cognitive strategies to locate quickly the origin of dis-
ruptions in its environment as the best way to control a potential
danger. If the cause is correctly identified, predicting what comes
next is usually easier. Wasting time scanning too much informa-
tion in order to get a complex picture of various factors can be fa-
tal in real life conditions. Scientific inquiry is likewise driven by
the urge to identify causes; however, assuming a single cause for
each phenomenon is an epistemological fallacy. There are other
candidates, in the same order of possible origins of articulate lan-
guage, such as primate vocalizations with meaningful acoustic
variants (Owren et al. 1997; Rendall et al. 1999; Zimmerman et al.
1995) or intensive vocal interactions aimed at courting, pair bond-
ing (Deacon 1997, pp. 385–410), expressing commitments (Silk
2002), and social grooming (Dunbar 1997), which are not mutu-
ally exclusive and are equally plausible as adaptive vocal behaviors
leading to protolanguage, and which could even be construed as
prerequisites for motherese, rather than the reverse as Falk
claims. Her argument arbitrarily isolates the mother-child dyad
and locates it artificially in relatively safe surroundings. To many,
in agreement with Falk, a gradualist hypothesis appears more
plausible, if only because the social nature of language makes it
unlikely that a single mutation of this importance in a single indi-
vidual could have been selected. Since simultaneous identical mu-
tations are highly improbable, the proponents of a sudden emer-
gence now tend to favor exaptation as their explanatory principle
(Pinker 1994).
This, of course, is not incompatible with Falk’s argument, but
neither does it mean that a single cause can account for the evo-
lution of language. Moreover, the single point of origin considered
by Falk, in spite of its multimodal framework, fails to explain how
the most specific properties of human languages such as referen-
tiality and syntax could have developed from these early phatic vo-
calizations. Invoking bootstrapping to explain the move from
phatic to referential communication begs the question if it is not
explicitly shown how it may have proceeded, and under which
evolutionary constraints. Finally, the use of the term “motherese”
is particularly misleading in this context because, in contemporary
pragmatics, it applies to the mother’s or other adult’s ID verbal
productions which tend to distort and simplify already fully con-
stituted languages. If motherese presupposes the possession of a
language, does not Falk’s logical argument collapse? Or, should
she have used protomotherese and then explained under which
constraints motherese evolved from protomotherese? The root of
this aporia may be the second fallacy that permeates this article.
From the kind of indirect evidence the author marshals in sup-
port of her hypothesis, she is clearly implying that the observation
of developmental behaviors, in both primates and humans, pro-
vides reliable information regarding the evolution of these behav-
iors. The assumption made by Haeckel in 1866 that ontogeny “re-
capitulates” phylogeny seems to be the reason Falk devotes a good
half of her paper to reviewing the abundant literature pertaining
to early language learning and to apes’ maternal behavior. Al-
though the latter may have changed over time in response to en-
vironmental conditions and may even be susceptible to cultural
variations (Van Schaik et al. 2003), the former cannot yield any
clues regarding the origin of language. To think that the ontogenic
development of language learning can be a window on the evolu-
tion of language as such is not a tenable option. The idea that on-
togeny (the growth of an embryo) recapitulates phylogeny (the
evolutionary history of a species) has long been discredited. Yet
Haeckel’s notorious biogenetic law still provides a powerful
metaphoric model to which Falk bears witness in her conclusion:
“Motherese... is the only available model for elucidating how hu-
mans universally acquire spoken languages today, and therefore
may have acquired them in the past” (sect. 4, para. 1, emphasis
Falk’s). This statement is either trivial (all children learn their na-
tive languages along the same developmental steps now as they
did in the past) or a play on words (“acquire” does not have the
same meaning in the premise as in the conclusion). This is not a
robust argument.
Would only going back in time, as Falk rhetorically suggests
in her final paragraph, enable us to discover the “true” origin of
language and settle the debate? This is unlikely. We are
contemporaries of many evolutionary processes that we experience
and observe without understanding them in spite of investing con-
siderable resources to solve the problem of their true nature. The
conundrum of language origins is only one aspect of our ignorance
of the very ontology of language. This does not mean that we
should not keep trying to formulate hypotheses, notably regard-
ing interdependent, multilinear evolutionary factors.
Bipedalism, canine tooth reduction, and
obligatory tool use
C. Loring Brace
Museum of Anthropology, University of Michigan, Ann Arbor, MI 48109.
clbrace@umich.edu
Abstract: Bipedalism in the earliest hominid specimens is always accom-
panied by the reduction of projecting canine teeth. Body size is smaller
than that of chimpanzees or humans, but molar teeth are markedly larger. Use of
a pointed stick for defensive purposes on the one hand, and digging for
USOs on the other, may be why bipedalism was selected for. Passing such
learned behavior to the next generation may have played a role in select-
ing for language.
There is another aspect of the circumstances associated with the
adoption of a bipedal mode of locomotion that may well have con-
tributed to the development of the linguistic realm. Hominid
bipedalism is slow, and our early bipedal relatives were relatively
small of size (Hartwig-Scherer 1993). The survival of small, slow-
footed hominids on the African savannas or the adjacent open
woodlands would only have been possible if they had possessed a
means of defense which transcended that of the other primates
that are found in similar kinds of habitats. Baboons defend them-
selves with truly formidable canine teeth, but the early hominids
had canine teeth that did not project beyond the occlusal surface
of the rest of the teeth in the dental arch.
The late Sherwood Washburn made the observation that if the
baboon were to employ a digging stick to assist in foraging, it could
nearly double its food-getting efficiency (Washburn 1959; 1960).
This may very well have been the key that allowed the early ho-
minids to compete successfully with baboons and warthogs for
survival in the African savannas during the Pliocene. It has been
noted that “the digging stick redirected is a more effective defen-
sive weapon than even the formidable canine teeth of the average
male baboon” (Brace 1995). The need for carrying such a dual-
purpose tool may well have been the selective force that led to the
development of habitual bipedalism, as it is awkward at best for a
quadruped to move effectively from one place to another when
one forelimb is carrying an item essential for survival and is there-
fore unable to play a role in support.
It has recently been said that “Canines don’t just fade away, they
must have been actively reduced by natural selection,” yet no sce-
nario for their reduction by selection has been suggested (Deacon
1997). However, the case has been made that traits will “just fade
away” if they are not maintained by selection. Darwin offered a se-
ries of examples in the Origin of Species (1859, pp. 134–49, 454).
Forty years ago, I labeled the process the “probable mutation ef-
fect” (Brace 1963; 2000, Ch. 5). This is fully compatible with “neu-
tral theory” in molecular biology (Brace, in press; Kimura & Ohta
1969).
The australopithecine life history pattern as shown by tooth root
formation is more like that of a chimpanzee than that of a modern
human being (Smith 1992). Newborns, then, may well have had
the aspects of greater maturity characteristic of chimpanzee
neonates, and also may well have been able to cling to the mater-
nal fur. It has been said that “it is not inconceivable that the first
step across the symbolic threshold was made by an australop-
ithecine with roughly the cognitive capabilities of a modern chim-
panzee” (Deacon 1997, p. 340), and with a tiny “vocabulary” of 5
or 10 words and only two or three types of combinatory rules like
toddlers’ syntax (p. 41). Those beginnings may very well have been
analogous to what is being called “motherese.” From that point on,
the adaptive value of symbolic expression can very easily be seen
as the selective force that led to the increase in brain size. The co-
evolution of language and brain size follows from there to the
point where language as we know it characterizes all human
groups (Deacon 1997, Part 3).
Oldowan tool-making australopithecines were evidently scav-
enging in the Late Pliocene in Africa about 2.5 million years ago
(Hay 1976; Shipman 1986). By the Early Pleistocene, just under
1.9 million years ago, the Oldowan toolmakers were practicing
what has been called “persistence hunting” (Bortz 1985). Two ma-
jor changes in the australopithecine body made this possible, one
clearly documented and the other surmised on reasonable
grounds. The first was the achievement of the body proportions of
recent members of the genus Homo. Early Pleistocene body pro-
portions are remarkably similar to those of living humans (Ruff &
Walker 1993). The other change, which we can infer but cannot
prove, is the loss of the normal mammalian fur coat, presenting a
bare and sweat-gland-endowed skin to the atmosphere. If, as we
guess, our early hunters were engaging in persistence hunting,
then there should have been strong selection for developing the
means of dissipating metabolically generated heat. We know that
humans today have more capacity for sweating than any other
mammal (Macfarlane 1976, p. 185), and Falk herself has shown
that a mechanism for cooling the contents of the cranium was
clearly evolving between the time of the late australopithecines
and that of the early members of the genus Homo (Falk 1990).
The emergence of that toolmaking and hunting member of our
own genus also saw a major increase in brain size, putting it about
halfway between that of the chimpanzee and the human ranges
(Begun & Walker 1993; Vekua et al. 2002). Rate of maturation was
also about halfway between the ape and the human condition
(Smith 1993). Homo was now living the life of a facultative carni-
vore that had spread out of Africa and across the warmer portions
of the entire Old World. As with that other highly mobile member
of what Alan Walker has called the “large carnivore guild,” Canis
lupus, the wolf, there should have been mate exchange between
adjacent groups throughout the entire expanse of hominid occu-
pation: that is, no isolation and no speciation. The advantages for
symbolic communication in a creature so poorly endowed to be a
carnivore had to constitute a considerable force of selection. How-
ever, the chances are implausibly remote that more than one
species of hominid undertook to pursue a way of life that is so
wildly atypical for a primate. Now as to why there is no hint of the
beginnings of symbolic usage in any other species in the world, it
may well be because not one of them uses tools invented by pre-
vious generations as elements essential to their survival.
Hominin infant decentration hypothesis:
Mirror neurons system adapted to subserve
mother-centered participation
Stein Braten
Department of Sociology and Human Geography, University of Oslo,
Blindern, N-0317 Oslo, Norway. stein.braten@sosiologi.uio.no
Abstract: Falk’s hominin mother-infant model presupposes an emerging
infant capacity to perceive and learn from afforded gestures and vocaliza-
tions. Unlike back-riding offspring of other primates, who were in no need
to decenter their own body-centered perspective, a mirror neurons system
may have been adapted in hominin infants to subserve the kind of
(m)other-centered mirroring we now see manifested by human infants
soon after birth.
A necessary condition for the selective advantage and protolan-
guage emergence and propagation specified by Falk may have
been an emerging infant capacity to perceive, understand, and
learn from the gestures and vocalizations afforded by the vigilantly
attending mothers. Pertaining inter alia to meaning identification,
acquisition, and propagation (sect. 3.2.1), I propose this hominin
infant-decentration hypothesis: Compensating for the loss of the
body-clinging advantage that enables offspring of other primates
to perceive and learn without having to transcend the body-cen-
tered perspective shared with the carrying mother, those hominin
offspring able to learn to cope and take care by (m)other-centered
perception of distal vocalizing and gestural articulation would
have had a selective advantage and a contributing impact.
A neurosocial support system has been discovered that may
have lent itself to subserve such an emerging capacity. Rizzolatti
and Arbib (1998) have identified a mirror neurons system in the
modern chimpanzee and in the human brain (see also Stamenov
& Gallese 2002), and I have suggested that this system has been
adapted to subserve infant learning by other-centered perception
in human interaction (Braten 2000; 2001; 2002; 2003a).
Comparative studies of infant-adult interaction in humans and
chimpanzees. In conjunction with the pertinent comparative
findings referred to in the target article, the virtual absence of pro-
longed eye-contact in chimpanzees, as stressed by Bruner (1996,
p. 163, with reference to Savage-Rumbaugh et al. 1993), should
be mentioned. Having compared for a decade infant-adult inter-
actions and infant-carrying modalities in humans to those in chim-
panzees, I can confirm this, at least as pertaining to the chim-
panzees I have studied in a southern Norway zoo and wildlife park.
When clinging to the mother’s back, offspring of great apes learn
to orient themselves in the world in which they operate from the
carrying mother’s stance. Moving with her movements, they may
even be afforded the opportunity to learn by copying her move-
ments (perhaps in the way that Byrne [1998] terms “program-level
imitation”) without having to transcend their own (egocentric)
body-centered perspective. In my periodic studies of captive
chimpanzee-offspring relations, I have observed how an infant,
when old enough to cling to its mother’s back, not only bodily
moves with her movements but often adjusts its head to the
mother’s movement direction, thereby appearing to be gazing in
the same direction as the mother. When a mother holds the infant
in front of her for grooming (which adults more often do from be-
hind one another), a sort of face-to-face situation is established,
but not for the kind of reciprocal interplay entailing mutual gazing
and gesticulation that we observe in human infant-adult pairs (see Note 1).
Before the invention of baby-carrying facilities (attributed by
Leakey 1995, p. 94, to early Homo erectus), hominid species may
have been faced with extinction when turning bipedal, I have sub-
mitted, if their young offspring were unable to listen and learn to
cope and take care by (m)other-centered mirroring and participa-
tion (Braten 2000, p. 275). Such a capacity is at play in early hu-
man ontogeny.
On the ontogenetic path to verbal conversation. Regardless of
whether they are “hardwired” to process speech and sign language
(sect. 3.2.1), human newborns demonstrate a readiness to mirror
facial expressions and gestures (Kugiumutzakis 1993; 1998; Melt-
zoff & Moore 1977; 1998), and young infants’ impressive speech
perception may entail an innate perceptual-motoric link (Kuhl
1998, p. 306). In contrast to the Piagetian attribution of an ego-
centric point of departure for children’s development of language,
requiring decentration as the child matures, we believe we have
now found evidence of infant capacity for altercentric mirroring
and self-with-other resonance soon after birth (Braten 1998; Stern
2000; Trevarthen 1998), facilitating the ontogenetic path to
speech in the culture into which the infant is born. This path com-
prises inter alia these steps: The first vocal imitation of /a/ in the
first hour of life (Kugiumutzakis 1998), as well as mutually attuned
protoconversation in the first months of life (Trevarthen 1974;
1990; 1998), and speech perception entailing that by age 6 months
the infant has already begun to “turn a deaf ear” to sound distinc-
tions that make no sense in the ambient language (Kuhl 1998).
This is soon followed by the babbling onset of well-formed sylla-
bles and production of vowels approaching those of the native lan-
guage, coinciding with joint attention and acknowledgment of
self-other agency at about age 9 months (Akhtar & Tomasello
1998; de Boysson-Bardies 1999; Hobson 1998; Locke 1993;
Tomasello 1999a). Such steps are precursory and supportive of
verbal conversation to come with its reciprocal and participant
characteristics. Not only may the speaker coprocess his own production
from the listener’s stance (in line with Mead’s [1934] notion
of anticipatory response), but the listener may also co-articulate the
speaker’s production as if she or he were a coauthor, as predicted
by Liberman’s (1957; 1993) motor theory of speech perception,
and by Braten’s (1974; 2002) simulation-of-mind model of con-
versation. Such virtual coarticulation from the other’s stance is
manifested when a listener completes the speaker’s aborted sen-
tence or answers a half-spoken question, supported by the capac-
ity for other-centered mirroring and resonance that we see at play
in protoconversation and response to motherese (Braten 1988;
2003b; Stern 2000; Trevarthen 1998).
Neurosocial support. In primate neurobiology there appears
to be a ground for systems that could have lent themselves
to adaptation for decentration in the genus Homo to subserve such
(m)other-centered mirroring. Mirror neurons have been found to
discharge in the macaque brain both when another is observed
grasping a morsel and when the monkey itself is grasping the
morsel (Di Pellegrino et al. 1992). Referring to evidence of a mir-
ror-neurons system in the human brain, Rizzolatti and Arbib
(1998) suggest its possible support of the first primitive dialogue,
and I have predicted that such a system would be found to sub-
serve learning by other-centered perception and participation,
and would be found to be impaired in subjects with autism (Braten
1998; 2002). Entailing an allocation that comprises Broca’s area,
which is activated not only upon speech but also upon (imagina-
tion of) hand rotation, such an adapted mirror-neurons system
may thus pertain to the phylogenesis and sociogenesis of both spo-
ken and sign language (see target article, sect. 3.2.1) by subserv-
ing virtual (other) participation (Braten 2003a; 2003b) in the per-
formance of observed instructors and partners.
NOTE
1. When chimpanzee infants, however, are nursed by human caretak-
ers and sensitized to face-to-face interaction with humans, they appear
able, as Bard (1998) has shown, to imitate human facial models of certain
gestures. I have a video record of a chimpanzee infant (age 39 days) en-
gaging in a sort of turn-taking vocal interplay with his foster parent, but I
have never observed this in infant-adult chimpanzee interaction. Further,
although adult males in the wild may interact relatively little with infants
(see target article, Note 1), the captive males I have observed sometimes
do. For example, a Beta male is sometimes used as a baby-sitter by one of
the mothers when she goes off in search of food (Braten 2000, p. 282). In
any event, never have I observed prolonged facial eye-to-eye contact be-
tween infants and adults, males or females.
Prosody does not equal language
Robbins Burling
Department of Linguistics, University of Michigan, Ann Arbor, MI 48109.
rburling@umich.edu http://www-personal.umich.edu/~rburling
Abstract: Prosody, in motherese as in all forms of language, has a very dif-
ferent form and a very different use than the central lexical, phonological,
and syntactic components of language. Whereas the prosodic aspects of
motherese probably derive from primate vocalization, this does not help
us to understand how the more distinctive parts of language emerged.
Dean Falk makes a strong argument that human motherese be-
gan with affective vocalization and that “the use of prosody in hu-
man maternal speech is similar to the use of vocal signals by some
nonhuman primates” (Fernald 1994; quoted and given emphasis
by Falk). Even though, as Falk makes clear, neither chimpanzee
nor bonobo mothers engage in much infant-directed vocalization,
I am sympathetic with her argument that the prosodic component
of human infant-directed (ID) speech shows continuity with pri-
mate communication and I find it plausible that ID vocalization
could have formed an important bridge between primate and hu-
man communication. Among other things, the early development
of ID vocalization in phylogeny could help to solve the puzzle of
how vocal /auditory language, rather than a manual/visible rival,
became dominant. Most primates have much better voluntary
control over their hands and arms than over their mouths and
tongues. This should have given a decisive head start to a manual
language. If voluntary control over the vocal organs had already
been achieved with the help of such things as ID vocalization, a
vocal language might have been viable from the very start.
Most of Falk’s article is concerned with very early forms of par-
ent-infant communication, and I am in general agreement with
her discussion. I feel less comfortable with the sections of the pa-
per, starting with 3.2, where Falk seeks to relate motherese to lan-
guage. If Falk is right, the earliest forms of human ID vocalization
had none of the specifically linguistic features that have been so
difficult to account for in an evolutionary framework. Present-day
motherese makes use of the same words, combinatorial phonology,
and hierarchical syntax that we find in other linguistic styles;
it is set apart primarily by its characteristic prosodic features.
Falk’s hypothesized prelinguistic ID vocalization has prosodic fea-
tures of the sort found in modern human motherese but lacks its
lexical, phonological, and syntactic features. To say “Over time,
words would have emerged in hominins from prelinguistic melody
(Fernald 1994, p. 65) and become conventionalized” (sect. 3.2.1)
seems to beg the question. Just how would this emergence have
come about?
Tone of voice, the ability of the voice to convey such emotions
as joy, excitement, and anger, and the soothing tones of motherese
are important uses of prosody, and I find it reasonable to see them
as emerging from (and still, I believe, belonging to) a primate call
system. However, this prosody lacks the system of contrastive
phonology that is characteristic of language. As with other kinds
of human and animal calls, the referential potential of prosody is
more limited than that of words. Prosody is better at conveying
emotions, whereas words are better at reference. To be sure,
prosody has become deeply entangled with contrastive phonology
in modern languages, but they do remain easily distinguishable.
Parents have no trouble extracting their infant’s first words from
the abundant primate vocalizations that they have been listening
to since the baby’s birth. Some features of that vocalization will
forever accompany their child’s language in the form of prosody.
In other words, prosody has both a different form and different
functions than phonology or the lexicon, and it is the new form and
functions of language that need to be accounted for if we are ever
to understand how it emerged in phylogeny. I continue to think
that the best way to understand what happened is to conclude that
“tone of voice [along with the other aspects of prosody] amounts
to an invasion of language by something that is fundamentally dif-
ferent” (Burling 1993, p. 30). We ignore the most interesting and
difficult parts of the puzzle if we take for granted that all of lan-
guage somehow emerged from prosody.
I seem to have failed to make myself clear in my 1993 article,
for Falk is not the first person to conclude that I believe in the sud-
den emergence of language. In that article, I did express deep
skepticism about finding the origins of language in a call system,
but such skepticism need not imply that language emerged sud-
denly. One could believe, and I do believe, that language emerged
very gradually from something other than a primate call system.
Human cries, laughs, and screams, after all, constitute a fine pri-
mate call system – the call system of the human primate – and nei-
ther our own calls nor the calls of other primates show the degree
of continuity with language that we might expect if language had
emerged from a call system. Falk is right that the phonetic aspects
of motherese are derived from primate vocalizations. Sadly, this
tells us very little about the origin of the most distinctive parts of
language: contrastive phonology, syntax, and the lexicon.
Early hominins, utterance-activity, and
niche construction
Stephen J. Cowley
Department of Social Sciences and Humanities, University of Bradford,
Bradford BD7 1DP, United Kingdom; Psychology, University of KwaZulu-
Natal, South Africa. s.j.cowley@bradford.ac.uk
Abstract: Falk’s argument takes for granted that “protolanguage” used a
genetic propensity for producing word-forms. Using developmental evi-
dence, I dispute this assumption and, instead, reframe the argument in
terms of behavioral ecology. Viewed as niche-construction, putting the
baby down can help clarify not only the origins of talk but also the capac-
ity to modify what we are saying as we speak.
Invoking “protolanguage,” Falk uses cross-primate comparisons
to speculate on how hominins set off toward full-fledged language.
Putting the baby down, she suggests, prompted words to arise in
response to alterations in mother-infant interaction. Use of a com-
parative method allows due weight to be given to the multi-
modality of this “utterance-activity.” Instead of emphasizing the
prosodic, however, Falk’s argument stresses conventionalized
events. Rejecting this focus on “words” and protolanguage, I use
behavioral ecology to reframe the thesis. Stronger arguments arise
if caregiver-infant interaction is seen in terms of “niche construc-
tion” (Laland et al. 1999).
Taking the folk view that words distinguish us from chimps and
bonobos, Falk posits a “genetically driven propensity to produce
natural protolanguage” (sect. 3.2.1, para. 7). Did this exist? First,
as no other species exploits simple language, words may owe more
to brain-culture coevolution than to genes (Deacon 1997). Sec-
ond, intention attribution is crucial in learning to talk because,
without sympathetic others, infant vocalizations make little sense.
In Dennett’s terms (1987), taking an “intentional stance” may be
no less necessary to early talk than infant design (Spurrett & Cow-
ley 2004). Third, not only may babies lack genetic propensities for
word production but persons, not brains, seem to sustain early
speech. As neural systems self-organize, infants come to control
action and perception in ways that prompt vocally mediated in-
teraction. Generally, then, Falk’s argument is weakened by the un-
supported claim that word-based protolanguage emerged from a
genetic propensity. Other problems also arise. Above all, Falk links
infant-directed speech to conventional form-based meanings
rather than to interpersonal, affective events. By making prehis-
toric talk sign-based, protolanguage becomes a matter of produc-
ing and recognizing speech acts. However, unless communication
draws on interpersonal events, syllabic invariants are likely to be
products of an individual’s recurrent affective states. In modern
infants, this is not what occurs. Rather, words arise from iconic-in-
dexical events that integrate activity between persons and across
modalities (Cowley et al. 2004). Finally, Falk’s appeal to ontoge-
netic and phylogenetic parallels is often not persuasive. If, say,
phonological and semantic bootstrapping occur in ontogenesis,
they rely on producing formally consistent meanings. By defini-
tion, however, form-based processes cannot precede protolan-
guage.
Many reject the view that species differences depend on words.
Neither Chomsky’s recent work (Hauser et al. 2002) nor that
based on Wittgenstein invoked genetic propensities to explain ver-
balizations. Whereas Taylor (1997) and Shanker (2001) posited no
inner linguistic mechanisms, Hauser et al. (2002) have hypothesized
that “most, if not all” verbal aspects of language use “mechanisms
shared with nonhuman animals” (p. 1573). For both sets
of theorists, what sets language apart is a human capacity for off-
line modification of utterance-activity. Hauser et al. (2002) ap-
pealed to a neurally based mechanism for “recursion” and Taylor
(2000) emphasized our capacity to talk about talk, or “linguistic re-
flexivity.” Remarkably, both sets of theorists agree that what mat-
ters is that, in the course of speaking, we modify what is uttered.
It follows therefore that (nonverbal) Ur-language emerged as ho-
minins extended bodily expression. Wittgensteinians and Chom-
skyans concur that no specialized genetic propensities are needed
to sustain simple vocal-production. While disagreeing about how
to explain off-line modification, they agree that nonhumans share
social mechanisms used in language. In defending a continuity
view, Falk addresses the wrong target. The folk mislead us: Even
if words are unique, they are not the taproot of language.
Given emphasis on multimodality, Falk’s argument can be re-
framed in terms of the origins of utterance-activity or Hauser et
al.’s (2002) “language faculty-broad sense.” Putting the baby down
changed ecology in line with both bipedalism and neonates’ en-
larging brains. The thesis, then, sustains the view that joint be-
havior is shaped by mother-infant interaction. In phylogeny, as
Wray (1998) argued, this may have used holistic vocal (and, pre-
sumably, other) patterns. Like social grooming (Dunbar 1996), ut-
terance-activity may have come to dominate social coregulation.
Then, as now, in Fernald’s (1993) terms it may have “engaged and
persuaded” infants by inducing “subtle changes in emotions and
intentions” (p. 80). If so, instead of appealing to ontogenetic and
phylogenetic parallels, we can ask how interactional events give
rise to cognitive outcomes. With Laland et al. (2000), putting the
baby down may have led to “choices, activity, and metabolic
processes” (p. 132) that influenced natural selection through
“niche construction.” The newly created niche altered both ma-
ternal vigilance and the epigenetic processes that affect how in-
fants attend and respond to multimodal expression. As infants be-
came sensitive to the mother’s appraisal of circumstances, there
would have been a partial decoupling of expression from affect.
Real-time feedback could shape the mother-infant relationship
and, by extension, the evolution of development. With Owings and
Morton (1998), “assessment” would drive an arms race which en-
sured that increasingly more differentiated expression was being
used to “manage” infants. Utterance-activity began to exploit Ek-
man (1972) and Fernald’s (1993) invariants as well as the micro-
temporal dynamics of infant-caregiver play (Bateson 1979; Stern
1977). As joint events became affectively coregulated, vocal power
and sensitivity increased. In this view, the ability to use words de-
pends not on genes but on mutual adjusting that is supported by
neurodevelopmental change.
Niche construction allows putting the baby down to be seen as
helping prosody and gesture take on new affective, cognitive, and
practical roles. Social learning may have used behavioral ecology
to reshape both intrinsic motive formation (see Trevarthen et al.
1999) and perception-action systems (Preston & de Waal 2001).
Study of this natural history can throw light on, say, coregulation
(Fogel 1993), interactional synchrony (Condon & Sander 1974),
emotional contagion (Hatfield et al. 1994), accommodation (Giles
et al. 1991), and real-time understanding (Cowley 1998; Gumperz
1996). Reframed in terms of niche construction, Falk’s argument
can promote new thinking about language. Not only does it allow
for skepticism about the role of words in Ur-language, but it
prompts us to ask how joint behavior induces belief in verbal en-
tities. Beyond that, there lies a harder question: Is consilience pos-
sible between seeking the taproot of language in neural capacities
for recursion and viewing reflexivity as the product of how infants
participate in – and talk about – utterance activity?
Continuity, displaced reference,
and deception
Lee Cronk
Department of Anthropology, Rutgers University, New Brunswick, NJ 08901.
lcronk@anthropology.rutgers.edu
http://anthro.rutgers.edu/faculty/cronk.shtml
Abstract: Falk’s contribution to a continuity theory of the origins of lan-
guage would be complemented by an account of the origins of displaced
reference, a key characteristic distinguishing human language from animal
signaling systems. Because deception is one situation in which nonhumans
may use signals in the absence of their referents, deception may have been
the starting point for displaced reference.
Falk’s interesting and persuasive argument that human language
was built, at least in part, upon a substrate of infant-directed com-
munication is framed in terms of the contrast between continuity
and discontinuity theories of the origin of language. However, un-
less we resort to saltationism, a choice between continuity and dis-
continuity is as false in the study of language origins as it is in any
evolutionary scenario. Although examination of the end points of
any episode of divergence will create the appearance of disconti-
nuity, gradual change is the only plausible scenario within a Dar-
winian framework.
This is not to say that evolution’s gradual, continuous, and in-
cremental nature means that “differences between human lan-
guage and nonhuman primate communication are only quantita-
tive” (King 1996, p. 193). Even a gradual process can result in
important qualitative differences over time. Human language dif-
fers from nonhuman signaling systems in a variety of ways. Falk
shows that infant-directed communication is likely to have had a
role in bridging that gap, and King (1996) has provided a similarly
plausible gradualist account of the origins of syntax. Another key
difference between nonhuman signaling systems and human lan-
guage is displaced reference – that is, the ability to refer to things
and to understand references to things that are absent. Unlike hu-
mans, nonhumans can use their signaling systems to discuss only
things that are currently in evidence: “There is a predator nearby,”
“Here is a food source,” and so on. Although they can signal the
presence of, say, a snake, they cannot use that signal as the start-
ing point for a discussion about snakes or as a way to teach their
young about the dangers of snakes. They can express their own
hunger, but they cannot have a conversation about the problem of
hunger while their own bellies are full.
A gradualist account of the origins of displaced reference might
start with the observation that the only circumstance in which
nonhumans send signals in the absence of the referent is when
they are engaging in deception, such as when birds send false
alarm signals in order to frighten competitors from a food source
(Munn 1986). Of course, in order for our ancestors to have been
able to discuss things not in evidence, the receiver of the signal
would have had to be clued into the trick, which would preclude
actual deception. Perhaps the line was crossed when two individ-
uals formed a coalition to deceive another, enabling the coalition
members to share an understanding that a signal was to be used
independent of its referent. Once it was established that a signal
could be used without its referent being present, it would have
been a relatively short step to real displaced reference, uncon-
nected to deception. Although it is a very long way from coalition-
based deception using signals to human language as we now know
it, perhaps this was how the transition from an animal signaling
system to human language began (see Wray [2002] for more on
the evolution of displaced reference). As Knight (1998a; 1998b;
see also Knight et al. 1995) has pointed out, such a scenario would
require high levels of trust among coalition members. This might
have been facilitated by kinship and, in line with Falk’s scenario,
a signaling system rooted in the trustworthy soil of motherese and
its precursors.
Whether displaced reference has its origin in coalitional de-
ception or somewhere else, one thing is certain: Only a continuity
theory of the origin of human language can account for this or any
other discontinuity between it and nonhuman signaling systems.
ACKNOWLEDGMENTS
I thank William Irons, Beth Leech, and Lars Rodseth for their comments
on a draft of this commentary.
Syntax: An evolutionary stepchild
Danielle Dilkes and Steven M. Platek
Department of Psychology, Drexel University, Philadelphia, PA 19102.
Danielle.Dilkes@drexel.edu steven.m.platek@drexel.edu
http://psychology.drexel.edu/platek.htm
Abstract: Dean Falk has strategically explored “mother-infant gestural
and vocal interactions... in chimpanzees and humans” in order to offer
hypotheses “about the evolutionary underpinnings that preceded the first
glimmerings of language.” Though she offers compelling evidence for
many interesting hypotheses as to the epigenesis of language, other possi-
bilities have yet to be explored. Here we explore the role of gestural com-
munication among deaf signers and the neural correlates associated with
this type of communication.
In her article Prelinguistic evolution in early hominins: Whence
motherese?, Dean Falk strategically explores “mother-infant ges-
tural and vocal interactions... in chimpanzees and humans” in or-
der to offer hypotheses “about the evolutionary underpinnings
that preceded the first glimmerings of language.” Though she of-
fers compelling evidence for many interesting hypotheses as to the
epigenesis of language, other possibilities have yet to be explored.
One such possibility is whether the structure/syntax of the lan-
guages we use today was molded to best fit a preestablished cor-
tical organization for linguistics and the related tasks, and, if so, is
this organization modality dependent? Is linguistic structure/syn-
tax a function of the organization of the left-hemisphere? Is lan-
guage innate; can it be evolutionarily traced? If so, what implica-
tions does this have in the ever-present question of the evolution
of language?
We know from existing literature and in vivo studies that non-
human primates communicate using gestures, a type of “signed
language,” and that humans for the most part communicate using
a spoken language. The primary difference between signed and
spoken language is that sign relies “on spatial contrasts while
speech is linear and non-spatial” (Goldin-Meadow 1999). In ver-
bal communicators, a lesion to the left hemisphere usually pro-
duces deficits on linguistic tasks, whereas damage to the right
hemisphere usually produces deficits in spatial tasks. Similarly,
when human nonverbal communicators sustain damage to the left
hemisphere, they perform more poorly on linguistic tasks but do
not exhibit the same spatial deficits that signers with right-hemisphere
damage do. These findings imply that, in humans, sign
seems to be processed as linguistic rather than spatial information,
which implicates the left hemisphere in linguistic processing
regardless of the mode of transmission (Goldin-Meadow 1999).
When deaf children of nondeaf parents are not taught to sign
and have not acquired speech because of their hearing impair-
ment, they independently create a system of gestural communi-
cation that takes on a structure similar to that of spoken language
and is consistent across cultures (Goldin-Meadow 1999; Goldin-
Meadow & Mylander 1998). A possible explanation for why deaf
children create linguistically oriented gestures and hearing chil-
dren do not, may relate to the notion that gesture needs to take on
grammatical properties only when it has to carry the full burden
of communication. When used in conjunction with speech, ges-
ture does not have to convey (all) meaning; therefore, it does not
assume a language-like form (Goldin-Meadow 1999).
A cortical region implicated in nonverbal communication is the
superior temporal sulcus (STS). When congenitally deaf signers
and hearing expert signers were presented both with sign language
and with nonmeaningful gestures, activation of the STS was noted
(Allison et al. 2000). Furthermore, while viewing American Sign
Language sentences, those who were unfamiliar with the language
showed no activation of the STS. These results are indicative of
the STS’s role in the perception of ASL. Further support of this
hypothesis can be seen when studying monkeys. “In monkeys, re-
sponsiveness of STS cells was greater to a hand making a move-
ment than to a bar of the same size making the same movement,
demonstrating that the cells are preferentially responsive to biological
motion” (Allison et al. 2000; Rizzolatti & Arbib 1998; 1999;
Rizzolatti et al. 1996; 2002). This applies to humans in that the cel-
lular organization of the STS may provide a predisposition for the
perception of communicative or meaningful hand gestures, but
not for meaningless hand movements.
The cortical response to the observation of action in both hu-
man and nonhuman primates is very similar and supports the
above findings. In humans, PET studies revealed that the obser-
vation of an action, such as grasping, activated the STS, the infe-
rior parietal lobule, and the inferior frontal gyrus (area 45); all ac-
tivation sites were limited to the left hemisphere (Rizzolatti &
Arbib 1998). The activation found in humans parallels that found
in nonhuman primates on similar tasks, thereby indicating “that,
in primates, there is a fundamental mechanism for action recog-
nition” (Rizzolatti & Arbib 1998). This is very interesting because
the stimuli used in these experiments were not tied to linguistics;
however, the findings may imply “that this action-recognition
mechanism has been the basis for language development” (Rizzo-
latti & Arbib 1998).
These findings suggest that the left hemisphere may be responsible
not simply for language tasks but for all linguistic tasks, including
the recognition and processing of multiple modalities of
communication. One of these modalities is gestural communication,
from which language as we now know it may have
evolved. In Rizzolatti and Arbib (1998), a notion is put
forth that the nonhuman primate homolog to the human cortical
area known as Broca’s is area F5 (the rostral part of the monkey
ventral premotor cortex). “The reasons for this view are that both
F5 and Broca’s area are parts of inferior area 6 and their location
within the agranular frontal cortex is similar; and cytoarchitecton-
ically, there are strong similarities between area 44 (the caudal
part of Broca’s area) and F5” (Rizzolatti & Arbib 1998).
The major difference in conceptualization of these two areas is
that Broca’s is commonly associated with speech, F5 with hand
movements. However, only the dorsal part of F5 is responsible for
hand movements; the ventral part represents mouth and larynx
control, the prerequisites for speech (Rizzolatti et al. 1998).
Furthermore, PET studies (such as the ones mentioned above) have implicated Broca’s
area in action-recognition of certain hand movements (e.g., Riz-
zolatti et al. 1996). It is possible that this hand movement recog-
nition was the precursor to the recognition of meaningful hand
movements (e.g., pointing or indicating danger), which are also
processed in the left hemisphere. Furthermore, these meaningful
hand movements are a basis for communication, a gestural
communication that, coupled with the development and evolution of
controlled mouth and larynx movements, could have evolved into
the verbal communication that we use today.
ACKNOWLEDGMENT
The authors thank Tom Myers for help in preparing this commentary.
Motherese is but one part of a ritualized,
multimodal, temporally organized,
affiliative interaction
Ellen Dissanayake
Walter Chapin Simpson Center for the Humanities, University of Washington,
Seattle, WA 98195. edissana@seanet.com
Abstract: Visual (facial), tactile, and gestural, as well as vocal, elements of
mother-infant interactions are each formalizations, repetitions, exaggera-
tions, and elaborations of ordinary adult communicative signals of affilia-
tion – suggesting ritualization. They are temporally organized and enable
emotional coordination of the interacting pair. This larger view of moth-
erese supports Falk’s claim that the social-emotional elements of language
are primary and suggests that language and music have common evolu-
tionary foundations.
Falk’s article emphasizes the important roles of visual, gestural,
and tactile signals to infants, in addition to the vocal aspects that
have been the primary locus of language origin studies. Her argu-
ments about the importance of sociality and affect in mother-in-
fant prelinguistic interchanges would be strengthened if they also
incorporated provocative evidence that in the interactions these
multimodal behaviors are temporally coordinated. If mothers
“[modify] their vocal and gestural repertoires to shape and con-
sciously control” infant behavior (sect. 3.2.1), it can be pointed out
that shaping and controlling are temporal processes.
Infants are born prepared to engage in temporally organized in-
teractions (Trevarthen 1997; 1999). Desynchronization experi-
ments reveal that infants as young as 4 to 8 weeks old (Murray &
Trevarthen 1985) expect social contingency, defined as “interper-
sonal sequential dependency,” in which the behavior and affect of
both partners (as expressed in face, voice, and bodily movement)
are coordinated or “attuned” (Jaffe et al. 2001, pp. 13–14; Stern
et al. 1985). When normal ongoing playful interaction via dual
video is experimentally desynchronized (i.e., the baby is presented
with a slightly delayed replayed recorded sequence of just-experi-
enced positive interaction with the mother), 6- to 12-week-old in-
fants show signs of psychological distress such as averted gaze,
closed mouth, frown, grimace, fingering of clothing, and the dis-
placement activity of yawning (Murray & Trevarthen 1985; Nadel
1996; Nadel et al. 1999). This emotional/behavioral coordination
is more than “social.” It is relational, and, like motherese (which
is but one element in the engagement), it has developmental ben-
efits and adaptive implications.
I have argued (Dissanayake 2000; 2001) that mother-infant in-
teraction is a ritualized behavior like those described by etholo-
gists (e.g., Eibl-Eibesfeldt 1989, pp. 439–40; Tinbergen 1952) for
other animals, in which behaviors from one context (here, ordi-
nary communicative indications of adult friendliness or readiness
for contact) are altered – simplified or stereotyped, repeated, ex-
aggerated, and elaborated – and take on new meaning in a new
context (here, mother-infant interaction). The “ritualized” facial
expressions of adults in interactions with infants typically include
widened eyes, raised eyebrows, and a sustained open mouth or
smile, all of which in their unritualized form indicate affiliation or
friendly intention. Gesturally, adults sharply bob back their heads
or nod rhythmically to infants, again presenting an exaggeration of
head movements that conventionally signal affiliation in adults.
Adults lean toward and away from an infant and give rhythmic
touches and pats – again, friendly human gestures that are also
common in many nonhuman primates. Vocalizations to infants by
human mothers, as Falk describes, are soft, breathy, undulant and
inviting, or soothing, with much repetition – that is, exaggerations
of nonthreatening and affiliative adult utterances.
These components of mother-infant interaction do not occur in
isolation, and they appear to be processed crossmodally (Schore
1994), as the pair co-create and share a common pulse and emo-
tional quality which Trevarthen and Malloch (2000) call “affecting
chains” or sequences of expression.
Ritualized, multimodal, temporally coordinated interactions
are important in their own right at 4 to 12 weeks of age, long be-
fore they are co-opted and altered further for didactic language-
learning purposes at age 5–8 months and later. Falk remarks (sect.
2.2) that ID speech contributes initially to emotional regulation,
then to socialization, and finally to the organization of speech. If
for “ID speech,” one substitutes “the package of ritualized behav-
iors, including temporal, dialogic, and emotional aspects,” one fur-
ther emphasizes the importance of the emotional (prosodic) ele-
ments of speech (phylogenetically and ontogenetically), and its
dialogic nature – overlooked aspects that Falk seeks to remedy.
Incorporating this additional evidence of the social-emotional
nature of the interaction also supports Falk’s suggestion that
motherese could have been a precursor to (or antecedent of) the
social grooming origin and function of language. It additionally
supports suggestions that music and language have a common
evolutionary foundation (Morley 2002).
Falk describes well in section 3 the anatomical changes in
bipedal, large-brained hominins that required new adaptive
strategies for the survival of relatively undeveloped infants. If
mothers made ritualized affiliative signals in several modalities to
their infants, they would concurrently reinforce affiliative circuits
in their own brain; infants in turn would respond affectively, dis-
playing their interactive lovability and thereby attracting maternal
care. Co-creating a dialogue within a common pulse would further
coordinate the affective state of the participants, promoting will-
ing maternal care (i.e., infant survival and maternal reproductive
success). Even today, neurobiologists describe the pathological ef-
fects to infants of defective interactive abilities of either infant or
mother (Aitken & Trevarthen 1997; Koulomzin et al. 2002; Schore
1994; Trevarthen & Aitken 1994), corroborating others’ findings
about the beneficial effects of mother-infant interaction.
I suggest that putting the baby down and interacting vocally at
a distance would have come, evolutionarily, after the establish-
ment of ritualized mother-infant interaction as described here.
The importance of face-to-face communication is evinced in “still
face” experiments with 2- to 9-month-old infants (Murray & Tre-
varthen 1985; Tronick 1989), in which an expressionless mother
provoked infant distress, and also in the prominence of mutual
gaze, a striking feature of mother-infant interaction in many if not
all cultures. Falk points out that “mothers unconsciously estab-
lish eye contact with infants and then use motherese to maintain
joint attention” (sect. 2.2). Actually, however, the capacity for
“sustained mutual visual regard” – normally a threat signal, al-
though it also appears in affiliative contexts in bonobos – is pre-
sent by approximately the second month (Beebe 1982, p. 171).
Accompanied by adult smiling and soft, repeated vocalizations,
mutual gaze in an infant’s early weeks accomplishes more than
joint attention. Some researchers consider face-to-face commu-
nication and/or mutual gaze critically important to subsequent
infant socioemotional development (e.g., Cohn & Tronick 1987;
Schore 1994).
These comments are meant not to challenge Falk’s original and
stimulating ideas, but, rather, to suggest other supportive avenues
for consideration and exploration. Future studies of the nature,
function, and origin of language would do well to recognize, as
Falk does, the importance of its social and emotional elements.
Chimpanzees are not proto-hominins
and early human mothers may not
have foraged alone
Agustín Fuentes
Department of Anthropology, University of Notre Dame, Notre Dame, IN
46556. afuentes@nd.edu
Abstract: Modeling the evolution of human behavior, including language,
is a complex but important undertaking. The over-reliance on chim-
panzees as models to assess basal hominin patterns and the implicit as-
sumption that hominin mothers did not have significant assistance in car-
ing for young weaken this model for the emergence of language from
mother-infant vocal interactions.
This very interesting article proposes a scenario for the evolution
of human language via a form of vocal contact interaction between
hominin mothers and infants. Unfortunately, the hypothesis rests
firmly on a series of assumptions about hominin social organiza-
tion and behavior and anthropoid behavioral patterns that may not
be valid. Among these assumptions are that chimpanzees (genus
Pan) are the most appropriate models for understanding the be-
havior of hominins in the late Pliocene and early Pleistocene (3
million to 1 million years ago); that female hominins on the hu-
man lineage foraged alone; and that alloparenting, paternal care,
or other communal care was not a significant factor in human evo-
lution.
Falk uses observations of infant parking in nonhuman primates,
mostly prosimians and a few anthropoids (Fuentes & Tenaza 1995;
Ross 2001), to emphasize the potential costs of infant carrying in
difficult foraging situations. However, our observations of infant
parking in the colobine monkeys Simias concolor (Fuentes &
Tenaza 1995) and Presbytis potenziani (Fuentes 1994) may or may
not support the cost of foraging hypothesis. We proposed the cost
of infant carrying as a possible explanation of a rare behavior for
an anthropoid (parking), but also suggested that the parking of in-
fants may have been an antipredator strategy (ease of escape for
the mother) or, alternatively, a response to relaxed predation. Only
some females parked infants, and our observations were too lim-
ited to establish any clear relationships between the parking and
specific foraging strategies. Obviously, issues of milk quality,
weight of infant, predation threat, allocare and cooperative care,
and activity patterns significantly affect infant parking in primates,
especially humans. In her overview of parking and carrying in pri-
mates, Ross (2001) suggested that in humans, nonhabitual carry-
ing of infants may be related to the availability of nonmaternal
caretakers.
For the basal component of the proposed hypothesis, Falk re-
lies on information from a few studies of wild and captive chim-
panzees, some “ape language” studies, and a very general concep-
tualization of late Pliocene/early Pleistocene hominins. Although
the exact timing of the lineage split between hominins and the an-
cestral lineage of the genus Pan is contended, most would agree
that it occurred in the vicinity of 6 to 8 million years ago. By at least
2.5 million years ago, the Bouri hominins (either Australopithecus
or Homo) were using stone tools and thus manipulating their en-
vironment in a way no other primate had (de Heinzelin et al.
1999). By the undisputed appearance of members of the genus
Homo, approximately 1.8 million years ago, dramatic anatomical
and, presumably, behavioral changes appear evident in the fossil
record (Aiello & Wells 2002; Gabunia et al. 2001). Given this, one
should exert caution when making direct comparisons between
modern members of the genus Pan and modern members of the
genus Homo. In both these genera, locomotor patterns, brain
structures, group structure, and social interaction patterns have
diverged under varied selective pressures and trajectories. Chim-
panzees are not proto-hominins, and all of the hominins, although
sharing some behavioral patterns with Pan, may have
been encountering selectively different challenges (or at least
dealing with similar challenges in different ways). The relative
success of humans using language and broadscale extrasomatic
manipulation, versus Pan not using language and manipulating the
environment in diverse yet less complex ways, suggests that there
are some distinct evolutionary patterns at play. Obviously, due to
their relatively recent common ancestry, humans and chim-
panzees share much of their adaptive history, but in those aspects
that differentiate them (e.g., spoken language) we can expect that
the underlying patterns and evolutionary pathways might be dif-
ferent. It is also noteworthy that chimpanzees themselves display
remarkable diversity in behavioral patterns both within and be-
tween species (Boesch et al. 2002).
It is popular to model single female foraging as a baseline for
hominoid behavior (Wrangham 1979), and Falk (citing chim-
panzee researchers Stanford [1998] and Nishida [1968]) suggests
that hominin mothers traveled in the company of dependent off-
spring and a small number of other individuals. However, given
what we know from the fossil record and from comparative stud-
ies of hominoids, it is far from clear that adult female hominins,
especially early members of the genus Homo, foraged alone, or
even relatively alone, with their offspring (Aiello & Wells 2002;
Fuentes 2000; O’Connell et al. 2002). Mothers may have been ac-
companied by older children or related adults, thus siblings or
other kin may have played a role in infant care, and some individ-
uals may have stayed behind during foraging to care for depen-
dent young. Food may have been shared among group members
or there may have been some form of provisioning of mothers with
dependent offspring, or both. Unfortunately, we do not have clear
evidence about what types of nonmaternal care, if any, occurred
in the hominins on our lineage. A wide array of possible forage tar-
gets would have affected the patterns of foraging and thus the
placement of offspring relative to the mother or other caretakers
as well (Aiello & Wells 2002; O’Connell et al. 2002; Wrangham et
al. 1999). Using digging sticks to extract underground tubers,
stone tools to process plant and/or meat items, and picking and
transporting fruits or herbaceous matter over long distances all
have distinct implications for the positioning of a dependent child
and the relative impact it had on the mother or other caretaker. In
short, it is not at all clear that the simple foraging patterns assumed
by Falk as a baseline and driving factor in the “putting the baby
down” hypothesis did indeed characterize early humans.
It is also not clear that the aspects of behavioral variation sug-
gested as the raw material for selection to act on are as robust as
Falk proposes. Simply assuming that variation in the attention
mothers provided their infants acted as the “raw material” for se-
lection creates an overly simplistic, linear notion of natural selec-
tion. What are the variables for “attention” and what are the costs?
Can attention to infants really be treated as a trait independent of
foraging patterns, group demography, individual life histories, and
size, health, and behavior of the infant?
None of this critique is to say that the scenario proposed by
Falk is incorrect. It is an