[205--229] 17.7.2015 4:23PM
Part Three
and syntax
Babbling and words:
a dynamic systems
perspective on
Marilyn M. Vihman, Rory A. DePaolis
and Tamar Keren-Portnoy
10.1 Introduction
What is the developmental function of babbling in relation to language, if
any? How is it related to the child’s first words, and can this relationship
shed light on the highly controversial issue of the origins of grammar in
acquisition? Studies of both infant speech perception and early vocal
production have produced a wealth of findings over the past forty years,
but theoretical progress has been slow, with deductive ideas framed
within current theoretical models often diverging from empirical findings
based on observational and experimental studies.
Dynamic systems theory (Thelen & Smith 1994), with its emphasis on
the role of variability in developmental advances, on the independent
emergence of related skills as self-organizing catalysts for behavioural
change and on the deep interconnectedness between perception, action
and learning, offers a promising perspective on early speech development.
While reviewing the empirical findings of studies of production, this
chapter will also consider the relationship of those findings to dynamic
systems theory.
10.1.1 The challenge: construction of a first system
A central concern of the study of child language is to account for the
developmental sources of linguistic knowledge. In one influential approach
to this problem, innately given Universal Grammar (or UG) is assumed to
provide the knowledge of linguistic structure that serves as the starting
point for language acquisition, leading to the basic question: what exactly
needs to be learned? (Peperkamp 2003). This must then be followed by the
question of the nature of the triggering process needed to establish the
specifics of a given language: how does the child recognize the critical data that
will make it possible to set the appropriate parameters, or to rerank constraints in the
appropriate way? (see, for example, Fikkert 1994, Lleó & Prinz 1997). For
approaches that assume no innate grammatical knowledge, such as the
constructivist approach (see Chapter 5 and Menn, Schmidt & Nicholas
2013), the questions are the converse: with what knowledge, if any, does the
child begin?, followed by the complementary question: how can the child gain
knowledge of linguistic structure or system?
The role of phonology in the development of linguistic knowledge is
often given short shrift by researchers interested in word learning (e.g.
Bloom 2000, Hollich, Hirsh-Pasek & Golinkoff 2000, but see Chapter 11);
similarly, researchers focusing on perceptual advances generally disregard
production. Yet before a child can begin to develop linguistic meaning or
make referential use of words he or she must be able to represent and
access word forms or phrases, which can then come to be associated with
recurrent situations, objects or events (see Zamuner, Fais & Werker 2014).
It also seems short-sighted to assume that perceptual advances alone can
suffice to account for language learning. A long tradition of both diary and
planned observational studies has found wide individual differences in the
rate and pathway of emergence of word production and phonological
knowledge across children developing normally, even within the same
ambient language group (Vihman 2014); experimental group studies of
word recognition and learning provide little evidence regarding this criti-
cal aspect of phonological development since individual differences are
typically masked in the reporting of group results. It should be evident that
both lexical and phonological learning depend on the development of
representations that integrate perception and production (Vihman,
DePaolis & Keren-Portnoy 2014), yet this remains a central issue that has
so far attracted insufficient attention.
In this chapter we will adopt the second position identified above, which
looks for broad biological foundations to language but posits no specific
linguistic knowledge as part of that foundation. Following Braine (1994),
we will argue that, rather than innate knowledge of linguistic principles, a
powerful learning mechanism – combined with initially strong perceptual
abilities and the more slowly developing speech motor capacities – can be
identified as the source of the remarkable human capacity for language.
Pierrehumbert (2003: 118) proposed that the phonological system is
‘initiated bottom-up from surface statistics over the speech stream, but
refined using type statistics over the lexicon’. She does not elaborate on
the source of the lexical knowledge that supports the second cycle of
statistical learning, however. We argue below that the missing link is
production experience, which brings the specific adult lexicon to which
the child is exposed into focus and into partial or incipient mastery,
leading, as Pierrehumbert says, to a new cycle of statistical learning
based on types, not tokens. We will suggest that, after an initial period of
perceptual experience of ambient speech, ‘intake’ from that experience is
boosted by the maturational emergence of vocal production of adult-like
syllables. We will demonstrate the role played by babbling practice in
supporting attention to and memory for first words, and we will argue
that those early words in turn provide a database for distributional learn-
ing, the proximal source of emergent phonological systematicity.
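The contrast between token-based and type-based statistics can be made concrete with a minimal sketch. It is an illustration only, not a model proposed by Pierrehumbert or in this chapter: the toy speech stream, the function names and the use of a word's first letter as a stand-in for a phonological pattern are all assumptions introduced here.

```python
from collections import Counter


def onset(word):
    """Toy 'pattern': the word's first character (a stand-in for a phonological unit)."""
    return word[0]


def token_statistics(speech_stream):
    """Count the pattern over every word occurrence in running speech (tokens)."""
    return Counter(onset(w) for w in speech_stream)


def type_statistics(speech_stream):
    """Count the pattern once per distinct word form, i.e. over the lexicon (types)."""
    return Counter(onset(w) for w in set(speech_stream))


# Hypothetical input: one very frequent word dominates the token count.
stream = ["baby", "baby", "baby", "ball", "dog", "duck", "daddy", "baby"]
tokens = token_statistics(stream)
types = type_statistics(stream)

print(tokens)  # 'b' outnumbers 'd' when every occurrence is counted
print(types)   # 'd' outnumbers 'b' when each word counts once
```

In this toy case the two kinds of count rank the patterns differently, which is why a learner who has a lexicon to count over could arrive at different generalizations from one who has only the raw speech stream.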
10.1.2 Dynamic systems theory (DST) and the origins of grammar
In general, developmental ideas have been scarce in the literature on
phonological acquisition, which has tended to draw instead on formal
models of adult language and to apply them in a deductive way to child
language patterns. Yet when we turn to Thelen and Smith’s (1994) devel-
opmental model, we find that the ideas have a remarkable degree of
correspondence with the empirical findings that have accumulated over
the past thirty-odd years of intensive study of infant speech perception and
production, despite the fact that those findings are outside the domain of
Thelen and Smith’s own research (although Thelen (1991) relates dynamic
systems ideas to the development of vocal production in the prelinguistic period).
A key dynamic systems idea is that we must examine process in order to
understand the origins of structure, which also means accepting variability
as the very stuff of development. As Thelen and Smith (1994) emphasize,
development seems orderly and stage-like when viewed from a distance;
upon closer inspection, however, the details of individual paths and tran-
sitions suggest exploration, the opportunistic seizing of possibilities,
momentary choices and their exploitation in the service of function or
use (for a case study that illustrates these points, see Vihman & Velleman
1989). In what follows we will provide a brief account of the process by
which babbling is transformed, through such an exploratory, ‘opportunis-
tic’ process, into the first word production.
Thelen and Smith (1994) report detailed studies of such motoric achieve-
ments as learning to walk or to successfully reach for objects, emphasizing
the variability and the self-organizing nature of each such development. As
will become apparent, the basic principles of development that Thelen and
Smith propose, based on those studies among others, are also fundamental
to our story. Out of the variability of behaviour or performance, within
a specific situational framework – the basis for each child’s individual
‘history of actions’ – comes the regularity of lexical development, with
relatively accurate first word production followed by regression and
systematization. In what follows we will trace the emergence of one child’s
flexible word-production patterns, taking into account variability and
context. This process can be expected to be different for each child,
given the differences in individual histories of exposure, of ‘intake’ or
internalization of what is heard in input speech, of early vocal production
preferences or skills and of first word use.
The non-linearity emphasized by DST is found again and again in empiri-
cally grounded accounts of language acquisition as well as in other areas of
development (e.g. in morphology: Cazden 1968; for examples drawn from
early vocal development, segmental discrimination and final syllable length-
ening as well as word production, see Vihman 2014). The notion of a pre-
dictable succession of categorically distinct ‘stages’ is generally revealed, on
closer analysis, to be a false lead. Rather than progressive, step-by-step
advances we find temporary regressions as an effect of overgeneralization
or reorganization. We will outline the non-linearity of early phonological
development, in which the first, largely accurate word forms, described in
the next two sections, give way to a long period of template-based produc-
tion, which is less accurate but more systematic, reflecting the first steps in
the construction of a phonological grammar (see Section 10.4).
We will refer to the notion of ‘attractors’, which is central to DST: the term
refers to modes of processing or production that are familiar to the infant,
whether, in the case of phonological development, through exposure to
high-frequency units or structures in the input or through repeated
production practice. In general, an ‘attractor’ is a stable state, a mode to which
the organism naturally returns again and again. It is not necessarily an end
point, but to the extent that an attractor state constitutes a stable ‘plateau’ in
development, powerful forces from other, parallel processes are required to
drive the system forward from one attractor state to another.
10.2 The starting point: biological precursors
Babbling is now generally accepted as providing the raw material for early
words. The continuity between babble and first words should not, how-
ever, be taken as evidence that the onset of canonical babbling (Oller 2000)
is primarily a language-driven activity. There is strong evidence that bab-
ble is just one of many rhythmic motor skills that come online in the first
year of life, providing the infant with the tools with which to gain knowl-
edge of the world (Iverson, Hall, Nickel & Wozniak 2007, Thelen 1981).
In Piaget’s (1952) terms, babble is a kind of ‘secondary circular reaction’, a
perceptuomotor link that helps to lay the foundations for intelligent
Campos et al. (2000) documented the cascading effect of cognitive
advances springing from the ability to initiate locomotion (see also
Walle & Campos 2014). Considered in a social context, the onset of babble
can be expected to have a similar cascading effect. Currently there is a
growing consensus that babble is best viewed as a multimodal activity,
involving both proprioceptive and auditory experience (Guenther &
Vladusich 2012, Westermann & Miranda 2004). This provides powerful sup-
port for perceptuomotor learning, an excellent illustration of the way that
simple linear progression in a basic motor system makes possible the learn-
ing of complex cognitive structures (see, e.g., Rochat 1998, Westermann
The babbling patterns of infants are highly individual and yet subject
to very simple biological constraints. The earliest stable supraglottal
consonants produced (excluding glides, which are sufficiently vowel-
like to be observed in the precanonical period) are stops and nasals
(Locke 1983, McCune & Vihman 2001), both of which can be articulated
by simple raising and lowering of the jaw. Davis and MacNeilage (1995)
have formulated this process in terms of the frame/content theory of
early speech organization. In their account, early speech is dominated by
successive cycles of mandibular oscillation (the ‘frames’), in which the
starting tongue position determines both consonant and vowel. Thus,
alveolar stops co-occur with front vowels (e.g. [di]), velar stops with back
vowels (e.g. [ko]) and bilabial stops with central vowels (e.g. [ba]). As
babbling becomes more variegated, combining different consonants
within a single vocalization, the infant gains control over the ‘content’
within each syllable, leading to a wider range of consonant/vowel com-
binations. The predicted patterns of co-occurrence of consonants and
vowels in early speech have been found in numerous languages (but see
Chen & Kent 2005).
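As a rough illustration of how the predicted pairings might be checked against coded babble data, the sketch below tallies consonant-place and vowel-class combinations and reports the share that falls into the three predicted cells. The coding scheme, the function names and the sample data are assumptions made for this example. A raw proportion is also a simplification, since it does not correct for how often each consonant or vowel class occurs overall.

```python
from collections import Counter

# Pairings predicted by the frame/content account as summarized above.
PREDICTED = {
    ("alveolar", "front"),
    ("velar", "back"),
    ("bilabial", "central"),
}


def cooccurrence_table(syllables):
    """Tally each (consonant place, vowel class) pair in the coded syllables."""
    return Counter(syllables)


def proportion_predicted(syllables):
    """Share of syllables whose consonant/vowel pairing is one of the predicted ones."""
    if not syllables:
        return 0.0
    hits = sum(1 for s in syllables if s in PREDICTED)
    return hits / len(syllables)


# Hypothetical coded sample of four CV syllables.
sample = [
    ("alveolar", "front"),
    ("bilabial", "central"),
    ("bilabial", "central"),
    ("velar", "front"),  # not a predicted pairing
]
table = cooccurrence_table(sample)
share = proportion_predicted(sample)
print(table)
print(share)
```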
The gaining of voluntary motoric control over a specific consonant is the
next step towards incorporating these articulatory gestures into early
words. McCune and Vihman (2001) tracked these simple early speech
patterns – termed Vocal Motor Schemes (VMS) – in 20 infants. They char-
acterized a VMS as ‘a generalized action plan that generates consistent
phonetic forms ... a formalized pattern of motor activity that does not
require heavy cognitive resources to enact’ (McCune & Vihman 2001: 152).
They operationalized the onset of a VMS as the production of 10 or more
occurrences of a given consonant in each of three out of four successive
30-minute observational sessions (disregarding voicing distinctions,
which are poorly controlled in infants; thus, for example, [p/b] is a labial
stop VMS).
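The operational criterion can be sketched as a small function. This is one reading of the definition above, not the authors' own procedure. It assumes that consonant counts have already been pooled across voicing (so that [p] and [b] are both counted under a hypothetical key "p/b"), and that onset is dated to the session that completes the first qualifying window; the function name, parameter names and sample counts are introduced here for illustration.

```python
def vms_onset(session_counts, consonant, min_count=10, needed=3, window=4):
    """Return the index of the session at which the criterion is first met, else None.

    session_counts: one dict per successive session, mapping a consonant class
    (voicing already collapsed, an assumption of this sketch) to its count.
    """
    # A session 'qualifies' if the consonant occurs at least min_count times.
    hits = [counts.get(consonant, 0) >= min_count for counts in session_counts]
    for end in range(len(hits)):
        start = max(0, end - window + 1)
        # Criterion: at least `needed` qualifying sessions among up to `window`
        # successive sessions ending at `end`.
        if sum(hits[start:end + 1]) >= needed:
            return end
    return None


# Hypothetical counts from five successive sessions.
sessions = [
    {"p/b": 12, "t/d": 3},
    {"p/b": 4, "t/d": 11},
    {"p/b": 15, "t/d": 10},
    {"p/b": 10, "t/d": 2},
    {"p/b": 1, "t/d": 14},
]
labial_onset = vms_onset(sessions, "p/b")
coronal_onset = vms_onset(sessions, "t/d")
nasal_onset = vms_onset(sessions, "m")
print(labial_onset, coronal_onset, nasal_onset)
```

Requiring three qualifying sessions out of four, rather than three in a row, tolerates one weak session without resetting the count.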
The VMS thus incorporate an element of both consistency and
stability over time. Attainment of a VMS means that the infant is able to
consistently access a speech-like motoric pattern with the expenditure of
only very limited cognitive resources, freeing those resources to support
the novel attention and memory tasks of associating an arbitrary sound
pattern with a meaning.
For an introduction to phonetics we refer readers to Ladefoged and Johnson (2011).
In the experimental work that we report in the next section, VMS is alternatively established on the basis of fifty
observed uses of a single consonant.
10.3 The role of babbling: the accuracy of first words,
preselection and the articulatory filter
Ferguson and Farwell (1975) were the first to notice the relative ‘accuracy’
of many early child words, with later regression to more primitive forms.
This finding has since been supported in many studies (see Fikkert & Levelt
2008, Menn & Vihman 2011: appendix 1, which lists the first 5–6 recorded
words of 48 children, each acquiring one of ten different languages).
Table 10.1 presents the first four words of a Dutch child, Thomas (based
on Elbers and Ton 1985).
Like most early words, the Dutch target words are one or two syllables in
length and include mainly early-learned consonants (labial and coronal
stops, the glide /j/ and the fricative /s/, less common in babbling and early
words but still one of the core consonants; see Locke 1983). Somewhat
unusually, however, two of the words include changes of consonant place,
or place and manner: part, pus. Elbers and Ton note that eight of Thomas’
first twenty words involved more than one place of articulation but all but
one respect the labial–coronal (or front–back) sequence. This melodic
pattern is common in early words (e.g. Jaeger 1997, Vihman & Croft
2007), but not universal (the reverse pattern, back–front, is reported in
Berman (1977), Macken (1979), and Vihman, Velleman and McCune
We see in Table 10.1 that the child forms are remarkably close to the
adult models. Thomas’ first four words are ‘accurate’ enough, if we allow
for variability in aspects of production that the child does not yet control
(stop voicing – see Macken (1979), cluster reduction and place of
articulation in fricatives), and seem to have been ‘preselected’ for their relatively
simple and accessible target forms. Interestingly, Elbers and Ton (1985)
note that the babbling patterns [at(ə)], [pa:t(ə)] and [bəx] were already
observed in ‘playpen monologues’ (i.e. babbling) before they recurred in
child word forms.
Table 10.1 First word forms: relative accuracy and high variability
Thomas (Dutch, 15–16 months)
adult form               gloss             child form
/auto:/, /o:to:/         car               [at], [atə], [aut], [auto:], [o:t], [o:to:]
/hap/, /hapjə/, /hapi/   a (little) bite   [ap], [apə], [hap], [hapə], [hab], [habə]
/pa:rt/, /pa:rtjə/       horse, horsie     [pa:t], [pa:tə], [ba:t], [ba:tə]
/pus/, /pusjə/           cat, kitty        [pusj], [pəx], [bəx], [pux], [bux]
A general preference for the sequence labial–coronal has been alternatively ascribed to motoric (MacNeilage &
Davis 2000) or perceptual factors (Sato, Vallée, Schwartz & Rousset 2007, Tsuji, Gonzalez Gomez, Medina, Nazzi &
Mazuka 2012).
A question arises here as to what could be the mechanism underlying
the evident ‘preselection’ of forms to attempt. Vihman (1993) proposed
that an ‘articulatory filter’ might mediate the input, rendering salient
those patterns with which the child was already familiar from his or her
own babbling production. In this model, the emergence of adult-like
syllables, in the middle of the first year, provides the child with a valuable
resource (a kind of ‘bootstrap’, or easily accessible facilitator) for focusing
in on selected portions of the fast-moving input speech stream. This
resource would exercise its effect involuntarily: once one or more conso-
nants have been well practised – some weeks or months after canonical
babbling begins – the child’s attention is likely to be captured by sound
patterns that constitute a ‘good enough’ match to his or her own babbled
productions – like auto, hapje or pus for an infant who is already producing
[at(ə)], [pa:t(ə)] and [bəx]. This is much like an adult’s attention being
captured by overhearing a highly familiar proper name in a conversation
not consciously attended to (Wood & Cowan 1995). Such an implicit
experience of a match of own vocal pattern to input speech would
eventually lead to the child’s production of the pattern in certain fre-
quently repeated or routine situations; the consequence would be a
small number of known lexical items, the first identifiable words, typically
produced only in limited situational contexts (Vihman & McCune 1994; see
Figure 10.1).
Evidence for something like an ‘articulatory filter’ has now been pro-
vided by a series of experimental studies testing the effect of well-practised
consonants (VMS) on the child’s attention to nonwords embedded in short
sentences (DePaolis, Vihman & Keren-Portnoy 2011: 18 English infants,
aged 10–16 mos.) or presented in isolation, as word lists (Majorano,
Vihman & DePaolis 2014: 26 Italian infants, 8–11 mos.). Infants were tested
in both studies as soon as they showed frequent and stable production of at
least one consonant. Each nonword stimulus contained a single recurring
consonant, either a VMS produced by the child being tested (‘own’) or a
consonant not often used by that child but occurring as a VMS for other
infants (‘other’). (These studies also included a ‘non-VMS’ control, the
labiodental fricative [f/v], rarely produced in babbling and yet to be
reported as a VMS for any child.) The results of the two studies were
similar: infants with a single VMS looked longer in response to nonwords
featuring their own VMS, familiar from frequent use, whereas infants with
more than one (‘multiple’ VMS) showed more attention to ‘other’. Thus,
the extent of a child’s prior use of a particular consonant had an effect, as
predicted, on his or her perceptual attention to that consonant – but the
effect shifted, with increase in experience in consonant production, from
attention to what is familiar to attention to what is novel. In another study
DePaolis, Vihman and Nakai (2013) tested English and Welsh infants,
using a different experimental design; they too obtained a novelty effect
from the infants with the most production practice with the consonant
[Figure 10.1: flow diagram; surviving box labels include ‘Repeated vocal ... leading to VMS’, ‘Articulatory Filter: cross-modal mapping of production onto ...’ and ‘Word form similar to child vocal pattern used repeatedly in routine ...’]
Figure 10.1 The matching of self- and other-produced vocal patterns to own production, supported by a familiar situational and/or verbal context, helps the infant to
choose relatively accurate first words.
The ‘novelty’ effect seen in these studies was not anticipated. It clearly
indicates a production effect on the processing of speech, at an age when
infants are beginning to produce adult-like syllables. To that extent it is
congruent with the articulatory filter proposal: in dynamic systems terms,
‘action’ (vocal production) supports perception (speech processing).
However, only one group of infants (Majorano et al. 2014, single-VMS
infants) listened significantly more to what they themselves were fre-
quently producing. ‘Multiple-VMS’ infants, instead, responded more
strongly to what they were not yet producing. Thus infants at a more
advanced point in vocal production gave greater attention, in the
restricted context of a listening task in the lab, to what was unfamiliar.
However, in the far more challenging task of accessing a specific word
form representation in order to produce it in a situation of social interac-
tion (as illustrated in Table 10.1), infants exploit the fact that some often-
heard adult words constitute a near-match to their own well-practised
patterns, making it possible to fall back on a pre-existing vocal routine
both to access and to reproduce a situationally relevant word (see Vihman,
DePaolis & Keren-Portnoy 2014).
Another possibility is that parental response might guide the child to
pick up on a match between his or her own vocal expression and similar
patterns in input speech and that the accuracy of the early words is thus
driven by parental responses. Several recent studies have demonstrated
the potential effect of mothers’ responses on their child’s vocal production
(Goldstein et al. 2003, Goldstein & Schwade 2008) as well as, complemen-
tarily, an effect of infants’ vocal forms on the adults, with canonical
babbling resulting in more conversational responses or imitations and
less advanced infant vocal forms eliciting sound play (Gros-Louis, West,
Goldstein & King 2006). However, the support afforded to phonological
advance from differential parental responses is inherently limited to par-
ticular child-raising conditions that, while common enough, cannot be
presumed to apply to children in all cultures. Moreover, this social per-
spective does not further clarify the unanticipated experimental finding
that more vocally advanced infants show greater interest in the nonwords
made up of less practised sounds.
It seems fair to say that social reinforcement alone would not be a
sufficiently robust mechanism to account for the steady increase in con-
sonant production observed over the period of transition from babbling to
words, throughout the single-word period and beyond (a developmental
trajectory consistent enough to stand as the best index of later phonologi-
cal advance: Stoel-Gammon 1992). Infants’ sensitivity to their own vocal
production, as mirrored in the input, has now been shown to have
potential roots in an early preference for vowels that reflect the proper-
ties of infant production already in the pre-babbling period (Polka,
Masapollo & Ménard 2014). As discussed above, when infants first begin
to produce canonical babbling, this can be expected to highlight aspects
of the input that resemble their own newly discovered production skills
(a little different for each child, and with differential shaping by
cross-linguistic differences in ambient speech; for a review see Vihman 2014,
and see Chapter 4).
One’s own production (more effortful than passive listening) is inher-
ently memorable and rich in multimodal cues (auditory, proprioceptive);
the match of input forms to own production will thus render roughly
matching word forms more memorable as well as more accessible motori-
cally, given existing output patterns. As vocal practice adds to the reper-
toire of familiar patterns, creating a more diversified collection of
remembered forms (supporting ‘phonological memory’: Keren-Portnoy,
Vihman, DePaolis, Whitaker & Williams 2010), however, infants gradually
broaden their attention to a wider range of input patterns. This enables
them to avoid becoming stuck in what might otherwise be a closed cycle.
The experimental studies described above provide early evidence for such
a shift, but it is seen behaviourally only some months later, as children
move from their ‘accurate’ first words to the more systematic patterning
but less accurate rendition of adult targets observed in the period of word
templates, to which we now turn.
10.4 Word templates: the beginnings of phonological
10.4.1 Holistic early word representations: production vs
Recent experimental studies, addressing either word recognition or word
learning, have suggested that early (perceptual) representations are ‘finely
detailed’, giving rise to the ‘phonetic specificity’ hypothesis, which
assumes that these representations are segmentally well defined and
adult-like (based on eye-tracking: Swingley & Aslin 2000, 2002, Swingley
2003, Chapters 7 and 8; preferential looking: Bailey & Plunkett 2002; or the
‘switch paradigm’: Werker, Fennell, Corcoran & Stager 2002). These stu-
dies test children’s ability to detect differences between novel or familiar
words that are minimally distinct phonologically, often without the neces-
sity for recourse to long-term memory. In contrast, Ferguson and Farwell
(1975) proposed that the first phonological representations are whole-
word based and as such may not involve analysis into segments. This
proposal has been supported by findings from production studies, which
necessarily involve accessing representations in long-term memory, often
in the absence of any immediate verbal or situational priming.
The nature of infant ‘phonological representation’ is as yet poorly
understood. Different results are obtained, depending on accentual
pattern (English vs French: Vihman, Nakai, DePaolis & Hallé 2004) and, as
indicated above, on differing task demands – specifically, word
recognition, word learning and word production. The task differences are
important: in the case of word recognition, both the word form and the
contextual situation or the image of a referent object may be expected to
prime memory for the word and its associations, making the memory load
negligible (e.g. Swingley 2003, Bailey & Plunkett 2002, Fennell & Werker
2003, Zamuner et al. 2014).
In the case of word learning significant attentional resources must be
allocated to the problem of retaining the arbitrary sound-meaning link, as
Werker and her colleagues (2002) have argued (see also Storkel 2001, who
made the same point on the basis of a word learning experiment with
3-year-olds). This should make the task of learning new words particularly
difficult for children who lack a stock of well-practised production pat-
terns or routines to support memory for the new word form. One indica-
tion of this is the finding, reported by Werker et al. (2002), that after
habituation training to associate /bɪ/ to one novel object and /dɪ/ to another,
the only 14-month-olds who responded with surprise to the ‘switch trial’,
in which the new ‘word form’ is associated with the wrong object, were
those with a reported production vocabulary of over 25 words (whereas
the 17-month-olds were ‘successful’ as a group in showing word learning
in this sense). A larger production vocabulary has also been found to
support advanced performance in semantic processing of familiar words
in 12-month-olds (see Friedrich 2008, Friedrich & Friederici 2010, cf. also
Conboy & Mills 2006, Marchman, Fernald & Hurtado 2010, Mills, Coffey-
Corina & Neville 1997) and also both fast-mapping and online word pro-
cessing in slightly older children (Fernald, Perfors & Marchman 2006,
Fernald, Swingley & Pinto 2001, Torkildsen et al. 2009, 2008, Zangl,
Klarman, Thal, Fernald & Bates 2005). Thus there is considerable experi-
mental evidence that production experience supports the accessing and
use of familiar word representations.
The contradiction between the apparently ‘detailed’ representations
suggested by perception experiments and the holistic representations
imputed to children on the basis of production studies can be reconciled,
then, if we bear in mind that word production requires cognitive resources
above and beyond what is required for word recognition or even new word
learning – in particular, memory and planning as well as motoric skill. As
children begin to make use of larger numbers of word types they must rely
on temporarily activated representations for production, often showing
regression in accuracy in the word forms they produce. These later repre-
sentations, although dependent on perceptual experience of a sound pat-
tern, give us good reason to accept Waterson’s (1971) judgment that they
are holistic ‘schemas’ or, in our terms, templates, in which the child’s
previous production practice strongly influences his or her memory for
word forms. (For an example of template-based regression to a less accu-
rate form for a particular word, consider the child Maarja’s use of [patə]
for pasta at 1;5, followed by the less target-like [pajə] a few weeks later,
which is more similar to many of her other forms, based on a palatal
template: Vihman & Vihman 2011). We address the question of the
source of the holistic representations below, in our discussion of learn-
ing mechanisms.
10.4.2 Whole-word phonology: variability
Several arguments for whole word representation as the basis for produc-
tion are summarized in Vihman and Croft (2007: 689); we review them
here, beginning with illustration and discussion of the first, ‘variability’.
The three remaining arguments – holistic match of child to adult form,
similarity among child forms, and response to challenges – will be dis-
cussed in the next section.
1. Variability. A sound may be produced differently in different early
words, and individual words may be more or less variable (Ferguson &
Farwell 1975). This suggests that although the child has gained knowl-
edge of particular words (‘item learning’), those word forms are not
consistently analysed into their component segments.
Ferguson and Farwell (1975) famously reported twelve widely varying
productions of the word pen over the course of a single session at about
15 months by K, one of the two American children they observed, with
alternate labial or alveolar, oral or nasal onsets, or neither, and with a
range of oral or nasal low to mid vowels.
A similar example of a ‘hard word’ is circle, attempted six times by an
English child, Jude (also aged 15 months but already producing 25 words in
a half-hour session – hereafter the ‘25-word point’, which corresponds to a
cumulative lexicon of over 50 words: Vihman & Miller 1988). Jude’s var-
iants range widely, in full or partial whisper:
̩ɬu], [ts̩tʰə] (x2), [tʰtʰ], and in direct imitation, [tɑ̥], [kʰtɬʉ]
Here we see evidence of child attention to the sibilant and its co-occur-
rence with a stop and a lateral, although the place of the stop appears to be
uncertain, as does the sequencing of the various segments, despite the
presence of an immediate adult model in the case of two of the variants. It
is evidently not the individual sounds that Jude cannot accurately repro-
duce, since he articulates each of them in at least one attempt at the word.
Similarly, there is no reason to believe that accurate perception of the
adult segments is at issue. Instead, the child’s difficulty appears to derive
from the planning and production, in proper sequential order, of the
pattern as a whole, with its rapidly changing series of consonantal
gestures. The pattern to which the largest proportion of Jude’s word forms
conform in the session is CVCV, with harmony across the two consonants
(accounting for 38 per cent of all variant prosodic shapes). Furthermore, of
the four sibilants he produces in the session, in chocolate [ʃʊ˳tʰa], circle, fish
̩:] and trousers (imitated as [təts̩]), all but one are syllabic. Thus the four
distinct consonant targets in circle provide a challenge to which Jude has
not yet found a solution; his attempts are best described as exploratory,
reflecting the interaction of perceptual and motoric influences.
The children’s ‘underlying representations’ cannot easily be inferred
from such production efforts. They are better described as dynamic or
fleeting than as set or stable (or reliably accessible), with apparent influ-
ence on the momentary remembered form of the word not only from the
percept of the target word itself but also from coexisting (‘whole word’)
production patterns available in the child’s repertoire – patterns which
must be accessed for vocal expression.
10.4.3 Templates in one child’s word production
Three further arguments for whole word phonology were cited in Vihman
and Croft (2007).
2. Holistic match of child to adult form. Comparison of early child words to
their adult models on a segment-by-segment basis is often difficult, as
Waterson (1971) showed in the case of her son ‘P’. Instead, the child
may appear to be targeting a ‘whole gestalt’.
3. Similarity among child forms. The interrelation between the child’s own
words may be more evident than the relation to the adult models
(Macken 1979).
4. Response to challenges. The ‘gestalts’ or ‘templates’ which are taken to
underlie the common patterning of a child’s words can be seen as
responses to one or more challenges posed by the segmental sequence
or structure of the word form as a whole. The primary challenge, in
most cases, is to produce different consonants, vowels or both within a
single syllable of a word (e.g. pen) or across syllables (circle).
The relationship of child to adult form and the sources of child difficulty
have already been illustrated with Jude’s variable forms. Proper apprecia-
tion of the patterning seen in a child’s word forms requires that one
consider the full set of variants produced in a given session, however, or
over a delimited period of time (e.g. Priestly 1977); for data illustrating
template use of a similar kind in several younger children learning several
different languages, see Vihman & Keren-Portnoy (2013).
In order to further illustrate these principles and to show their inter-
relationship we draw here on the patterns observed at the 25-word point
for one British child, Jack, who was included in a study of expressive Late
Talkers identified at 24 months on the basis of their small vocabulary or, in
Jack’s case, lack of word combinations (Vihman, Keren-Portnoy, Whitaker,
Bidgood & McGillion 2013). When formally tested on the Reynell-III Scales,
at 2;6, Jack’s expressive score was within four months of the norm for that
age; accordingly, he was reclassified as a ‘transitional Late Talker’.
Children who, like Jack, make a later start than is typical differ from the
younger children whose templates have been described previously in the
literature by virtue of their far larger (age-appropriate) receptive lexicon in
relation to their expressive word use. Their lexical targets are accordingly
more advanced than those of typically developing children at the same
level of phonological development, which means that the ‘adaptations’
observed are sometimes more radical than those reported for younger
children. At the same time, their word forms reflect phonological categor-
ization and template formation in a way that closely resembles that of
younger children. The process of induction of template patterns that we
describe in Section 10.5 can thus be understood to be the same for both
younger and older children.
Analysis of later word forms: methodological note
The analyses that follow are designed to identify the phonological-shape
categories that the child is developing as his vocabulary begins to grow
beyond the first few word forms produced. We will use the term prosodic
shape to refer to variant child forms of a single target word (or phrase treated
as a single lexical unit) that differ in number of syllables or in presence or absence of
onset or coda (see examples below). Based on frequency of occurrence and
with due allowance for differences in articulatory resources, we then iden-
tify as few prosodic structures (categories of prosodic shapes) as are compatible
with the variability in the child’s production, in accordance with the follow-
ing guidelines:
We categorize separately differing prosodic shapes for the same target
word and thus include these in the analysis and calculation of frequency
of use.
We treat differences in the mora count of the nucleus – short or long
vowel, monophthong or diphthong – as representing different shapes
only when the result is two categories each accounting for 10 per cent or
more of the total different shapes produced.
We include no more than one variant of any target word within a single
prosodic structure, and we include any one variant in no more than one
prosodic structure.
In short, for any one child the number of prosodic shapes included in the
analysis will exceed the number of different adult words targeted to the
extent that the child uses variable prosodic shapes for a single target form.
This makes it possible to obtain the broadest possible look at the diversity
and stability of the child’s phonological resources on the basis of a single
half-hour recording.
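The counting rule can be made concrete with a short sketch. It is an illustration only: the pairing of each variant with a structure label follows the forms cited for Jack later in this section, but the data layout and the function name `count_shapes` are assumptions introduced here, not part of the original analysis.

```python
from collections import defaultdict

# Illustrative coding of a few variants as (target word, prosodic structure).
# The structure labels follow the chapter; the tuple layout is an assumption.
observations = [
    ("boat", "CVVN"),       # [beɪn]
    ("boat", "CVVo"),       # [bəʊ:]
    ("boat", "CVVo"),       # [məʊ]: same structure, so not counted again
    ("paint", "CVVN"),      # [beɪn]
    ("paint", "CVGLIDEV"),  # [beɪ:ə]
    ("green", "CVVN"),      # [gi:n]
]

def count_shapes(obs):
    """Keep at most one variant per (target, structure) pair."""
    seen = set()
    per_structure = defaultdict(list)
    for target, structure in obs:
        if (target, structure) not in seen:
            seen.add((target, structure))
            per_structure[structure].append(target)
    return seen, dict(per_structure)

shapes, per_structure = count_shapes(observations)
n_targets = len({target for target, _ in observations})
print(n_targets, len(shapes))
```

With these six observations the sketch counts three target words but five prosodic shapes, which is the sense in which the number of shapes exceeds the number of words targeted.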
A final step in the analysis is to identify templates, or overused (‘general-
ized’) prosodic structures that may reflect either disproportionate selection
of target word forms involving a particular prosodic structure or adaptation
of dissimilar (and thus, for the child, more challenging) target forms to fit
that structure, or both (Vihman & Velleman 2000). Adaptations are defined
as cases in which the target word form is changed in such a way as to meet the
prosodic structure specified in the template. In general, we require a minimum
of 10 per cent of all prosodic shapes to fit a particular structure in order to
count as evidence that the child has developed a template. We also expect
to see either cases of adaptation to the template or other evidence, based on
disproportionate use or over-selection, that the structure is serving as an
attractor in the child’s word-form learning.
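A minimal sketch of this criterion follows, under stated simplifications: it applies the 10 per cent threshold and requires at least one adapted form, but it does not model the alternative evidence from over-selection. The structure labels `A` to `D`, the toy counts and the function name `find_templates` are hypothetical.

```python
from collections import Counter

def find_templates(shapes, threshold=0.10):
    """shapes: list of (structure, status), status being 'selected' or 'adapted'.

    A structure is flagged as a template if it accounts for at least
    `threshold` of all shapes and has at least one adapted form.
    """
    total = len(shapes)
    counts = Counter(structure for structure, _ in shapes)
    adapted = Counter(s for s, status in shapes if status == "adapted")
    return {
        structure: {
            "share": n / total,
            "adapted": adapted[structure],
            "template": n / total >= threshold and adapted[structure] > 0,
        }
        for structure, n in counts.items()
    }

# Hypothetical session of 20 prosodic shapes (not Jack's data).
toy = (
    [("A", "selected")] * 5 + [("A", "adapted")] * 3
    + [("B", "selected")] * 6
    + [("C", "selected")] * 3 + [("C", "adapted")] * 2
    + [("D", "adapted")] * 1
)
result = find_templates(toy)
for structure, info in result.items():
    print(structure, f"{info['share']:.0%}", info["adapted"], info["template"])
```

In the toy data, structure B is frequent but shows no adaptation, and structure D shows adaptation but falls below the threshold; only A and C meet both conditions.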
Jack’s prosodic structures and templates
In the 25-word point session – at 26 months – Jack was primarily engaged
in looking at a book with his mother. He produced 57 different identifiable
target word types (including five used only in the four combinations
produced). Including for analysis all distinct prosodic shapes of a single
word type results in a total of 71 variant shapes. These variants can be
classified into five prosodic structures (as shown in (a)–(e), with proportion
of all variant shapes produced indicated in parentheses – including both
selected forms, which accommodate to the target, and adapted forms,
which assimilate the target to a pre-existing child pattern):
a. <CVVo> (i.e. long or short, mono- or diphthongal open
monosyllables: 24%);
b. <CVSTOP> (13%);
c. <CVVN> (18%);
d. <CVGLIDEV> (25%);
e. full or partial reduplication (13%).
Of the five prosodic structures listed, (c) CVVN and (d) CVGLIDEV attract
the largest proportion of adapted forms, or words fitted into the pattern
despite having a different target shape; these can accordingly be consid-
ered templates for this child. We briefly describe the make-up of each
prosodic structure in turn.
a. <CVVo>: This is a common default structure for early words, cross-
linguistically. Jack produces seven ‘selected’ forms in this structure for
open monosyllabic targets (disregarding the expected cluster reduction in ski,
snow). Eight of the remaining ten forms of this structure have a target CVC
but show coda omission; Jack produces codas highly inconsistently in this
session. Despite the competing CVSTOP and CVVN structures, both final
stops (bed, boat, shark) and nasals (clown, down, moon, train) are omitted in
these variants. The two remaining forms are adapted to the CVVo pattern
by truncation: balloon [bʊ:::], bananas [baʊ]. Despite its high frequency, then,
this is not a stable or consistently used template, given the high variability
of coda production. Instead, it may be a ‘fall-back’ pattern for Jack, pro-
duced in alternation with his competing templatic structures, reflecting
unstable or rapidly evolving representations and inconsistent access
to them.
What is noticeable in this and other structures is Jack’s preference for
selecting and producing the diphthong [aʊ] (8 variants, or 11 per cent).
Altogether in this session Jack targets two words with the diphthong
(clown, down) and adds it idiosyncratically to his production of four additional
words (e.g. bananas [baʊ], strawberries [daʊ:wɪ]; see Tables 10.2 and 10.3). For
Jack, the diphthong itself, though not a whole-word pattern, may be con-
sidered ‘templatic’, or an attractor for vowel nuclei. This suggests that
segments and sequences as well as whole-word structures may be templatic,
or projected from familiar motoric routines to novel word targets.
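The proportions cited so far can be checked against the reported total of 71 variant shapes. The sketch below assumes that each percentage is the relevant count divided by 71 and rounded to the nearest whole number; the helper name `percent` is introduced here for illustration.

```python
TOTAL_VARIANTS = 71  # variant shapes in the session, as reported above

def percent(count, total=TOTAL_VARIANTS):
    """Share of all variant shapes, rounded to a whole per cent."""
    return round(100 * count / total)

# <CVVo>: 7 selected forms + 8 CVC targets with coda omission + 2 truncations
cvv_open_forms = 7 + 8 + 2
# Variants containing the diphthong [aʊ]
diphthong_variants = 8

print(percent(cvv_open_forms))      # 24
print(percent(diphthong_variants))  # 11
```

Both figures agree with the values given in the text (24% for the open monosyllable structure and 11 per cent for the diphthong).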
b. <CVSTOP>: Of the nine word shapes in this structure eight must be
considered ‘selected’, despite inconsistent place of articulation in the coda
(variable replacement of /k/ in bike, cake and socks by [ʔkʰ], [tʰ] or glottal stop,
resp., and of a harmonizing coda in grape [geɪk] and sheep [ədit]). The
remaining form, robot, is produced as [mɑʔ] or [m̩bɑʔ]. Since in this case
the target is changed in ways that allow it to fit the specifications of the
prosodic structure, we consider the form to be adapted. That is, the target
robot has as its initial (and stressed-syllable) onset a consonant that the
child does not yet produce and a second syllable that matches one of his
favoured patterns. Accordingly, the adaptation reflects attention to those
features of the target (labial C onset, CVC structure of the more accessible
second syllable) that fit the template and a disregard for its other features.
c. <CVVN>: Most forms are ‘selected’, with consistently matching rhyme
(vowel nucleus and coda) but with cluster reduction and changes to the
onset consonant (Table 10.2). The forms of the adapted words are particu-
larly idiosyncratic: boat also occurs as CVV ([məʊ], [bəʊ:]), ladybird shows
consonant harmony as well as inclusion of the favoured diphthong, mush-
room shows truncation and hammer appears to be a blend of hammer and
bang, with bang then appended in a different form.
d. <CVGLV>: Few ‘selected’ productions seem to occur in this structure,
but moon and worm can be considered ‘selected’ if we allow the structure to
include an optional final nasal, as can Harriet if we accept the glide as the
child approximation of medial /r/ (see Table 10.3). All these changes relate
to missing elements in the child’s phonetic repertoire; the child’s word
forms are perhaps as good an approximation of these particular target
words as the child’s current resources permit.

Note: Although the prefixed syllabic nasal might seem to stand in for the missing initial syllable in Jack’s production of robot, consideration of the word shapes produced in the session as a whole makes this seem unlikely: of the 71 variants produced, 15 (21%) include a filler vowel or syllabic nasal (grayed out in the child form): cf., e.g., page [m̩bɛɪ], socks ̩dɑʔ], cake [nədɛɪtʰ], dinner [ɪɲjɪnja] as well as hoover, mushroom and sheep, cited elsewhere in this section.
The structure fits most closely with target open monosyllables with a
long vowel nucleus; the glide, in several cases, is a more or less automatic
consequence of the vowel sequence, even where not indicated (see
Table 10.3: moon [bʊ:[w]ən], worm [bɛ[j]ʊm], bee [bi:[j]a], no [nəu::[w]ə]). Note
that most of these forms alternate in the same session with monosyllabic
variants (CVV: no [nəu:], ski [gi], two [duə]). The most striking adaptations,
however, involve closed monosyllables (paint, toast) and longer words
produced with this pattern (banana, strawberries). These forms seem to
reflect Jack’s ease in producing diphthongs, which can also extend into a
second syllable.
e. Words with full or partial reduplication (harmony)
Five words are fully reduplicated. None are based on reduplicated
targets, which are relatively rare in English; all reflect instead active
reduplication of the child’s monosyllabic patterns (including glottal
stop, which often serves as a coda consonant here; see Table 10.4). In
addition, five words show harmony, two of them (mummy, teddy) being

Table 10.2 Later word forms: Jack’s CVVN pattern

Jack <CVVN>
Select                 Adapt
clown [daʊn]           boat [beɪn]
crane [heɪ::n]         hammer (bang?) [baʊm|bæʔ]
green [gi:n]           ladybird [la:bwaʊm]
paint [beɪn] (x2)      mushroom [m
pink [bɪ]
plane [deɪ:in]
spoon [bʊ:::::n]
sun [dʌm]
train [dəɪn]

The vertical line represents a brief pause or break between the two syllables.

Table 10.3 Later word forms: Jack’s disyllabic <CVGlV> pattern

Select                 Adapt: monosyllabic target    Adapt: longer target
moon [bʊ:ən]           bee [bi:a]                    banana(s) [bɛ::|aʊ]
worm [bɛʊm]            no [nəu::ə]                   guitar [gi:a:]
Harriett [heɪjɛ:]      paint [beɪ:ə]                 pizza [m̩bia, biə]
                       ski [ŋi:a]                    strawberries [dau:wi]
                       toast [dəu::a]                twit twoo [du::ə]
                       two [du:ə]
                       whee [bi:ə]

The vertical line represents a brief pause or break between the two syllables.
A handful of forms (7 per cent) fit into none of these categories; of these,
two are quite accurate within the limits of Jack’s production (pizza [pisi],
tool box [dʊ:bæ]), while three are anomalous in that they have no apparent
parallels or models in Jack’s other word forms (Jacob [gidʌb], rabbits [bɛ:gɪʔ],
scooter [m
The patterns we see in Jack’s word forms reflect, as do the patterns of
younger children, his reliance on a small core consonant inventory, one
which consists primarily of stops, nasals and glides. Recall that, unlike
younger children at this level of expressive vocabulary, Jack has good
comprehension of a much larger number of words and, given his later
start, more experience of the world of possible referents: this is evidenced
by the words he produces, which would be surprising in children of 16 or
18 months, for example (e.g. robot, shark, hoover). Beyond that, the many
‘adapted’ forms, or forms that fail to match the target (even in cases where
he clearly has the necessary articulatory or phonetic resources to make a
more accurate match, e.g. boat, toast), provide evidence that Jack is indu-
cing generalized patterns from his own output.
Thus our analysis of Jack’s word forms, towards the end of the single-
word period, illustrates a later moment in the developmental sequence
that we have been presenting here. Once a child has learned a certain
number of adult-based words, usually at the fairly slow pace characteristic
of ‘item learning’, word learning becomes easier and accordingly more
rapid (Vihman 2014). This greater facility can be ascribed to the emergence
of one or more well-practised ‘motor plans’ or templates, attractors that
serve to support attention and memory to the form–meaning link.
Consequent to the repeated production of a certain number of (selected)
words that fit within the child’s existing vocal patterns, the well-established
motoric routines or templates lead to adaptation, or the online categoriza-
tion of more complex adult target forms in terms of those simpler existing
structures. We see this as the beginning of phonological systematicity – in
other words, as an emergent phonological grammar. At this point the
child’s patterns are based on the intersect between his or her own output
forms and common input patterns.
Table 10.4 Later word forms: reduplication and harmony in Jack’s word shapes

Based on <CVVo>        Based on <CV>                 Based on <CVCV>
bear [m̩ba:ba:]         bike [m̩baɪʔbaɪʔ]              dinner [ɪɲjɪnja]
twit twoo [dudu::]     choc(olate) [dɑʔdɑʔ]          hoover [əwʊ:wæʔ]
                       pas(ta) [m̩bæʔbæ]              Jacob [dɪdʌb]
                                                     mummy [mʌmɪʰ]
                                                     teddy [dæʔdɪʰ]

10.5 Learning mechanisms
Studies of artificial grammar learning in adults (e.g. Reber 1967) suggested
the importance of statistical or ‘distributional’ learning over forty years
ago, but it is only within the past twenty years that experimental findings
have made it clear that infants, like adults, automatically tally distribu-
tional regularities in the environment (Saffran, Aslin & Newport 1996; see
also Chapter 3). This learning capacity is not restricted to speech (i.e. is not
‘domain specific’), however, but has been shown to apply automatically to
any regularly recurring sequence in the infants’ environment (Kirkham,
Slemmer & Johnson 2002). If we relate these findings to the host of experi-
mental studies of prelinguistic responses to speech that began to be
reported in the 1990s (Jusczyk 1997, Vihman 2014), we can conclude that
over the course of the first year infants gradually gain a sense of the
patterns to which they are exposed in their ambient language or languages
as regards sequences at any level of linguistic organization – segments,
syllables, accentual patterns, words, phrases, clauses. Based on adult stu-
dies (e.g. Saffran, Newport, Aslin, Tunick & Barrueco 1997), it seems clear
that this learning occurs in the absence of any specific intent to learn or
even of (conscious or focused) attention to linguistic patterning as such.
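One common way of operationalizing such distributional tallying, associated with the Saffran, Aslin and Newport (1996) paradigm, is the transitional probability between adjacent syllables. The sketch below is an illustration rather than a model of infant learning: the syllable stream and the two nonsense 'words' (bidaku, padoti) are made up for the example, and the function name is an assumption.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent pair of syllables in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last syllable has no successor
    return {
        (a, b): n / first_counts[a]
        for (a, b), n in pair_counts.items()
    }

# Hypothetical stream built from two nonsense words, "bidaku" and "padoti".
stream = (
    "bi da ku pa do ti bi da ku bi da ku "
    "pa do ti pa do ti bi da ku"
).split()

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # within-word transition
print(tp[("ku", "pa")])  # transition across a word boundary
```

In this stream the within-word transitions (bi to da, da to ku) are fully predictable, whereas transitions across a word boundary (ku to pa, ku to bi) are not; a learner tallying such statistics has a cue to where recurring sequences begin and end.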
However, word production requires that the child register arbitrary form–
meaning relationships; the word forms repeatedly used in a given situa-
tion must persist in the child’s memory, together with their context of use
(or meaning), in order to lead to recognizable word use. This need not
imply focused attention or a specific intention to learn. Rather, the routine
recurrence in a given situation of a sound pattern familiar from the child’s
own vocal practice can be taken to lead to the child’s implicit registering of
that familiarity, which would involve only a minimal degree of attention
and would in effect ‘prime’ the child to produce that pattern in the often
experienced situation (see Figure 10.1). Each instance of word production –
which necessarily involves motoric effort (Elbers & Wijnen 1992) – can be
expected to strengthen the memory trace, making future deployment of
the pattern more likely (Edelman 1987) and facilitating memory for both
form and meaning. Such early word production, supported by the experi-
ence of a perceptual match between adult and child form, can be taken to
be the source of the relatively ‘accurate’ first words, as indicated above.
This is ‘item learning’; each word must be remembered individually as a
whole, form and meaning together. It is thus quite different from the
rapid, automatic registering of recurrent regularities (‘distributional
learning’).
Current thinking in neuroscience supports the idea of complementary
memory systems (Davis & Gaskell 2009, Lindsay & Gaskell 2010, O’Reilly &
Norman 2002). It is widely accepted that the hippocampus is required to
consolidate detailed, multimodal episodic memories, which are the basis
of learning from unique experiences – such as the item learning just
described (McClelland, McNaughton & O’Reilly 1995, Squire & Kandel
1999). Furthermore, the registering and recall of arbitrary form–meaning
pairs also generally depends on processing in the frontal lobes (known to
be involved in the selection of percepts for focused attention). In contrast,
the registration of regularities – the essence of distributional learning –
occurs even in the face of hippocampal damage, permitting amnesic
patients to abstract structure from a set of related items, for example
(Knowlton & Squire 1993).
There is thus ample evidence to support a distinction between two types
of learning – one probabilistic, statistical, sensitive to distributional prop-
erties such as frequency of occurrence and sequential patterning,
the other responding to chance conjunctions of unrelated elements (nota-
bly, for our purposes, the arbitrary ‘episodic’ association of form and
meaning), essential for the construction of a lexicon. What is most impor-
tant is the idea that once the child has begun to produce words, the ‘input’
to the child’s distributional learning mechanism will necessarily begin to
include the child’s own word forms. This is a critical change: Now the
internal structure of the first ‘selected’ words will automatically be
induced, in a process in which that structure is (i) filtered through the
child’s limited vocal production experience and (ii) analysed through dis-
tributional learning. This will provide the child with implicit phonological
structures that can be ‘projected’ onto the input speech stream, ‘capturing’
(or retaining in memory) possible words to say that more or less approx-
imate one of the familiar patterns or templates that have emerged for the
child. These words may then gradually become more ambitious, complex
or challenging and also less familiar, because they are more distant from
the vocal patterns available to the child. The new words need share only a
minimal resemblance to the induced patterns; they will be altered in
individual, sometimes quite idiosyncratic ways. The resulting word
forms will cluster into different accessible structures or templates such
as those described above.
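The two steps just described, induction over the child's own output and projection onto the input, can be caricatured in a few lines. This is a toy sketch only: the CV-skeleton coding, the frequency counts, the 10 per cent cut-off borrowed from the template criterion, and the use of a generic string-similarity score (`difflib.SequenceMatcher`) as a stand-in for perceptual and motoric match are all assumptions introduced here, not the mechanism proposed in the chapter.

```python
from collections import Counter
from difflib import SequenceMatcher

def induce_templates(child_skeletons, min_share=0.10):
    """Return the skeletons that recur often enough in the child's own output."""
    counts = Counter(child_skeletons)
    total = len(child_skeletons)
    return [s for s, n in counts.items() if n / total >= min_share]

def best_match(candidate, templates):
    """Score a candidate input word against each template; return (score, template)."""
    scored = [(SequenceMatcher(None, candidate, t).ratio(), t) for t in templates]
    return max(scored)

# Hypothetical tally of the child's own word forms, coded as CV skeletons.
child_output = ["CVVN"] * 4 + ["CVGV"] * 5 + ["CVV"] * 5 + ["CVCVC"] * 1
templates = induce_templates(child_output)

# A hypothetical input word with an onset cluster (skeleton CCVVN).
score, template = best_match("CCVVN", templates)
print(templates)
print(template)
```

Here the rare skeleton CVCVC is not induced as a pattern, and the cluster-initial input word is 'captured' by the closest induced pattern, CVVN, with the mismatch in the onset left to be resolved by adaptation.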
The interaction between lexical and phonological learning, over the
period from first words to the beginnings of syntax, may thus be concep-
tualized as follows. The initial constraints on word learning derive from
limitations in terms of motoric (articulatory) practice, experience with the
planning of articulatory gestures and, not least, phonological memory,
which is constructed on the basis of repeated exposure to speech as it is
both heard and produced (Keren-Portnoy et al. 2010). In the ‘first words’
period, which may extend from days or weeks to months (Menn & Vihman
2011), item learning – with the support of experience gained through
babbling practice and the critical addition of attention-based learning of
the situational contexts of adult word use – leads to a small repertoire of
first word forms, which may or may not be closely related to each other in
overall prosodic shape (although these shapes are universally short and
As described above, this new database provides the foundation for gen-
eralization over known word forms, making it possible for the child to
attempt harder words while maintaining relatively simple structures.
However, as the child produces more and more words, vocal experience
accumulates along with ongoing implicit comparison of child to adult
word forms and increasing ability to retain patterns in memory; all of
this results in the accumulation of some critical mass of vocabulary
(a lexical level that in itself differs from one child to the next: D’Odorico,
Salerni, Carubbi & Calvo 2001, Ganger & Brent 2004, Parladé & Iverson
2011), until it is sufficient to move the system to a new attractor state,
releasing the child from the need to adapt new forms to a small number of
pre-existing templates. In the final ‘leg’ of the U-shaped curve templatic
patterns diminish and fade out altogether, leaving only the segment- or
sequence-specific production errors familiar from the speech of toddlers
(for English, fricative stopping, cluster reduction, etc.). The processes of
item learning, generalization and construction of better memory for
speech are sequential in onset but parallel in development, as predicted
by the dynamic systems model.
The whole process is data-driven from the bottom up and self-organized
through the powerful learning mechanisms highlighted above. As noted
earlier, Pierrehumbert (2003) describes a similar process of generalization,
or of inducing ‘type statistics’ from existing internal word representations,
although she supposes that this must happen only much later than the
period of the first words, on the basis of a much larger lexicon. Here we
have seen that from very early in their development children begin to
overuse production patterns. This kind of pattern induction and general-
ization has been found to facilitate and accelerate the process of further
lexical learning. Furthermore, in parallel with producing new word forms
that conform to an internally developing templatic system the infant is
also gradually moving closer to the adult system, as described above.
10.6 Conclusion. From babble to words: a developmental account
In order to better understand the processes that might account for the
origins of a phonological system we have presented evidence to help
uncover the essential continuity between babbling and first words. We
claim that babbling is only one of many manifestations of the child’s
general motoric development, with its rhythmic base and its cascading
socio-cognitive consequences. And we argue that a child’s babbling prac-
tice provides the essential resources for the identification and shaping of
early word forms. We provided experimental evidence to back up the
claim that the apparent pre-selection of adult targets reflects implicit
multimodal matching of the child’s own vocal production patterns to
frequent input speech sequences. In dynamic systems terms, maturational
advances in vocal production – primarily the emergence of rhythmic
canonical babbling syllables in the middle of the first year – provide fuel
for a phase-shift to first word production. But the presence of speech-like
syllables in the repertoire is not in itself sufficient to catalyse this shift.
Instead, the reliably expected environment of a growing child – the pre-
sence of responsive caregivers, the infant’s sense of reward elicited by the
production of vocal forms that echo some of their caregivers’ talk, the
auditory and proprioceptive feedback obtained from the articulation of
the syllables that provide that reward – makes available numerous sup-
porting experiences to tune those syllables in the direction of the ambient
language. Critically, in a parallel process, the child must begin to form
links between particular input forms and their meanings or, more broadly,
the situational contexts in which they occur (see also McCune 1992, 2008).
The match of certain input sequences to forms familiar from babble is only
one part of this process, supporting the form side of what needs to be
learned; ultimately, multimodal experiences and the memory traces they
create are as essential, to begin to transform babble into word use, as is
familiarity with word forms.
The route that we described from babbling to words is ‘universal’ but
also highly individual, since the starting points (the particular first sylla-
bles or consonants to be mastered) differ, as do the pathways followed. We
noted that particularly challenging word forms may give rise to an excep-
tional degree of variability (for evidence of an increase in the variability of
a child’s word forms in the weeks immediately preceding the first manifes-
tation of a stable templatic pattern see Vihman and Velleman (1989) and
Vihman et al. (1994)). We also considered both first words (from a typically
developing Dutch child: Table 10.1) and later words (from a transitional
late talker learning English: Tables 10.2–4). In each case we saw individual
phonetic constraints deriving from variable motor skills and practice and,
in the case of the later words, we saw that those constraints translated into
particular pathways leading to phonological structure. Although we con-
sidered only a single recorded session for one child’s later words, nonli-
nearity was reflected, if indirectly, in his word patterns if we assume that
his first words, like those of other children, included a subset of the
structures seen at the 25-word point and were relatively accurate. In the
data that we described here the child’s ‘adapted’ word forms were some-
times quite remote from their targets, yet in each case they were similar to
the forms that the child was producing more faithfully. (For more direct
evidence of nonlinearity in phonological development one must turn to
such longitudinal case studies as Macken 1979, Oliveira-Guimarães 2013,
Priestly 1977, Vihman & Velleman 1989, Vihman & Vihman 2011; see also
Vihman 2014: appendix 3, ‘Word template analysis: a diary study’.)
As outlined by Thelen and Smith (1994), knowledge here again reflects
the history of actions of each child, although we did not trace individual
babbling patterns through the accurate first words to the generalized
patterns of the later words. We demonstrated that each child constructs
knowledge in their own way, based on their own specific perceptuomotor,
lexical and phonological experience. Finally, we argued that there is no
need to posit innate knowledge structures (UG) in order to explain the
emergence of language. The learning mechanisms we invoke, unique in
humans due to the combinatory power of distributional and item learning,
seem to us to be sufficient to account for the formation of a phonological
Suggestions for further reading
Pierrehumbert, J. (2003). Phonetic diversity, statistical learning, and acqui-
sition of phonology. Language and Speech, 46, 115–54.
Thelen, E., & Smith, L. B. (1994). A Dynamic Systems Approach to the Development
of Cognition and Action. Cambridge, MA: MIT Press.
Vihman, M. M. (2014). Phonological Development: The First Two Years, 2nd edn.
Malden, MA: Wiley-Blackwell.
Vihman, M. M., & Keren-Portnoy, T. (eds.). (2013). The Emergence of Phonology:
Whole Word Approaches, Cross-linguistic Evidence. Cambridge University Press.
... Moreover, during the babbling period which extends over several months (from 6 months to more than 12 months), modifications in infant vocal behaviors occur as a result of the interaction between biological changes (Kent, 2021(Kent, , 2022, the increase in motor control (Green et al., 2000;Nip & Green, 2013) and the influence of auditory exposure to ambient language (Vihman et al., 2009). Although the impact of auditory experience on babbling output is shown on consonant sounds (Majorano & D'Odorico, 2011) or on the acoustic characteristics of vowels (Boysson-Bardies et al., 1989), it is rarely assessed on the syllable structures which can also be language-specific. ...
... If babble output is in part driven by biological factors and by limited oro-motor skills, it is also well accepted that the auditory exposure to adult speech has an impact on early productions (Vihman et al., 2009). There is a frequency effect of the perceptually based patterns from the language environmental input on the infant's learning path, and babbling is considered to already share some of the features of the target language (Morgan & Wren, 2018 for reviews; Rvachew & Alhaidary, 2018). ...
Cross-linguistic studies describing the syllabic structures of babbling productions agree on the high prevalence of the CV structure, but few have addressed the other types of syllables emerging during this pre-linguistic stage. However, studying the evolution of the distribution of syllabic structures during babbling would make it possible to test both the influence of motor constraints and the influence of the perceptually based patterns from the infant’s language environmental input on the production of early syllables. A monthly follow-up of 22 French infants from 8 to 14 months showed that the distribution CV>V> CCV>CVC>VC was shared by the majority of infants in the sample and remained the same throughout the observation period. The comparison of the frequencies of the structures observed with those attested in adult-French and in 4 other languages (Dutch, Korean, Moroccan Arabic and Tunisian Arabic) revealed significant differences between all adult samples and infant productions. The results have implications for understanding the nature of factors impacting syllable production at the babbling stage. We discuss the possibility that the target language does not affect the production of babbled syllables.
... In this perspective, babbling has been proposed as the basic mechanism underlying the setting of the perception/production coupling, a perceptuomotor activity allowing the perceptuocognitive connection between one's speech actions and the proprioceptive and acoustic percepts generated by this action (e.g., Vihman et al., 2009). Neurofunctional evidence supports these hypotheses. ...
Growing evidence shows that early speech processing relies on information extracted from speech production. In particular, production skills are linked to word-form processing, as more advanced producers prefer listening to pseudowords containing consonants they do not yet produce. However, it is unclear whether production affects word-form encoding (the translation of perceived phonological information into a memory trace) and/or recognition (the automatic retrieval of a stored item). Distinguishing recognition from encoding makes it possible to explore whether sensorimotor information is stored in long-term phonological representations (and thus, retrieved during recognition) or is processed when encoding a new item, but not necessarily when retrieving a stored item. In this study, we asked whether speech-related sensorimotor information is retained in long-term representations of word-forms. To this aim, we tested the effect of production on the recognition of ecologically learned, real familiar word-forms. Testing these items allowed us to assess the effect of sensorimotor information in a context in which encoding did not happen during testing itself. Two groups of French-learning monolinguals (11- and 14-month-olds) participated in the study. Using the Headturn Preference Procedure, each group heard two lists, each containing 10 familiar word-forms composed of either early-learned consonants (commonly produced by French-learners at these ages) or late-learned consonants (more rarely produced at these ages). We hypothesized differences in listening preferences as a function of word-list and/or production skills. At both 11 and 14 months, babbling skills modulated orientation times to the word-lists containing late-learned consonants. This specific effect establishes that speech production impacts familiar word-form recognition by 11 months, suggesting that sensorimotor information is retained in long-term word-form representations and accessed during word-form processing.
... Prior research suggests that canonical babble onset is more a meaningful benchmark for motor development than it is for language development (Vihman et al. 2009 representations. This perspective, that canonical babble demonstrates a motor skill that helps prepare the child for language rather than an early step in language development, aligns with findings showing its cross-cultural and cross-linguistic developmental robustness (Oller 1980; Oller et al. 1998; Oller 2000; Cychosz et al. 2021). ...
Recent evidence shows that children reach expected basic linguistic milestones in two rural Indigenous communities, Tseltal and Yélî Dnye, despite infrequent exposure to child-directed speech. However, those results were partly based on vocal maturity measures that are fairly robust to environmental variation, e.g. the onset of babbling. Directed speech input has been traditionally linked to lexical development, which is by contrast environmentally sensitive. We investigate the relation between child-directed speech and early phonological development in these two communities, focussing on a phonological benchmark that links children’s pre-lexical and early lexical development: the production of consonants. We find that, while Tseltal and Yélî children’s canonical babble onset aligns with previously attested patterns, their early consonant acquisition shows some divergence from prior expectations. These preliminary results suggest that early consonant production may demonstrate greater environmental sensitivity than canonical babble, possibly via similar mechanisms that link linguistic input and lexical development.
... Our findings of the general development pattern of prelinguistic skills, e.g., eye contact (Berger & Cunningham, 1981; Brooks & Meltzoff, 2005; Dawson et al., 2000), gestures (Bates et al., 1979; Crais et al., 2004; Guidetti, 2002; Iverson et al., 1994; Masur, 1983; Messinger & Fogel, 1998; Perrault et al., 2019; Tomasello et al., 2007), vocalization (Davis & Macneilage, 1994; Vihman et al., 2009), first words (Brooks & Kempe, 2012; Hadley Pamela et al., 2016; Hsu et al., 2017; Mahmoudi Bakhtiyari et al., 2011; Tardif et al., 2008), facial expressions (Cole, 1986; Herba & Phillips, 2004; McClure, 2000), behaviour regulation (Carpenter et al., 1983), joint attention (Carpenter et al., 1998; Mundy et al., 2007), social interaction (Papousek & Papousek, 1975), imitation (Jones, 2009; Wang et al., 2015), object permanence (Baillargeon & DeVos, 1991; Corrigan, 1978; Gopnik & Meltzoff, 2021; Tomasello & Farrar, 1986), play (Casby, 2003) and language comprehension (Cummings et al., 2009; Gervain & Werker, 2008) are consistent with some studies. ...
Prelinguistic skills play an important role in children’s communication development. These skills are considered significant bases for language acquisition and are conducive to later social development. Means of communication, communicative functions, skills with cognitive bases, and language comprehension are important prelinguistic skills. There is a critical period for acquiring prelinguistic skills, and early identification of communication deficits is an important issue to be considered. The present study aimed to develop a communication skills checklist for Persian children aged 6 to 24 months and evaluate its psychometric properties. Parents of 277 Persian children aged 6 to 24 months participated in the current study. A checklist was first developed after an extensive literature review, and various psychometric analyses in addition to regression analyses were carried out to determine its validity and reliability. The final checklist contained 36 items with high face validity and content validity (CVI > 0.62, CVR > 0.79). Also, the checklist demonstrated a high association with the CNCS (Pearson’s correlation coefficient = 0.85, p < 0.001), and the construct validity showed significant differences between the four age groups (F-test = 197.881, p < 0.001). The results of the internal consistency measurement (Cronbach’s alpha coefficient = 0.952) and the test-retest reliability test (ICC = 0.933, p < 0.001) revealed excellent reliability of the checklist. In conclusion, based on the psychometric assessment, this checklist is a promising tool for assessing communication skills in Persian children aged 6 to 24 months.
... However, I see action initiated by the self as central to an understanding of the transition into language as it is to the Dynamic Systems Theory account of infant development more generally (Thelen & Smith, 1994; for discussion and illustration of how closely Dynamic Systems principles accord with this understanding of phonological development in the transition period, see Vihman, 2019, ch. 1; Vihman et al., 2016). These views also fit in well with embodied and associative learning approaches more generally. ...
Phonological memory, or the ability to remember a novel word string well enough to repeat it, has long been characterized as a time-limited store. An alternative embodiment model sees it as the product of the dynamic sensorimotor (perceptual and production) processes that inform responses to speech. Keren-Portnoy et al. (2010) demonstrated that this capacity, often tested through nonword repetition and found to predict lexical advance, is itself predicted by the first advances in babbling. Pursuing the idea that phonological memory develops through vocal production, we trace its development, drawing on illustrative data from children learning six languages, from the earliest adult-like vocalizations through to the first words and the consolidation of early words into an initial lexical network and more stable representational capacity. We suggest that it is the interaction of perceptual and production experience that mediates the mapping of new forms onto lexical representations.
... Babbling is generally acknowledged to begin the development of the child's internal model (Locke, 1983; Guenther, 1994; Vihman et al., 2009; McAllister Byun et al., 2016). An optimal control system with fully-formed internal models would be expected to show monotonic improvement in achievement of task goals. ...
Among the first baby communications, we find the bimodal collaborations of gesture and word. In this chapter, we present some considerations on the types of baby gestures and propose a comprehensive observation category system of gestures and vocalizations, either as single signs or in bimodal compositions, based on a relational-semantic pattern. We believe that such a system could bridge the gap between the study of pregrammatical and grammatical communicative expressions. In fact, a multimodal combinatorial role semantics would be better able than other systems to reliably track how simple meanings begin to interweave with one another to express more complex contents within a single message. In this way, the transition from the original multimodal expression to the grammatical form, with its tendency toward unimodality, could be approached in a seamless and integrated manner. In addition, we highlight the pending challenge of expanding the margins of bimodal combinations. In this direction, we point out the need to enrich these categories with the creation of an ecological code of the baby’s integral communication resources, which includes both those signs effectively used in communicative acts and the behavioral variables that accompany these signs and guide the interlocutor with keys for their interpretation. Keywords: Gesture; Vocalization; Bimodal combination; Semantic roles; Ecological multimodality; Social cognition
Music-based speech language interventions have shown promise to support young children with autism, other speech and language deficits, and Dual Language Learners (also known as DLL, English Language Learners, or ELL). Online edtech learning programs may produce greater positive outcomes for children by including parents as mediators of the intervention. This study measured the preliminary effectiveness of Sing and Speak 4 Kids (SS4Kids), a music-based online speech and language development game, administered to 26 children ages 2–6 years old with or at risk for a diagnosis of autism, other speech and language deficits or DLL. The children were trained in early intervention settings across 4–6 sessions over a 2-week period in one of three group conditions: (a) teacher only in clinic; (b) parent only at home; (c) both teacher + parent. Measurement of verbal production of target words in pre- and post-training sessions showed that trained words significantly improved from pre-test to post-test. Additionally, there was no effect of different group conditions (teacher only vs. parent only vs. both teacher + parent) on children's performance. Results suggest that the SS4Kids program is an effective music-based speech and language training method for supporting target word production in young children across a two-week timespan. Importantly, the results also found that group conditions did not influence the improvement, confirming effectiveness of both clinic and home-based parent mediation. During a time when traditional in-person intervention services may be restricted, the current work provides cautious but emerging evidence of the effectiveness of an online edtech evidence-based practice to support the speech and language outcomes for a variety of children in early intervention.
While nativist linguistic theory readily captures the regular processes of adult language, it struggles to account for often-unwieldy data collected from children. Any theory of language must house both the predictable and unpredictable turns a linguistic system takes. Some usage-based theories make strides in accounting for connections between multiple linguistic factors contributing to linguistic representation. Dynamic systems theory (DST) is capable of describing the interaction between numerous factors both linguistic and extra-linguistic. Grounded in embodiment, DST accounts for continuity between bodily and cognitive processes, which together are crucial in understanding the development of language. Conceptualizing systems as self-organizing, DST allows for the emergence of novel forms alongside the predictable. Furthermore, DST explains both continuity between unexpected child forms and eventual target forms and also apparent discontinuity that gives the illusion of discrete developmental stages. To illustrate the advantages of DST in describing language processes, this paper presents data from one American English-acquiring child, which comes from a larger study investigating phonological development beginning at the onset of word production. The data demonstrate the role of phonological templates in development as part of a dynamic system, entailing the interaction between developing phonological categories, lexical representation, and linguistic environment.
Various types of phonological behavior have been identified as evidence of the systematization which is said to occur in the course of the transition from early, “whole-word” phonology to later, segment-based phonology. However, we have a limited understanding of the role of such early phonological behavior in facilitating, or initiating, the emergence of segmental phonology. Furthermore, there has been little acoustic verification of such changes in children's phonological systems. In this study, the lexical production of one child is analyzed in detail from the onset of word use at 10 months to 16 months, when she had a cumulative lexicon of over 70 words. A period of phonological experimentation and the emergence of productive “word recipes” are documented, using both perceptual and acoustic analysis. Implications of such systematization for the later development of segmental phonology are discussed.
Whole-word phonology is a particular approach to early phonological development. This volume is designed to bring together the classic papers which gave rise to it in the 1970s and current studies that build on and extend the model, which in essence took an emergentist and usage-based stance before its time; the book will make no attempt to cover other approaches to phonological development in any systematic way. Many of the papers, including Vihman and Croft (2007, this volume, Chapter 2), with which we begin, use the term “template” to refer to child-specific word patterns identifiable within the first year of word use. Templates, referred to sporadically in the earlier developmental literature (e.g., Menn 1983, this volume, Chapter 6) and given formal status for adult linguistic analyses in Prosodic Morphology (McCarthy and Prince 1995), are a more focused expression of the ideas formulated by Waterson (1971, this volume, Chapter 3), Ferguson and Farwell (1975, this volume, Chapter 4), and Macken (1979, this volume, Chapter 5), which provided the core of the whole-word phonology idea (see Vihman and Croft 2007, this volume, Chapter 2, for a summary of the basic arguments).
Little is known about infants' abilities to perceive and categorize their own speech sounds or vocalizations produced by other infants. In the present study, prebabbling infants were habituated to /i/ ("ee") or /a/ ("ah") vowels synthesized to simulate men, women, and children, and then were presented with new instances of the habituation vowel and a contrasting vowel on different trials, with all vowels simulating infant talkers. Infants showed greater recovery of interest to the contrasting vowel than to the habituation vowel, which demonstrates recognition of the habituation-vowel category when it was produced by an infant. A second experiment showed that encoding the vowel category and detecting the novel vowel required additional processing when infant vowels were included in the habituation set. Despite these added cognitive demands, infants demonstrated the ability to track vowel categories in a multitalker array that included infant talkers. These findings raise the possibility that young infants can categorize their own vocalizations, which has important implications for early vocal learning.
A central component of language development is word learning. One characterization of this process is that language learners discover objects and then look for word forms to associate with these objects (Mcnamara, ; Smith, ). Another possibility is that word forms themselves are also important, such that once learned, hearing a familiar word form will lead young word learners to look for an object to associate with it (Jusczyk, ). This research investigates the relative weighing of word forms and objects in early word-object associations using the anticipatory eye-movement paradigm (AEM; McMurray & Aslin, ). Eighteen-month-old infants and adults were taught novel word-object associations and then tested on ambiguous stimuli that pitted word forms and objects against each other. Results revealed a change in weighing of these components across development. For 18-month-old infants, word forms weighed more in early word-object associative learning, while for adults, objects were more salient. Our results suggest that infants preferentially use word forms to guide the process of word-object association.
Studies of phonological development that combine speech-processing experiments with observation and analysis of production remain rare, although production experience is necessarily relevant to developmental advance. Here we focus on three proposals regarding the relationship of production to word learning: (1) Articulatory filter: The hypothesis that children are influenced in noticing words in input speech by their resemblance to patterns they can produce has recently received experimental support. (2) Systematization and regression: It is proposed that the decline in accuracy that follows first-word production is the consequence of an increase in systematicity (with renewed accuracy emerging only later). (3) Word-production experience facilitates new word learning: Evidence that expressive vocabulary growth in itself facilitates new word learning supports the idea that knowledge is gradient, involving increases in stability and reliability with repeated exposure and use.
Development of initial-consonant production in relation to the acquisition of words is investigated. Longitudinal data of three children are analysed from about age 1;0 until the total recorded lexicon reaches 50 or more words. Using the word as the basis of analysis, PHONE CLASSES are set up for each child and are followed through time in PHONE TREES. As in historical sound change, both lexical and phonetic parameters are involved. Phonological idioms, saliency rules, universal order hypotheses, and acquisition strategies are discussed. Tentative suggestions are made toward a model of phonology.