Canalization of language structure from environmental constraints:
A computational model of word learning from multiple cues
Padraic Monaghan
Lancaster University, UK
Max Planck Institute for Psycholinguistics, The Netherlands
Correspondence to:
Padraic Monaghan
Department of Psychology
Lancaster University
Lancaster LA1 4YF
UK
Tel: +44 1524 593813
Fax: +44 1524 593744
Email: p.monaghan@lancaster.ac.uk
2"
Abstract
There is substantial variation in language experience, yet there is surprising similarity in the language
structure acquired. Constraints on language structure may be external modulators that result in this
canalization of language structure, or else may derive from the broader, communicative environment in
which language is acquired. In this paper, the latter perspective is tested for its adequacy in explaining
robustness of language learning to environmental variation. A computational model of word learning
from cross-situational, multimodal information was constructed and tested. Key to the model’s
robustness was the presence of multiple, individually unreliable information sources to support learning.
This “degeneracy” in the language system has a detrimental effect on learning, compared to a noise-free
environment, but has a critically important effect on acquisition of a canalized system that is resistant to
environmental noise in communication.
Keywords: canalization; degeneracy; language acquisition; multiple cues; word learning; computational
modeling
3"
Canalization of Language Structure from Environmental Constraints: A Computational Model of Word
Learning from Multiple Cues
A key question in the cognitive sciences is how, despite the enormous variation in linguistic experience, the language learner acquires broadly the same language structure, "within a fairly narrow range" (Chomsky, 2005). This perspective has led to proposals for mechanisms that ensure canalization of language structure. Canalization refers to the means by which an individual's language develops toward a similar structure despite variation in experience. Much of this structure may come from language itself, but there is growing realization that multiple, rich sources of information within the environment, rather than internal to the individual, may substantially constrain the language learning situation.
In biological evolution, canalization was once considered to be a consequence of the natural selection of mechanisms that operate to minimize phenotypic variation (Waddington, 1942). For instance, Gallistel (1997) wrote of the default assumption in neuroscience that learning is a consequence of specialized mechanisms that are implemented as "organs within the brain". Yet, Wagner (1996) demonstrated that selecting for canalizing regulators would require a rate of mutation higher than that observed in biological evolution, inconsistent with Gallistel's (2000) suggestion of modular, domain-specific learning constraints within the individual.
To address this problem, an alternative perspective developed, proposing that minimal phenotypic variation, despite substantial environmental variation, is more likely to be stably achieved as a consequence of interaction between multiple regulators as part of the developmental process of the organism (Siegal & Bergman, 2002). In simulations of the operation of transcriptional regulators during development, Siegal and Bergman found that the greater the interactivity between these sources, the smaller the phenotypic variation resulting from environmental variation. Thus, canalization is a consequence of the process of development itself, realized through the effect of multiple interacting sources within the system, rather than of additional moderators that apply to development.
4"
An analogous perspective can be taken on canalization of social or cultural systems, such as language, whereby increasing levels of interaction may increase the stability and optimal processing of an information processing system (Bettencourt, 2009). Canalization, long conceived as a consequence of mechanisms that implement resistance to environmental variation, can instead be the outcome of multiple, interacting sources of information.
Recently, there has been reconsideration of the potential richness of the language environment to support language learning. Instead of focusing only on the syntactic structure of utterances themselves, there have been moves to consider the multiple information sources available to the situated language learner. For instance, grammatical category acquisition is supported by information from the distributional structure of language in terms of co-occurrences of words (Redington, Chater, & Finch, 1998), but also by substantial information from phonotactics and prosody distinguishing different grammatical categories, such as distinct stress patterns on nouns compared to verbs (Monaghan, Christiansen, & Chater, 2007). Furthermore, information about objects and actions within the child's purview may further constrain potential referents for words (Yurovsky, Smith, & Yu, 2013), providing constraining information about the semantic features associated with particular categories.
There have been several accounts of how multiple cues may be combined to support learning. The redundancy of different information sources may assist the child by increasing the saliency of particularly important information in their environment (Bahrick, Lickliter, & Flom, 2004). Alternatively, the cues may operate summatively (Christiansen, Allen, & Seidenberg, 1998), or they may operate in a hierarchy, such that if one cue is available then it is used in preference to other cues, which are relied upon only if the preferred cues are unavailable (Mattys, White, & Melhorn, 2005).
An alternative perspective, consistent with views in biological evolution, is that the key function
of multiple cues for language learning is their interactivity, resulting in a system stable to variation in the
environment. This property of language is its “degeneracy”, defined as “the ability of elements that are
5"
structurally different to perform the same function or yield the same output” (Edelman & Garry, 2001).
This degeneracy affects not only acquisition – where presence or absence of particular cues will not
adversely affect the structure acquired – but also the robustness of the system once the language is
acquired, due to reduced dependency on any one information source. Computational models of
degeneracy in language and other complex systems have shown that it is important for robustness of
learning (Whitacre, 2010), permitting, for instance, effective processing of speech sounds against
background noise (Winter, 2014).
In this paper, a computational model of multiple interacting information sources is presented as a
proof of concept of degeneracy resulting in canalization of language structure. The domain of study is
word learning, where mappings have to be formed between words in utterances and the intended referent
in the communicative environment. This task is difficult, due to numerous possibilities for the target
candidate words in multi-word utterances and equally numerous, even infinite, possible referents in the
environment to which the target word may map (Quine, 1960). However, multiple cues are known to be
available for assisting in constraining this task. These are present both in the spoken language and in the
environment that surrounds the speaker and listener.
Within the spoken language itself, information about the grammatical roles for words can be
ascertained from distributional information, consequently reducing the number of words in the utterance
that need to be considered as the intended referring word. For instance, nouns are frequently preceded by
articles (the, a) and these also tend to succeed verbs. Use of such simple distributional information has
been shown to assist in determining word-referent mappings (Monaghan & Mattock, 2012). Further
information for identifying the critical information in an utterance is also available from prosodic
information. When attempting to teach a child a new word, the speaker tends to increase the pitch
variation, intensity, and duration of the target word within the utterance (Fernald, 1991). Thus, within-
language cues provide valuable multiple information sources to assist in word-referent mappings.
6"
In addition, constraints within the environment also help to reduce uncertainty about potential referents. One of these information cues is derived from cross-situational statistical information. Even though there may be several potential referents within the child's environment for the utterance in any single learning trial, if, over multiple situations, the target referent is present in the environment whenever a particular word is heard in speech, then the learner can increase the association between the target word and the target object (McMurray, Horst, & Samuelson, 2012). Such cross-situational learning (Yu & Smith, 2012) can be further supplemented by information that the speaker uses to indicate the field of reference. For instance, speakers tend to use deictic gestures (finger pointing or eye gaze) toward a referent which is being described (Iverson, Capirci, Longobardi, & Caselli, 1999).
However, each of these cues on its own is insufficient to perfectly constrain learning. The word succeeding an article is not always a noun – in English, adjectives might intervene, and spontaneous language is replete with false starts and word sequencing errors. Similarly, the loudest word in speech is not always the target word, or a novel word being learned by the listener, and gestural cues are not always reliable: Iverson et al. (1999) found that only 15% of utterances were accompanied by gestures indicating aspects of the immediate environment to direct children's attention. However, such unreliability has profound value for learning. Consider if the child always learned from a speaker who reliably pointed to the intended referent. Then, if a situation ever arose where a referent was not gestured towards, communication could be impaired, because the cue may have been relied upon as part of the acquired word-referent mapping.
There are costs to including multiple cues in the learning situation, because this increases the information required to be processed in each learning situation. So, the trade-off must be examined between the increased strain on the cognitive system required by processing multiple, as opposed to single or no, cues and the potential advantages of interacting information sources for learning. In particular, the simulations tested not only the value of multiple information sources for supporting learning, but also the importance of interaction among those sources for promoting canalization – robustness of learning in the face of environmental variation.
A computational model was constructed to test integration of multiple sources of information to assist in learning relations between words and their referents. Two sets of simulations testing the model were conducted. The first assessed the contribution of single cues to word learning. The hypothesis was that adding cues to the input would assist in acquisition of the mapping – gestural cues assisting in defining the referent, prosodic cues promoting identification of the referring word, and distributional cues supporting acquisition of both. However, the reliable presence of cues may also result in impaired ability to identify the form-meaning mapping when the cue is no longer present.
The second set of simulations explored the role of multiple cues for learning. The prediction was that multiple cues would further promote learning, but that "degenerate" cues would be most effective for supporting not only effective acquisition but also robustness of the learning, resistant to the effects of variability in the environment. Thus, a model trained with a degenerate environment should be able to map effectively between words and referents even when environmental cues that support this mapping are no longer available.
A Multimodal Model of Word Learning
Previous computational models of language that integrate multiple information sources have been constructed in order to determine how informationally encapsulated each modality is in processing (e.g., Plaut, 2002). The starting point for the current model was the hub-and-spoke architecture, in which information from different modalities is unconstrained in its integration. The model then determines the optimal way in which information sources can cohere to support learning. The model is closely based on a previous model of multimodal information integration in sentence processing, used to simulate behaviour in the visual world paradigm (Smith, Monaghan, & Huettig, 2014; Smith, Monaghan, & Huettig, in press). This modeling approach has a central processing resource that is connected to and from different sensory modalities, such as visual information about object identity, and
connected to and from different sensory modalities, such as visual information about object identity, and
auditory information about spoken word forms. This modeling framework has been effective in
demonstrating how and when different information modalities interact in language processing, and how
the influence of different modalities on language processing derives from the nature of the
representations themselves, rather than requiring architectural assumptions to be imposed on the system.
The model used here is a simplification of this larger modeling enterprise, addressing the special
case of acquiring word-referent mappings. The model is compatible with previous associative models of
word learning (McMurray et al., 2012), as well as being broadly consistent with the principles of
statistical models of cross-situational word learning (Yu & Smith, 2012). The model therefore applies
these general modeling principles to explore the role of multiple information sources in facilitating and constraining word learning.
Figure 1. The multimodal integration model of word learning.
9"
Architecture
The model architecture is illustrated in Figure 1. The model was implemented as a recurrent backpropagation neural network. It comprised a central integrative (hidden) layer of 100 units, which received connections from the input modalities and projected to a semantic output layer. The phonological input represented two word slots, each of which contained 20 units. The visual input contained two locations, each comprising 20 units, where object representations were presented. The semantic layer was composed of 100 units. For the simulations that included a distributional cue, the model also received input from a distributional cue layer composed of 2 units. The integrative layer was also fully self-connected.
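Since the paper does not include code, the following is a minimal sketch of how this architecture might be expressed, written here in PyTorch as a discrete-time approximation (an assumption; the original simulations used continuous-time recurrent dynamics and bespoke simulation code). Layer sizes follow the text.

```python
# A discrete-time sketch of the hub-and-spoke architecture (assumed
# reimplementation in PyTorch; not the original code). Inputs are
# batched: phon and vis are (batch, 40) tensors, dist is (batch, 2).
import torch
import torch.nn as nn

class MultimodalIntegrationModel(nn.Module):
    def __init__(self, n_hidden=100, n_semantic=100):
        super().__init__()
        self.phon_to_hidden = nn.Linear(40, n_hidden)    # two 20-unit word slots
        self.vis_to_hidden = nn.Linear(40, n_hidden)     # two 20-unit object slots
        self.dist_to_hidden = nn.Linear(2, n_hidden)     # distributional cue layer
        self.hidden_to_hidden = nn.Linear(n_hidden, n_hidden)  # full self-connections
        self.hidden_to_semantic = nn.Linear(n_hidden, n_semantic)

    def forward(self, phon, vis, dist, n_steps=6):
        hidden = torch.zeros(phon.size(0), self.hidden_to_hidden.out_features)
        outputs = []
        for _ in range(n_steps):
            # Inputs are clamped throughout; the integrative layer receives
            # the inputs plus its own previous state on every step.
            hidden = torch.sigmoid(self.phon_to_hidden(phon)
                                   + self.vis_to_hidden(vis)
                                   + self.dist_to_hidden(dist)
                                   + self.hidden_to_hidden(hidden))
            outputs.append(torch.sigmoid(self.hidden_to_semantic(hidden)))
        return outputs  # semantic activations at each of the n_steps
```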
Representations
The model was trained to learn 100 words. The representation of each word in each modality was encoded as a pseudopattern, so that the properties of the relations between representations could be controlled. The phonological representation of each word was composed of four phonemes, randomly drawn from a set of 10 different phonemes. Each phoneme comprised 5 units, with 2 units active. The visual representation of the word's referent was constructed from 20 units, with 8 units active for each representation. The semantic representations were localist, such that one of the 100 units was active for each of the words. Fifty of the words were randomly assigned to one category, and the remaining fifty were assigned to the other category, such that these categories could be defined by a distributional cue.
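As an illustration, the pseudopatterns could be generated as follows; the unit counts come from the text, while sampling details beyond those counts (e.g., sampling active units without replacement) are assumptions.

```python
# A sketch of pseudopattern construction under the stated unit counts.
import numpy as np

rng = np.random.default_rng(0)
N_WORDS, N_PHONEMES = 100, 10

# Phoneme inventory: 10 phonemes, each 5 units with 2 active.
phonemes = np.zeros((N_PHONEMES, 5))
for p in range(N_PHONEMES):
    phonemes[p, rng.choice(5, size=2, replace=False)] = 1

# Word forms: four randomly drawn phonemes per word (4 x 5 = 20 units).
phonology = np.stack([np.concatenate(phonemes[rng.integers(0, N_PHONEMES, 4)])
                      for _ in range(N_WORDS)])

# Visual referents: 20 units with 8 active.
visual = np.zeros((N_WORDS, 20))
for w in range(N_WORDS):
    visual[w, rng.choice(20, size=8, replace=False)] = 1

# Localist semantics: one unit per word.
semantics = np.eye(N_WORDS)

# Distributional categories: a random 50/50 split.
category = rng.permutation(np.repeat([0, 1], 50))
```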
Training
The model was trained to learn to identify the meaning of the word referred to by an input
phonological and visual representation for all 100 words. Each trial was a simulation of a cross-
situational learning task, where two words and two objects were presented, but only one of the objects
10"
was named by one of the words (Monaghan & Mattock, 2012). The model had to learn to solve the task
by generating the correct semantic representation for the named object.
For each training trial, a word was randomly selected. Its phonological form was presented at
one of the two word slots in the phonological input (position was randomly chosen), and another
randomly selected word’s phonological form was presented at the other word slot. The object
representation of the word’s referent was presented at one of the two visual input positions (randomly
chosen) and another randomly selected visual representation was presented at the other visual input
position.
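A single cross-situational trial might be constructed as in the following sketch, assuming the foil word and foil object are drawn independently of the target and of each other (the text does not specify otherwise).

```python
# A sketch of trial construction: target plus one foil in each modality,
# with the target's slot chosen at random in each modality.
import numpy as np

rng = np.random.default_rng(1)

def make_trial(phonology, visual, target):
    n_words = phonology.shape[0]
    foil_word = (target + rng.integers(1, n_words)) % n_words  # any word but the target
    foil_obj = (target + rng.integers(1, n_words)) % n_words   # any object but the target
    word_slot = rng.integers(2)   # which of the two word slots holds the target
    obj_slot = rng.integers(2)    # which of the two visual positions holds the referent
    phon_input = np.zeros(40)
    phon_input[word_slot * 20:(word_slot + 1) * 20] = phonology[target]
    phon_input[(1 - word_slot) * 20:(2 - word_slot) * 20] = phonology[foil_word]
    vis_input = np.zeros(40)
    vis_input[obj_slot * 20:(obj_slot + 1) * 20] = visual[target]
    vis_input[(1 - obj_slot) * 20:(2 - obj_slot) * 20] = visual[foil_obj]
    return phon_input, vis_input, word_slot, obj_slot
```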
For the simulations with cues, gesture and prosody were implemented as intrinsic properties of the visual and phonological input representations, respectively, by doubling the activation at the input of the target visual object or the target phonological form. This had the effect of increasing the contribution of the target representation within each representational modality to the activation state of the integrative layer, and was a simulation of increased saliency of that representation (i.e., a gestural cue increases the saliency of the target object, and a prosodic cue is implemented as an increase in intensity, duration, and pitch of the target spoken word). This is illustrated in Figure 1 as a highlighting of the uppermost object and the first phonological representation as a consequence of gestural and prosodic cues, respectively.
The distributional cue was implemented as an extrinsic cue. If the word was from the first
(randomly assigned) category then the first unit in the distributional layer was active, and if the word
was from the second category the second unit was active. This cue could therefore assist the model in
determining which was the target object and spoken word, but the cue did not operate within either of
these modalities.
11"
The simulations of single cues presented each learning trial with the cue present with 100% reliability (see Table 1). The simulations of multiple cues varied the extent to which the cues were reliably present in each learning situation, from .25 through to 1.0 reliability.
Table 1. Proportion of training trials with each cue according to condition.

Condition              Distributional Cue   Prosodic Cue   Gestural Cue
No Cue                        0                  0              0
Single Cues
  Dist Cue                    1                  0              0
  Prosodic Cue                0                  1              0
  Gestural Cue                0                  0              1
Combined Cues
  .25 reliability            .25                .25            .25
  .50 reliability            .50                .50            .50
  .75 reliability            .75                .75            .75
  1.00 reliability            1                  1              1
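The cue manipulations in Table 1 could then be layered onto a trial as in the following sketch. Two assumptions are made: the doubling applies only to the target's slot, and each cue's presence is sampled independently per trial at the condition's reliability.

```python
# A sketch of cue application per trial. phon_input, vis_input, word_slot,
# and obj_slot come from make_trial above; category is the word's
# distributional category (0 or 1).
import numpy as np

rng = np.random.default_rng(2)

def apply_cues(phon_input, vis_input, word_slot, obj_slot, category,
               reliability=(0.0, 0.0, 0.0)):
    dist_rel, pros_rel, gest_rel = reliability
    phon, vis, dist = phon_input.copy(), vis_input.copy(), np.zeros(2)
    if rng.random() < pros_rel:   # prosodic cue: double the target word form
        phon[word_slot * 20:(word_slot + 1) * 20] *= 2
    if rng.random() < gest_rel:   # gestural cue: double the target object
        vis[obj_slot * 20:(obj_slot + 1) * 20] *= 2
    if rng.random() < dist_rel:   # distributional cue: extrinsic 2-unit layer
        dist[category] = 1
    return phon, vis, dist

# Example: the .75 reliability combined-cues condition from Table 1.
# phon, vis, dist = apply_cues(p, v, ws, os, cat, reliability=(.75, .75, .75))
```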
Activation cycled in the model for 6 time steps. At time step one, the visual and phonological
inputs were presented. For two time steps activation passed from the input to the integrative layer and
from the integrative layer to the semantic layer, and from the integrative layer to itself. At time steps 3 to
6 the target semantic representation was presented at the semantic output layer, and activation continued
to cycle around the model. The model was trained with continuous recurrent backpropagation through
12"
time (Pearlmutter, 1989) with error determined by sum squared error of the difference between the
actual and target semantic representations. In one epoch of training, each of the 100 words occurred
once as the target. The model was trained up to 100,000 epochs.
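Under the assumptions of the architecture sketch above, a simplified discrete-time stand-in for the training regime (the original used continuous recurrent backpropagation through time; Pearlmutter, 1989) might look like the following, with the error applied only at time steps 3 to 6.

```python
# A simplified training step (a discrete-time approximation, not the
# original continuous-time procedure). Error is sum squared error on the
# semantic output at time steps 3-6 only, matching the schedule above.
import torch

def train_step(model, optimizer, phon, vis, dist, target_sem):
    optimizer.zero_grad()
    outputs = model(phon, vis, dist, n_steps=6)
    # The target is presented at the output only at steps 3-6 (indices 2-5).
    loss = sum(((out - target_sem) ** 2).sum() for out in outputs[2:])
    loss.backward()
    optimizer.step()
    return loss.item()
```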
Twenty versions of the model with different pseudopattern representations, different randomized
starting weights, and different randomized ordering of training patterns were run.
Testing
The model's performance was assessed during training on its ability to produce the target semantic representation for each phonological and visual input. If the semantic unit corresponding to the target word was more active than any of the other units in the semantic layer, then the model was judged to be accurate. Accuracy during training was assessed, as well as the number of epochs required for the model to accurately detect all 100 words for five consecutive epochs.
At the end of training, the robustness of the model’s learning was assessed by measuring its
accuracy when no cues were present during testing.
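The accuracy criterion amounts to a winner-take-all check on the semantic layer, sketched here for a single trial (batch of 1) under the same assumptions as the earlier sketches; testing robustness simply means presenting cue-free inputs.

```python
# A sketch of the accuracy criterion: the semantic unit for the target
# word must be more active than every other unit at the final time step.
# Robustness testing passes cue-free inputs (dist = zeros, no doubled slots).
import torch

def is_accurate(model, phon, vis, dist, target_index):
    with torch.no_grad():
        final_output = model(phon, vis, dist, n_steps=6)[-1]  # (1, 100)
    return final_output.argmax(dim=-1).item() == target_index
```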
Results
Single Cues. The model’s accuracy during training when no cues or single cues were present is shown
in Figure 2.
An ANOVA with time taken to reach criterion as the dependent variable and cue condition (no cue, distributional cue, prosodic cue, gestural cue) as a within-subjects factor was conducted to test whether the model learned differently according to the presence of cues. The result was significant, F(3, 57) = 70722, p < .001, ηp² = 1.00. Post hoc tests revealed that the model learned to criterion more quickly in the prosodic cue (mean epochs = 35,800, SD = 1,005) and gestural cue (mean = 35,650, SD = 745) conditions than in the no cue condition, which had not reached criterion by 100,000 epochs (mean proportion correct was .96), both p < .001. Though the trajectory of learning was distinct, as shown in Figure 2, the effect of distributional cues was smaller, and not significantly different in time to criterion compared to the no cue condition (mean proportion correct after 100,000 epochs was .99). The prosodic and gestural cues supported learning more than the distributional cue, both p < .001, but there was no statistical difference in speed of learning between the prosodic and gestural cues, p = 1.
Figure 2. Accuracy during training for the single cue conditions, compared to the no cue condition.
The robustness of the model’s learning to omission of cues during testing is shown in Figure 3.
An ANOVA on accuracy in the post-learning test with no cues present, with cue condition as a within-subjects factor, was significant, F(3, 57) = 8.982, p < .001, ηp² = .321. Post hoc tests showed that the distributional cue did not significantly affect robustness of learning compared to the no cue condition, p = .284; however, the prosodic and gestural cues both resulted in poorer performance than the no cue condition, both p < .001. The gestural cue resulted in more robust learning than the prosodic cue, p = .001, but these conditions did not differ significantly from the distributional cue condition, both p = 1.
Next, it was assessed whether the poorer robustness in the intrinsic cue conditions (prosodic and gestural cues) was due to their quicker acquisition. Every model was trained for the same number of training epochs (100,000) and then tested for robustness of learning. The results were similar. Even with more training, the effect of a single, reliable intrinsic cue was detrimental to the model's ability to map between form and meaning when the cue was not present, F(3, 48) = 45.62, p < .001, ηp² = .740. Prosodic and gestural cues were now not significantly different from one another, p = .423, but were both significantly different from the no cue and distributional cue conditions, all p < .001.
Figure 3. Accuracy after training for the single cue conditions, when no cues are present during testing.
!"
!#$"
!#%"
!#&"
!#'"
("
)*"+,-." /0*.*12+"3,-" 4-.5,0-"3,-" 62.5"3,-"
/0*7*08*)"3*00-+5""
90:2)2);"3*)128*)"
15"
Multiple Cues. The model's accuracy during training for combined cues with different levels of reliability is shown in Figure 4. For time taken to reach training criterion, an ANOVA indicated that combined cues with different reliability had a significant effect on speed of learning, F(4, 76) = 3855, p < .001, ηp² = .99. Post hoc tests indicated that the no cue and the .25 cue reliability conditions were significantly slower in learning than the .50 condition, both p < .001, which was in turn slower than the .75 condition, p < .001, which was in turn slower than the 1.00 perfect reliability multiple cue condition, p < .001. Thus, as anticipated, the greater the reliability of information, the faster the model learned to map between forms and meanings.

The robustness of learning was also compared between these conditions. The results are shown in Figure 5. An ANOVA demonstrated that the robustness of performance at testing was affected by the cues presented during training, F(4, 76) = 2.953, p = .025, ηp² = .135. Post hoc tests revealed that the no cue condition and the 0.50, 0.75, and 1.00 cue conditions were significantly different, all p < .001. The 0.25 cue condition was not significantly different from any other condition, all p ≥ .718. As reliability increased from 0.50 to 0.75, the robustness of the model declined, p < .001, and it similarly declined from 0.75 to 1.00 reliability, p < .001. Thus, low reliability of cues did not seem to assist in learning quickly or robustly, but once individual cues appeared at least half the time, further increasing the reliability of the cues began to reduce the resistance of the model to the absence of cues after training.
16"
Figure 4. Accuracy during training for the multiple cue conditions, compared to the no cue condition.
Figure 5. Accuracy after training for the multiple cue conditions, when no cues are present during testing.
!"
!#$"
!#%"
!#&"
!#'"
("
!"
(!!!!"
$!!!!"
)!!!!"
%!!!!"
*!!!!"
&!!!!"
+!!!!"
'!!!!"
,!!!!"
(!!!!!"
-./0/.1/2"3/..456"
7.89292:";0/5<"
=/"3>4?"
@AA"3>4?"$*B"
@AA"3>4?"*!B"
@AA"3>4?"+*B"
@AA"3>4?"(!!B"
!"
!#$"
!#%"
!#&"
!#'"
("
)*"+,-." !#$/" !#/!" !#0/" (#!!"
12*3*24*)"5*22-+6""
7289)9):"5*);94*)"
17"
Discussion
Language learning occurs in situations where multiple, interacting sources of information are available to support the learning, even though attending to multiple cues increases the processing load on the individual. The simulations showed that this degeneracy in language results in two important advantages for the language learning system.
First, adding a combination of cues improves the speed and accuracy of learning to map between representations. Providing some guiding information about the intended object in a scene containing more than one referent, and about the intended referring word in an utterance containing more than one word, along with additional information about the general category of the target, improves performance. Even when the individual cues occurred only 50% of the time, there was still a significant advantage for acquisition of form-meaning mappings compared to no cues being present at all.
The second advantage of the degeneracy of language is that the learning acquired from a degenerate environment is highly robust (Ay, Flack, & Krakauer, 2007): the model was able to make use of cues even when they were variable in their presence across communicative situations. However, this multiple cue advantage for robustness was only observed when there was noise in the environment: When the cues occurred with perfect reliability, even though learning was optimal in speed, the acquired system was brittle and prone to error under suboptimal subsequent conditions. Thus, canalization of language structure can be conceived of as a consequence of the interaction of multiple information sources for learning. There is therefore a trade-off between speed of initial learning and the robustness of that learning. The former is supported by perfectly reliable information, with more information resulting in faster learning. The latter is supported by multiple information sources, each of which is individually somewhat noisy. The precise point of this trade-off is an issue for
18"
further exploration in computational systems, to determine the extent to which natural language
environments are optimally designed for acquisition.
Chomsky (2005) wrote of the problem of canalization as a “fair description of the growth of
language in the individual”, in that “a core problem of the faculty of language is to discover the
mechanisms that limit outcomes to ‘optimal types’ ”, referring to the constraints of syntax. The current
simulations demonstrate for word learning that these constraints may not be language-specific
mechanisms within the learner, but rather the response of a general-purpose learning system that
produces constraints as a consequence of integration of multiple cues from the environment. But do
these principles of word learning apply also to acquisition of syntax, which has largely been the domain
in which the problem of canalization has been discussed (e.g., Chomsky, 2005; Newmeyer, 2004)?
The observation that speed and accuracy of word learning are promoted by multiple cues is consistent with several current accounts of multiple cue integration for various language learning tasks (see Monaghan & Rowland, in press, for a review). For instance, for speech segmentation, Monaghan, White, and Merkx (2013) assessed the acoustic properties of speech to identify multiple prosodic cues that can combine to promote identification of words. In the same domain, Christiansen et al. (1998) provided a computational model that demonstrated how multiple cues boost segmentation. Relatedly, Mattys et al. (2005) presented a theoretical model of how multiple cues from speech may cohere to promote speech recognition. Similarly, for learning word-referent mappings, the current multiple cues model is consistent with Bahrick et al.'s (2004) model of intersensory redundancy for word learning, where multiple cues are vital for guiding the child toward informative properties of the environment.
These language learning tasks all concern identification of information that could potentially be processed using associative learning mechanisms (Yu & Smith, 2012), so the question remains whether there is evidence that language structure, in terms of morphology and syntax, could also be constrained by multiple cues. Again, there is converging evidence for these language learning tasks that multiple cues play a key role – not only co-occurrence constraints between morphemes or between words (Fries, 1952), but also phonological and prosodic properties of words can constrain identification of the grammatical categories of words (Kelly, 1992; Monaghan et al., 2007) and facilitate learning of non-adjacent dependencies (Newport & Aslin, 2004). It is of course possible that a completely different process applies to syntax acquisition than to learning all other aspects of language, but an alternative starting point is that such constraints emerge from the same general statistical learning mechanisms: learning of the structure of words, grammatical categories, and syntax are not distinct processes (Frost & Monaghan, 2016).
Nevertheless, a common feature of all these multiple cue studies of language learning is that they predict a growing advantage for learning as cues increase in reliability, as observed in the current simulations. The simulations presented here suggest that, rather than canalization being a challenge in the face of environmental variation, it is instead a primary consequence of this variation in a system that is able to integrate multiple information sources.
20"
Acknowledgments
This work was supported by the International Centre for Language and Communicative Development
(LuCiD) at Lancaster University, funded by the Economic and Social Research Council (UK)
[ES/L008955/1].
21"
References
Ay, N., Flack, J., & Krakauer, D. (2007). Robustness and complexity co-constructed in multimodal signalling networks. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1479), 441-447.
Bahrick, L. E., Lickliter, R., & Flom, R. (2004). Intersensory redundancy guides the development of
selective attention, perception, and cognition in infancy. Current Directions in Psychological Science,
13, 99-102.
Bettencourt, L. M. A. (2009). The rules of information aggregation and emergence of collective
intelligent behavior. Topics in Cognitive Science, 1, 598–620.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36, 1-22.
Christiansen, M. H., Allen, J., & Seidenberg, M. S. (1998). Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes, 13, 221-268.
Edelman, G., & Gally, J. (2001). Degeneracy and complexity in biological systems. Proceedings of the National Academy of Sciences, 98(24), 13763-13768.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic functions. Annals of
Child Development, 8, 43-80.
Fries, C. C. (1952). The structure of English. London: Longmans.
Frost, R. L. A., & Monaghan, P. (2016). Simultaneous segmentation and generalisation of non-adjacent
dependencies from continuous speech. Cognition, 147, 70-74.
Gallistel, C. R. (1997). Neurons and memory. In Gazzaniga, M. S. (Ed.), Conversations in the cognitive
neurosciences (pp.71-89). Cambridge, MA: MIT Press.
Gallistel, C. R. (2000). The replacement of general-purpose learning models with adaptively specialized
learning modules. In Gazzaniga, M.S. (Ed.), The cognitive neurosciences, 2nd ed (pp.1179-1191).
Cambridge, MA: MIT Press.
22"
Iverson, J. M., Capirci, O., Longobardi, E., & Caselli, M. C. (1999). Gesturing in mother-child
interactions. Cognitive Development, 14, 57–75.
Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99(2), 349-364.
Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of multiple segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General, 134, 477-500.
McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of
online referent selection and slow associative learning. Psychological Review, 119(4), 831-877.
Monaghan, P., Christiansen, M. H., & Chater, N. (2007). The phonological-distributional coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognitive Psychology, 55, 259-305.
Monaghan, P. & Mattock, K. (2012). Integrating constraints for learning word-referent mappings.
Cognition, 123, 133-143.
Monaghan, P., & Rowland, C. (in press). Combining language corpora with experimental and
computational approaches for language acquisition research. Language Learning, in press.
Monaghan, P., White, L., & Merkx, M. (2013). Disambiguating durational cues for speech segmentation.
Journal of the Acoustical Society of America, 134, EL45-EL51.
Newmeyer, F.J. (2004). Against a parameter-setting approach to typological variation. Linguistic
Variation Yearbook, 4, 181-234.
Newport, E.L., & Aslin, R.N. (2004). Learning at a distance I. Statistical learning of nonadjacent
dependencies. Cognitive Psychology, 48, 127-162.
Pearlmutter, B. A. (1989). Learning state space trajectories in recurrent neural networks. Neural
Computation, 1, 263-269.
Plaut, D. C. (2002). Graded modality-specific specialization in semantics: A computational account of optic aphasia. Cognitive Neuropsychology, 19, 603-639.
Quine, W.V.O. (1960). Word and object. Cambridge, MA: MIT Press.
23"
Siegal, M. L., & Bergman, A. (2002). Waddington's canalization revisited: developmental stability and
evolution. Proceedings of the National Academy of Sciences, 99(16), 10528-10532.
Smith, A.C., Monaghan, P., & Huettig, F. (2014). Literacy effects on language and vision: Emergent
effects from an amodal shared resource (ASR) computational model. Cognitive Psychology, 75, 28-
54.
Smith, A.C., Monaghan, P., & Huettig, F. (in press). The multimodal nature of spoken word processing
in the visual world: Testing the predictions of alternative models of multimodal integration. Journal
of Memory and Language, in press.
Waddington, C. H. (1942). Canalization of development and the inheritance of acquired characters.
Nature, 150(3811), 563-565.
Wagner, A. (1996). Does evolutionary plasticity evolve? Evolution, 50, 1008-1023.
Whitacre, J. (2010). Degeneracy: A link between evolvability, robustness and complexity in biological systems. Theoretical Biology and Medical Modelling, 7, 6.
Winter, B. (2014). Spoken language achieves robustness and evolvability by exploiting degeneracy and
neutrality. BioEssays, 36(10), 960-967.
Yang, C. (2002). Knowledge and learning in natural language. Oxford, UK: Oxford University Press.
Yu, C., & Smith, L. B. (2012). Modeling cross-situational word–referent learning: Prior questions.
Psychological Review, 119(1), 21-39.
Yurovsky, D., Smith, L. B. & Yu, C. (2013). Statistical word learning at scale: The baby's view is better.
Developmental Science, 16, 959-966.
... For example, melody facilitates learning of lyrics (Thiessen & Saffran, 2009) and tactile cues facilitate statistical learning of tone co-occurrence patterns (Lew-Williams, Ferguson, Abu-Zhaya, & Seidl, 2019). A recent computational model demonstrated that multimodal cues can benefit mappings between forms and meanings (Monaghan, 2017). The cues were probabilistic in the learning phase (appeared only some of the time), and were absent during testing (where only the labels appeared). ...
... This highlights an important feature of the language learned in the current study: The redundant cues were deterministic, they were always present (perfectly available, using the terminology of the Competition Model). Importantly, however, redundant cues in natural languages are often probabilistic (E. Bates & MacWhinney, 1989;Levshina, 2020;Monaghan, 2017;Monaghan, Brand, & Frost, 2017;Monaghan & Christiansen, 2008). In fact, it has been argued that part of the reason that languages have redundant cues is to compensate for the fact that individual cues are often not deterministic (Monaghan, 2017;Monaghan et al., 2017). ...
... Importantly, however, redundant cues in natural languages are often probabilistic (E. Bates & MacWhinney, 1989;Levshina, 2020;Monaghan, 2017;Monaghan, Brand, & Frost, 2017;Monaghan & Christiansen, 2008). In fact, it has been argued that part of the reason that languages have redundant cues is to compensate for the fact that individual cues are often not deterministic (Monaghan, 2017;Monaghan et al., 2017). In line with this reasoning, when redundant cues appear in learning but not in testing, learning is better facilitated when the cues are probabilistic rather than deterministic (Monaghan, 2017;Monaghan et al., 2017). ...
Preprint
The prevalence of redundancy in the world languages has long puzzled language researchers. It is especially surprising in light of the growing evidence on speakers' tendency to avoid redundant elements in production (omitting or reducing more predictable elements). Here, we propose that redundancy can be functional for learning. In particular, we argue that redundant cues can facilitate learning, even when they make the language system more complicated. This prediction is further motivated by the Linguistic Niche Hypothesis (Lupyan & Dale, 2010), which suggests that morphological complexity can arise due to the advantage redundancy might confer for child learners. We test these hypotheses in an artificial language learning study with children and adults, where either word order alone or both word order and case marking serve as cues for thematic assignment in a novel construction. We predict, and find, that children learning the redundant language learn to produce it, and show better comprehension of the novel thematic assignment than children learning the non-redundant language, despite having to learn an additional morpheme. Children in both conditions were similarly accurate in producing the novel word order, suggesting redundancy might have a differential effect on comprehension and production. Adults did not show better learning in the redundant condition, most likely because they were at ceiling in both conditions. We discuss implications for theories of language learning and language change.
... For example, melody facilitates learning of lyrics (Thiessen & Saffran, 2009) and tactile cues facilitate statistical learning of tone co-occurrence patterns (Lew-Williams, Ferguson, Abu-Zhaya, & Seidl, 2019). A recent computational model demonstrated that multimodal cues can benefit mappings between forms and meanings (Monaghan, 2017). The cues were probabilistic in the learning phase (appeared only some of the time), and were absent during testing (where only the labels appeared). ...
... This highlights an important feature of the language learned in the current study: The redundant cues were deterministic, they were always present (perfectly available, using the terminology of the Competition Model). Importantly, however, redundant cues in natural languages are often probabilistic (E. Bates & MacWhinney, 1989;Levshina, 2020;Monaghan, 2017;Monaghan, Brand, & Frost, 2017;Monaghan & Christiansen, 2008). In fact, it has been argued that part of the reason that languages have redundant cues is to compensate for the fact that individual cues are often not deterministic (Monaghan, 2017;Monaghan et al., 2017). ...
... Importantly, however, redundant cues in natural languages are often probabilistic (E. Bates & MacWhinney, 1989;Levshina, 2020;Monaghan, 2017;Monaghan, Brand, & Frost, 2017;Monaghan & Christiansen, 2008). In fact, it has been argued that part of the reason that languages have redundant cues is to compensate for the fact that individual cues are often not deterministic (Monaghan, 2017;Monaghan et al., 2017). In line with this reasoning, when redundant cues appear in learning but not in testing, learning is better facilitated when the cues are probabilistic rather than deterministic (Monaghan, 2017;Monaghan et al., 2017). ...
Article
The prevalence of redundancy in the world languages has long puzzled language researchers. It is especially surprising in light of the growing evidence on speakers' tendency to avoid redundant elements in production (omitting or reducing more predictable elements). Here, we propose that redundancy can be functional for learning. In particular, we argue that redundant cues can facilitate learning, even when they make the language system more complicated. This prediction is further motivated by the Linguistic Niche Hypothesis (Lupyan & Dale, 2010), which suggests that morphological complexity can arise due to the advantage redundancy might confer for child learners. We test these hypotheses in an artificial language learning study with children and adults, where either word order alone or both word order and case marking serve as cues for thematic assignment in a novel construction. We predict, and find, that children learning the redundant language learn to produce it, and show better comprehension of the novel thematic assignment than children learning the non-redundant language, despite having to learn an additional morpheme. Children in both conditions were similarly accurate in producing the novel word order, suggesting redundancy might have a differential effect on comprehension and production. Adults did not show better learning in the redundant condition, most likely because they were at ceiling in both conditions. We discuss implications for theories of language learning and language change.
... Importantly, this variability may actually be useful. In a computational model of word learning, Monaghan (2017) developed the multimodal integration model (MIM; Smith et al. 2017) to explore the role of multiple cues-distributional, prosodic, and gestural-in supporting language acquisition. The model was trained to learn word-object pairings when words and objects were presented among multiple possibilities and when cues were present or absent. ...
... Although learning benefited from all cues, learning was more efficient and more accurate when cues occurred 75% of the time, rather than when they were present 100% of the time (Monaghan, 2017). This was confirmed in behavioural studies with adults . ...
... In this paper, we examined how environmental variability might affect word learning by testing the contingency of caregiver gesture use to support word learning under referential uncertainty. We first adapted an established computational model of word learning (MIM; Monaghan, 2017) to test the benefit of contingent gestural ...
Article
Full-text available
Children learn words in environments where there is considerable variability, both in terms of the number of possible referents for novel words, and the availability of cues to support word‐referent mappings. How caregivers adapt their gestural cues to referential uncertainty has not yet been explored. We tested a computational model of cross‐situational word learning that examined the value of a variable gesture cue during training across conditions of varying referential uncertainty. We found that gesture had a greater benefit for referential uncertainty, but unexpectedly also found that learning was best when there was variability in both the environment (number of referents) and gestural cue use. We demonstrated that these results are reflected behaviourally in an experimental word learning study involving children aged 18‐24‐month‐olds and their caregivers. Under similar conditions to the computational model, caregivers not only used gesture more when there were more potential referents for novel words, but children also learned best when there was some referential ambiguity for words. Thus, caregivers are sensitive to referential uncertainty in the environment and adapt their gestures accordingly, and children are able to respond to environmental variability to learn more robustly. These results imply that training under variable circumstances may actually benefit learning, rather than hinder it.
... However, this is no easy feat, since there are no perfectly reliable cues that learners can draw upon Lehiste, 1970). Instead, children must look to a broad range of imperfect, probabilistic cues (e.g., stress patterns, phonotactic and allophonic regularities, and information about syllable co-occurrences), and use these in combination (Monaghan, 2017). Importantly, each language differs in the availability and likely combination of cues for segmentation, meaning each solution will necessarily be language-specific (see Cutler, 2012). ...
... Kidd et al. (2012) argue that infants demonstrate a 'Goldilocks' effect, such that they prefer to attend to events that are neither highly predictable nor unpredictable, thus avoiding making generalisations that are either too simple or too complex. A recent computational model of word learning suggests that cue variability may indeed serve to help, rather than hinder, learningguiding the creation of a robust, canalised language system that is resistant to noise in the input (Monaghan, 2017). This possible utility of noise in learning is underpinned by the principle that variation in the availability and reliability of distributional cues may encourage learners to seek guidance from multiple possible information sources, reducing the importance of a particular individual cue, and increasing the resilience of the language system to noise. ...
Article
Full-text available
To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing around an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backwards than forwards transitions. Words followed a Zipfian-like frequency distribution, and over two-thirds of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results give rich novel insights into the availability of segmentation cues in German CDS, and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
... Existing theory and research around language acquisition indicates that the word web method of strengthening connections between word sound and word meaning cues is more robust when dynamic multisensory, multitemporal cues are used rather than static unimodal input (Yu and Ballard, 2007;Monaghan, 2017). Hollich et al (2000) propose a coalition language acquisition model where multiple sources, such as perceptual salience, prosodic cue, eye gaze, social context, syntactic cues, and temporal contiguity, combine. ...
Preprint
Full-text available
Around ten percent of all children could have a disorder where language does not develop as expected. This often effects vocabulary skills, i.e., finding the words to express wants, needs and ideas, which can influence behaviours linked to wellbeing and daily functioning, such as concentration, independence, social interactions and managing emotions. Without specialist support, needs can increase in severity and continue to adulthood. The type of support, known as interventions showing strongest evidence for improving vocabulary with some signs of improved behaviour and wellbeing are ones that use word-webs. These are diagrams consisting of lines that connect sound and meaning information about a word to strengthen the child's word knowledge and use. The diagrams resemble what is commonly known as mind-maps and are widely used by Speech and Language Therapists in partnership with schools to help children with language difficulties. In addition, interventions delivered through mobile-devices has led in some cases to increased vocabulary gains with positive influence on wellbeing and academic attainment. With advances in technology and the availability of user-friendly mobile devices to capture, combine and replay multimedia content, new opportunities for designing bespoke vocabulary instruction have emerged that are without timing and location constraints. This brings the potential to engage and motivate users and harbour independence through functional strategies that support each child's unique language needs. To achieve this, children with language disorder, their parents/carers, support professionals and software development team members must work jointly to create an intervention that is fit for purpose. This is the first research planned to explore the collaborative development and acceptability of a digitally enhanced vocabulary intervention for child language disorder.
... Cue degeneracy also contributes to evolvability (Winter, 2014) and learnability (Tal & Arnon, 2022) of language. The finding that pausing, prosody, syntax and lexical features can all serve to signal chunk boundaries despite being structurally different and simultaneously performing other functions adds to the growing body of literature on functional degeneracy and syntagmatic redundancy of cues in natural language (Leufkens, 2020;Monaghan, 2017;Pijpops & Zehentner, 2022). We also found substantial variation in the magnitude of the effects both across listeners and extracts, suggesting that listeners vary in the extent to which they rely on different cues and extracts vary in the extent to which different cues are reliable predictors of chunk boundaries. ...
Article
There have been some suggestions in linguistics and cognitive science that humans process continuous speech by routinely chunking it up into smaller units. The nature of the process is open to debate, which is complicated by the apparent existence of two entirely different chunking processes, both of which seem to be warranted by the limitations of working memory. To overcome them, humans seem to both combine items into larger units for future retrieval (usage-based chunking), and partition incoming streams into temporal groups (perceptual chunking). To determine linguistic properties and cognitive constraints of perceptual chunking, most previous research has employed short-constructed stimuli modeled on written language. In contrast, we presented linguistically naïve listeners with excerpts of natural speech from corpora and collected their intuitive perceptions of chunk boundaries. We then used mixed-effects logistic regression models to find out to what extent pauses, prosody, syntax, chunk duration, and surprisal predict chunk boundary perception. The results showed that all cues were important, suggesting cue degeneracy, but with substantial variation across listeners and speech excerpts. Chunk duration had a strong effect, supporting the cognitive constraint hypothesis. The direction of the surprisal effect supported the distinction between perceptual and usage-based chunking.
Article
Full-text available
Redundant marking of grammatical relations seems to be commonplace across languages, and has been shown to benefit learning as well as robust information transmission. At the same time, languages also exhibit trade-offs between strategies such as case marking or word order, suggesting that redundancy may also be dis-preferred in line with a tendency towards communicative efficiency. In the present paper, we assess redundancy in terms of number of strategies used simultaneously to mark specific relations within individual utterances (syntagmatic redundancy) in light of these competing motivations. Our test case is participant role disambiguation in English and Dutch, specifically the interaction of constituent order, case, prepositional marking, and agreement to distinguish agents and recipients in ditransitive clauses. Using evidence from corpora of Present Day Dutch and English as well as data from Middle English, we find that redundancy is prevalent, albeit within certain limits.
Article
Learning a language with complex morphology poses a challenge to language learners, especially adults, who may need to acquire unfamiliar grammatical categories. One possible advantage to languages with complex morphology is that the morphology could provide cues to word meaning. The hypothesis that morphology can bootstrap adult word learning is tested across four cross-situational word learning experiments. Adult learners were exposed to words from a novel language with CVCV stems and -CV suffixes. In the Experimental conditions, the suffixes consistently mapped to semantic categories (e.g. [-ke] for fruits). In the Control condition, the suffixes did not provide any consistent semantic information. Participants in the Experimental conditions outperformed participants in the Control conditions, but only when there were sufficient opportunities to infer the morphology in the initial learning phases. These results highlight adults’ ability to rapidly learn novel morphological information, and use this information in word learning.
Article
Full-text available
High frequency words play a key role in language acquisition, with recent work suggesting they may serve both speech segmentation and lexical categorisation. However, it is not yet known whether infants can detect novel high frequency words in continuous speech, nor whether they can use them to help learning for segmentation and categorisation at the same time. For instance, when hearing “you eat the biscuit”, can children use the high-frequency words “you” and “the” to segment out “eat” and “biscuit”, and determine their respective lexical categories? We tested this in two experiments. In Experiment 1, we familiarised 12-month-old infants with continuous artificial speech comprising repetitions of target words , which were preceded by high-frequency marker words that distinguished the targets into two distributional categories. In Experiment 2, we repeated the task using the same language but with additional phonological cues to word and category structure. In both studies, we measured learning with head-turn preference tests of segmentation and categorisation, and compared performance against a control group that heard the artificial speech without the marker words (i.e., just the targets). There was no evidence that high frequency words helped either speech segmentation or grammatical categorisation. However, segmentation was seen to improve when the distributional information was supplemented with phonological cues (Experiment 2). In both experiments, exploratory analysis indicated that infants’ looking behaviour was related to their linguistic maturity (indexed by infants’ vocabulary scores) with infants with high versus low vocabulary scores displaying novelty and familiarity preferences, respectively. We propose that high-frequency words must reach a critical threshold of familiarity before they can be of significant benefit to learning.
Historically, first language acquisition research was a painstaking process of observation, requiring the laborious hand coding of children's linguistic productions, followed by the generation of abstract theoretical proposals for how the developmental process unfolds. Recently, the ability to collect large-scale corpora of children's language exposure has revolutionized the field. New techniques enable more precise measurements of children's actual language input, and these corpora constrain computational and cognitive theories of language development, which can then generate predictions about learning behavior. We describe several instances where corpus, computational, and experimental work have been productively combined to uncover the first language acquisition process and the richness of the multimodal properties of the environment, highlighting how these methods can be extended to address related issues in second language research. Finally, we outline some of the difficulties that can be encountered when applying multimethod approaches and show how these difficulties can be overcome.
Language learning requires mastering multiple tasks, including segmenting speech to identify words and learning the syntactic role of those words within sentences. A key question in language acquisition research is the extent to which these tasks are sequential or simultaneous, and consequently whether they may be driven by distinct or similar computations. We explored a classic artificial language learning paradigm in which the language structure is defined in terms of non-adjacent dependencies. We show that participants are able to use the same statistical information, at the same time, both to segment continuous speech into words and to generalise over its structure, even when the generalisations concerned novel speech that participants had not previously experienced. We suggest that, in the absence of evidence to the contrary, the most economical explanation for these effects is that speech segmentation and grammatical generalisation depend on similar statistical processing mechanisms.
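A small worked example may help. The sketch below builds an aXb language with invented syllables and computes the non-adjacent statistic P(b | a..): because the dependency holds regardless of the middle element, the same count that identifies word-internal structure (aiding segmentation) remains intact for novel middles (licensing generalisation). The syllables and language size are arbitrary assumptions.

```python
# Sketch: one non-adjacent statistic supports both segmentation and
# generalisation. The aXb frames and syllables below are invented.
import random
from collections import Counter

random.seed(3)

frames = [("pel", "rud"), ("vot", "jic")]  # non-adjacent dependencies a..b
middles = ["wadim", "kicey", "puser"]      # trained middle elements X

words = []
for _ in range(100):
    a, b = random.choice(frames)
    words.append(a + random.choice(middles) + b)

# Non-adjacent co-occurrence: initial syllable -> final syllable of each word.
pairs = Counter((w[:3], w[-3:]) for w in words)
starts = Counter(a for (a, _b) in pairs.elements())
for (a, b), n in sorted(pairs.items()):
    print(f"P({b} | {a}..) = {n / starts[a]:.2f}")

# The statistic is 1.0 within words and near 0 across word boundaries in the
# concatenated stream, so it marks segmentation points; and it is unchanged
# for a novel middle such as "temva", so it also licenses generalisation.
```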
During the development of a multicellular organism from a zygote, a large number of epigenetic interactions take place on every level of suborganismal organization. This raises the possibility that the system of epigenetic interactions may compensate or "buffer" some of the changes that occur as mutations on its lowest levels, and thus stabilize the phenotype with respect to mutations. This hypothetical phenomenon will be called "epigenetic stability." Its potential importance stems from the fact that phenotypic variation with a genetic basis is an essential prerequisite for evolution. Thus, variation in epigenetic stability might profoundly affect attainable rates of evolution. While representing a systemic property of a developmental system, epigenetic stability might itself be genetically determined and thus be subject to evolutionary change. Whether or not this is the case should ideally be answered directly, that is, by experimentation. The time scale involved and our insufficient quantitative understanding of developmental pathways will probably preclude such an approach in the foreseeable future. Preliminary answers are sought here by using a biochemically motivated model of a small but central part of a developmental pathway. Modeled are sets of transcriptional regulators that mutually regulate each other's expression and thereby form stable gene expression patterns. Such gene-expression patterns, crucially involved in determining developmental pattern formation events, are most likely subject to strong stabilizing natural selection. After long periods of stabilizing selection, the fraction of mutations causing changes in gene-expression patterns is substantially reduced in the model. Epigenetic stability has increased. This phenomenon is found for widely varying regulatory scenarios among transcription factor genes. It is discussed that only epistatic (nonlinear) gene interactions can cause such change in epigenetic stability. Evidence from paleontology, molecular evolution, development, and genetics, consistent with the existence of variation in epigenetic stability, is discussed. The relation of epigenetic stability to developmental canalization is outlined. Experimental scenarios are suggested that may provide further evidence.
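A toy version of this class of model can be stated compactly: a matrix W of regulatory interactions, a development map s(t+1) = sign(W s(t)) iterated to a fixed-point expression pattern, and epigenetic stability measured as the fraction of single-interaction mutations that leave the equilibrium unchanged. The sketch below follows that recipe; the network size, mutation scheme, and parameters are arbitrary assumptions, not those of the original model.

```python
# Toy gene-network model in the spirit described above (sizes, mutation
# scheme, and parameters are arbitrary assumptions, not the original model).
# Development iterates s(t+1) = sign(W @ s(t)) to a fixed-point expression
# pattern; "epigenetic stability" is the fraction of single-interaction
# mutations to W that leave that equilibrium pattern unchanged.
import numpy as np

rng = np.random.default_rng(0)
N = 6                                  # number of transcription factors
W = rng.normal(size=(N, N))            # regulatory interaction strengths
s0 = np.sign(rng.normal(size=N))       # initial expression state (+1/-1)

def develop(W, s, steps=50):
    for _ in range(steps):
        s_new = np.sign(W @ s)
        if np.array_equal(s_new, s):
            return s                   # stable expression pattern reached
        s = s_new
    return None                        # no fixed point within the horizon

target = develop(W, s0)
if target is not None:
    trials, unchanged = 500, 0
    for _ in range(trials):
        Wm = W.copy()
        i, j = rng.integers(N, size=2)
        Wm[i, j] = rng.normal()        # mutate one regulatory interaction
        unchanged += np.array_equal(develop(Wm, s0), target)
    print(f"epigenetic stability ~ {unchanged / trials:.2f}")
```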
Ambiguity in natural language is ubiquitous, yet spoken communication is effective because information carried in the speech signal is integrated with information available in the surrounding multimodal landscape. Language-mediated visual attention requires the integration of visual and linguistic information, and has thus been used to examine properties of the architecture supporting multimodal processing during spoken language comprehension. In this paper we test predictions generated by alternative models of this multimodal system. A model in which multimodal information is combined at the level of the lexical representations of words (TRACE) generated the prediction of a stronger effect of phonological rhyme on gaze behaviour relative to semantic and visual information, whereas a model in which sub-lexical information can interact across modalities (MIM) predicted a greater influence of visual and semantic information compared to phonological rhyme. Two visual world experiments designed to test these predictions offer support for sub-lexical multimodal interaction during online language processing.
The biolinguistic perspective regards the language faculty as an "organ of the body," along with other cognitive systems. Adopting it, we expect to find three factors that interact to determine (I-) languages attained: genetic endowment (the topic of Universal Grammar), experience, and principles that are language- or even organism-independent. Research has naturally focused on I-languages and UG, the problems of descriptive and explanatory adequacy. The Principles-and-Parameters approach opened the possibility for serious investigation of the third factor, and the attempt to account for properties of language in terms of general considerations of computational efficiency, eliminating some of the technology postulated as specific to language and providing more principled explanation of linguistic phenomena.
Learning to read and write requires an individual to connect additional orthographic representations to pre-existing mappings between phonological and semantic representations of words. Past empirical results suggest that the process of learning to read and write (at least in alphabetic languages) elicits changes in the language processing system, by increasing the cognitive efficiency of mapping between the representations associated with a word, by changing the granularity of phonological processing of spoken language, or through a combination of both. Behavioural effects of literacy have typically been assessed in offline explicit tasks addressing only phonological processing. However, a recent eye-tracking study compared high- and low-literate participants on effects of phonology and semantics, measured implicitly using eye movements. High literates' eye movements were more affected by phonological overlap in online speech than those of low literates, with only subtle differences observed for semantics. We determined whether these effects were due to cognitive efficiency and/or the granularity of speech processing in a multimodal model of speech processing, the amodal shared resource model (ASR; Smith, Monaghan, & Huettig, 2013). We found that cognitive efficiency in the model had only a marginal effect on semantic processing and did not affect phonological processing, whereas fine-grained versus coarse-grained phonological representations in the model simulated the high/low literacy effects on phonological processing. This suggests that literacy has a focused effect, changing the grain size of phonological mappings.
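The grain-size manipulation can be illustrated with a toy calculation: phonological overlap between a heard word and lexical candidates, computed over fine-grained (phoneme-level) versus coarse-grained (syllable-level) units. The sketch below is not the ASR model; the transcriptions and the overlap measure are invented for illustration.

```python
# Toy illustration of phonological grain size (not the ASR model itself):
# overlap between a heard word and lexical candidates computed over
# fine-grained (phoneme) vs coarse-grained (syllable) units. The
# transcriptions and the overlap measure are invented for illustration.
def overlap(heard, candidate):
    matches = sum(h == c for h, c in zip(heard, candidate))
    return matches / max(len(heard), len(candidate))

fine = {"beaker": list("bikər"), "beetle": list("bitəl"), "dolphin": list("dɒlfɪn")}
coarse = {"beaker": ["bi", "kər"], "beetle": ["bi", "təl"], "dolphin": ["dɒl", "fɪn"]}

# Hearing "beaker": fine-grained units yield more graded competitor overlap.
for name, units in (("fine", fine), ("coarse", coarse)):
    scores = {w: round(overlap(units["beaker"], rep), 2) for w, rep in units.items()}
    print(name, scores)
```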
As with biological systems, spoken languages are strikingly robust against perturbations. This paper shows that languages achieve robustness in a way that is highly similar to many biological systems. For example, speech sounds are encoded via multiple acoustically diverse, temporally distributed, and functionally redundant cues, characteristics that bear similarities to what biologists call “degeneracy”. Speech is furthermore characterized by neutrality: many different tongue configurations lead to similar acoustic outputs, and different acoustic variants are understood as the same by recipients. This highlights the presence of a large neutral network of acoustic neighbors for every speech sound. Such neutrality ensures that a steady backdrop of variation can be maintained without impeding communication, assuring that there is “fodder” for subsequent evolution. Thus, studying linguistic robustness is important not only for understanding how linguistic systems maintain their functioning against a background of noise, but also for understanding the preconditions for language evolution.
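The notion of a neutral network can be illustrated with a toy many-to-one mapping from articulatory settings to perceived categories, under which most one-step neighbours of a configuration “sound the same”. The mapping below is a deliberately crude stand-in, not a phonetic model.

```python
# Toy illustration of neutrality (a crude stand-in, not a phonetic model):
# many articulatory settings map onto the same perceived category, so most
# one-step neighbours of a configuration "sound the same" to a listener.
import itertools

def perceive(config):
    return sum(config) // 3  # coarse many-to-one articulation->percept map

configs = list(itertools.product(range(4), repeat=3))  # 3 articulatory dims

def neutral_neighbours(c):
    same = 0
    for i in range(len(c)):
        for d in (-1, 1):
            n = list(c)
            n[i] += d
            if 0 <= n[i] <= 3 and perceive(tuple(n)) == perceive(c):
                same += 1
    return same

avg = sum(neutral_neighbours(c) for c in configs) / len(configs)
print(f"average neutral neighbours per configuration: {avg:.2f}")
```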
A key question in early word learning is how children cope with the uncertainty in natural naming events. One potential mechanism for uncertainty reduction is cross-situational word learning: tracking word/object co-occurrence statistics across naming events. But empirical and computational analyses of cross-situational learning have made strong assumptions about the nature of naming event ambiguity, assumptions that have been challenged by recent analyses of natural naming events. This paper shows that learning from ambiguous natural naming events depends on perspective. Natural naming events from parent-child interactions were recorded from both a third-person tripod-mounted camera and from a head-mounted camera that produced a 'child's-eye' view. Following the human simulation paradigm, adults were asked to learn artificial language labels by integrating across the most ambiguous of these naming events. Significant learning was found only from the child's perspective, pointing to the importance of considering statistical learning from an embodied perspective.
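The core co-occurrence-tracking mechanism is easy to sketch. Below, scene ambiguity (the number of objects in view) stands in, very loosely, for the third-person versus child's-eye contrast; the lexicon, event counts, and ambiguity levels are invented.

```python
# Sketch of basic cross-situational statistics. Scene ambiguity (number of
# objects in view) stands in, loosely, for the third-person vs child's-eye
# contrast; the lexicon, event counts, and ambiguity levels are invented.
import random
from collections import defaultdict

random.seed(4)

lexicon = {f"word{i}": f"obj{i}" for i in range(8)}
objects = list(lexicon.values())

def simulate(n_events, n_distractors):
    counts = defaultdict(lambda: defaultdict(int))  # word -> object -> count
    for _ in range(n_events):
        word, target = random.choice(list(lexicon.items()))
        distractors = random.sample([o for o in objects if o != target],
                                    n_distractors)
        for obj in {target, *distractors}:
            counts[word][obj] += 1
    # Score: does the most frequent co-occurring object match the referent?
    hits = sum(max(c, key=c.get) == lexicon[w] for w, c in counts.items())
    return hits / len(counts)

print("low ambiguity (2 objects in view):", simulate(200, 1))
print("high ambiguity (6 objects in view):", simulate(200, 5))
```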