Article

Statistical learning and memory


Abstract

Learners often need to identify and remember recurring units in continuous sequences, but the underlying mechanisms are debated. A particularly prominent candidate mechanism relies on distributional statistics such as Transitional Probabilities (TPs). However, it is unclear what the outputs of statistical segmentation mechanisms are, and if learners store these outputs as discrete chunks in memory. We critically review the evidence for the possibility that statistically coherent items are stored in memory and outline difficulties in interpreting past research. We use Slone and Johnson's (2018) experiments as a case study to show that it is difficult to delineate the different mechanisms learners might use to solve a learning problem. Slone and Johnson (2018) reported that 8-month-old infants learned coherent chunks of shapes in visual sequences. Here, we describe an alternate interpretation of their findings based on a multiple-cue integration perspective. First, when multiple cues to statistical structure were available, infants' looking behavior seemed to track with the strength of the strongest one — backward TPs, suggesting that infants process multiple cues simultaneously and select the strongest one. Second, like adults, infants are exquisitely sensitive to chunks, but may require multiple cues to extract them. In Slone and Johnson's (2018) experiments, these cues were provided by immediate chunk repetitions during familiarization. Accordingly, infants showed strongest evidence of chunking following familiarization sequences in which immediate repetitions were more frequent. These interpretations provide a strong argument for infants' processing of multiple cues and the potential importance of multiple cues for chunk recognition in infancy.
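The transitional probabilities at issue here are simple conditional frequencies, and the forward/backward distinction matters because the two can dissociate. A minimal sketch (the function name and toy stream are illustrative, not taken from the paper): in a stream built from the pairs "ab" and "cb", the forward TP of b given a is 1.0, yet the backward TP of a given b is only 0.5, because b is also preceded by c.

```python
from collections import Counter

def transitional_probs(stream):
    """Forward TP(B|A) = P(next = B | current = A);
    backward TP(A|B) = P(previous = A | current = B)."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])    # occurrences as the first element of a pair
    seconds = Counter(stream[1:])    # occurrences as the second element of a pair
    fwd = {(a, b): n / firsts[a] for (a, b), n in pairs.items()}
    bwd = {(a, b): n / seconds[b] for (a, b), n in pairs.items()}
    return fwd, bwd
```

A learner tracking only forward TPs would treat "ab" and "cb" as equally coherent; one tracking backward TPs would not, which is why the two cues can be teased apart experimentally.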


... However, the question of whether statistical learning truly facilitates the memorization of these units is controversial. An alternative view proposes that statistical learning primarily supports the formation of pairwise associations among co-occurring elements (e.g., syllables) rather than the memorization of units (Endress & de Seyssel, under review; Endress, Slone, & Johnson, 2020). ...
... Given the focus of the current paper, I will concentrate on this mere-associations view. For a critical discussion of the evidence supporting the memory view as well as alternative interpretations thereof, see Endress and de Seyssel (under review) and Endress et al. (2020). Support for the mere-association view comes from several key observations, including computational modeling of behavioral and electrophysiological statistical learning results with memory-less Hebbian mechanisms (Endress & Johnson, 2021; Endress, 2024), and an almost complete inability to consciously recall statistically defined items such as words even when their statistical structure has been demonstrably learned (Batterink, 2020; Endress & de Seyssel, under review). ...
Article
Full-text available
Statistical learning is a mechanism for detecting associations among co-occurring elements in many domains and species. A key controversy is whether it leads to memory for discrete chunks composed of these associated elements, or merely to pairwise associations among elements. Critical evidence for the mere-association view comes from the “phantom-word” phenomenon, where learners recognize statistically coherent but unattested items better than actually presented items with weaker internal associations, suggesting that they prioritize pairwise associations over memories for discrete units. However, this phenomenon has only been demonstrated for sequentially presented stimuli, but not for simultaneously presented visual shapes, where learners might prioritize discrete units over pairwise associations. Here, I ask whether the phantom-word phenomenon can be observed with simultaneously presented visual shapes. Learners were familiarized with scenes combining two triplets of visual shapes (hereafter “words”). They were then tested on their recognition of these words vs. part-words (attested items with weaker internal associations), of phantom-words (unattested items with strong internal associations) vs. part-words, and of words vs. phantom-words. Learners preferred both words and phantom-words over part-words and showed no preference for words over phantom-words. This suggests that, as with sequential input, statistical learning over simultaneously presented shapes leads primarily to pairwise associations rather than to memories for discrete chunks. However, because in some analyses the preference for words over part-words was slightly higher than the preference for phantom-words over part-words, the results do not rule out that, for simultaneously presented items, learners might have some limited sensitivity to frequency of occurrence.
... Our implementation of memory utilizes a moving time window for integrating past experiences, in line with classical concepts of statistical learning 45,46 . For the implementation of boredom, we rely on a parametric function to express the impact of an experience depending on its unpredictability 15 . ...
Article
Full-text available
Boredom is an aversive mental state that is typically evoked by monotony and drives individuals to seek novel information. Despite this effect on individual behavior, the consequences of boredom for collective behavior remain elusive. Here, we introduce an agent‑based model of collective fashion behavior in which simplified agents interact randomly and repeatedly choose alternatives from a circular space of color variants. Agents are endowed with a memory of past experiences and a boredom parameter, promoting avoidance of monotony. Simulating collective color trends with this model captures aspects of real trends observed in fashion magazines. We manipulate the two parameters and observe that the boredom parameter is essential for perpetuating fashion dynamics in our model. Furthermore, highly bored agents lead future population trends, when acting coherently or being highly popular. Taken together, our study illustrates that highly bored individuals can guide collective dynamics of a population to continuously explore different variants of behavior.
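The two ingredients the model describes, a moving time window over past experiences and a boredom parameter that scales the pull toward unpredictable options, can be sketched as follows. This is an illustrative toy, with assumed class and parameter names, not the paper's implementation:

```python
from collections import deque

class BoredAgent:
    """Toy agent: memory is a moving time window over recent choices, and a
    boredom parameter scales how strongly unfamiliar options are preferred.
    Names and functional forms are assumptions, not the paper's."""
    def __init__(self, window=10, boredom=1.0):
        self.memory = deque(maxlen=window)  # moving window of past experiences
        self.boredom = boredom

    def novelty(self, option):
        # options seen more often within the window feel less novel
        return 1.0 / (1 + self.memory.count(option))

    def choose(self, options):
        # boredom weights novelty; with boredom = 0 every option scores equally
        pick = max(options, key=lambda o: self.boredom * self.novelty(o))
        self.memory.append(pick)
        return pick
```

With a high boredom parameter the agent keeps abandoning recently chosen variants, which is the individual-level behavior the simulations aggregate into population-level fashion dynamics.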
... Additionally, while our experimental design was tailored to assess forward transitions (the probability that A is followed by B), we also consider bidirectional associations (given exposure to "AB," whether "BA" is endorsed as old at test) in follow-up analyses in the online supplemental materials ("Bidirectional Item-Item Links Do Not Predict Responses Better Than Forward Transitions"). Note that the bidirectional associations we interrogate here (as in Park et al., 2018) are in contrast to the more frequently investigated (and similarly named, but distinct) concept of backward transitions (the probability that B was preceded by A; see e.g., Endress et al., 2020; Tummeltshammer et al., 2017), which our study is not poised to measure. ...
Article
Full-text available
Decades of work have shown that learners rapidly extract structure from their environment, later leveraging their knowledge of what is more versus less consistent with prior experience to guide behavior. However, open questions remain about exactly what is remembered after exposure to structure. Memory for specific associations—transitions that unfold over time—is considered a prime candidate for guiding behavior. However, other factors could influence behavior, such as memory for general features like reliable groupings or within-group positions. We also do not yet know whether memory depends upon the amount of experience with the input structure, leaving us with an incomplete understanding of how statistical learning supports behavior. In 4 experiments, we tracked the emergence of memory for item-item transitions, order-independent groups, and positions by having 400 adults watch a stream of shape triplets followed by a recognition memory test. We manipulated how closely test sequences corresponded to the input along each dimension of interest, allowing us to isolate the contribution of each factor. Both item-item transitions and order-independent group information influenced behavior, highlighting statistical learning as a mechanism through which we form both specific and generalized representations. Moreover, these factors drove behavior after different amounts of experience: with limited exposure, only group information impacted old-new judgments; specific transitions gained importance later. Our findings suggest statistical learning proceeds by first forming a general representation of structure, with memory being later refined to include specifics after more experience.
Article
Full-text available
According to chunking theories, children discover their first words by extracting subsequences embedded in their continuous input. However, the mechanisms proposed in these accounts are often incompatible with data from other areas of language development. We present a new theory to connect the chunking accounts of word discovery with the broader developmental literature. We argue that (a) children build a diverse collection of chunks, including words, multiword phrases, and sublexical units; (b) these chunks have different processing times determined by how often each chunk is used to recode the input; and (c) these processing times interact with short-term memory limitations and incremental processing to constrain learning. We implemented this theory as a computational modeling architecture called Chunk-Based Incremental Processing and Learning (CIPAL). Across nine studies, we demonstrate that CIPAL can model word discovery in different contexts. First, we trained the model with 70 child-directed speech corpora from 15 languages. CIPAL gradually discovered words in each language, with cross-linguistic variation in performance. The model’s average processing time also improved with experience, resembling the developmental changes observed in children’s speed of processing. Second, we showed that CIPAL could simulate seven influential effects reported in statistical learning experiments with artificial languages. This included a preference for words over nonwords, part words, frequency-matched part words, phantom words, and sublexical units. On this basis, we argue that incremental chunking is an effective implicit statistical learning mechanism that may be central to children’s vocabulary development.
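The chunk-based recoding idea can be sketched in a few lines. This is a bare-bones illustration of incremental chunking under assumed parameters (greedy longest-match parsing, a promotion threshold of 2), not the CIPAL architecture itself, which additionally models processing times and short-term memory limits:

```python
def chunk_parse(stream, lexicon, max_len=3, threshold=2):
    """Greedily recode the input with the longest known chunk, and promote a
    pair of adjacent chunks to the lexicon once it recurs `threshold` times.
    A minimal sketch of incremental chunking, not CIPAL itself."""
    counts = {}
    i, prev = 0, None
    while i < len(stream):
        # longest known chunk starting at i (single symbols are always available)
        for length in range(max_len, 0, -1):
            cand = stream[i:i + length]
            if length == 1 or cand in lexicon:
                break
        i += len(cand)
        if prev is not None:
            joined = prev + cand
            if len(joined) <= max_len:
                counts[joined] = counts.get(joined, 0) + 1
                if counts[joined] >= threshold:
                    lexicon.add(joined)  # promote the recurring chunk
        prev = cand
    return lexicon
```

On a repeating input such as "abcabc…", the sketch first promotes sublexical pairs ("ab", "bc") and only later word-sized chunks, loosely mirroring the diverse chunk inventory (words, sublexical units) that the theory describes.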
Article
Full-text available
In many domains, learners extract recurring units from continuous sequences. For example, in unknown languages, fluent speech is perceived as a continuous signal. Learners need to extract the underlying words from this continuous signal and then memorize them. One prominent candidate mechanism is statistical learning, whereby learners track how predictive syllables (or other items) are of one another. Syllables within the same word predict each other better than syllables straddling word boundaries. But does statistical learning lead to memories of the underlying words—or just to pairwise associations among syllables? Electrophysiological results provide the strongest evidence for the memory view. Electrophysiological responses can be time‐locked to statistical word boundaries (e.g., N400s) and show rhythmic activity with a periodicity of word durations. Here, I reproduce such results with a simple Hebbian network. When exposed to statistically structured syllable sequences (and when the underlying words are not excessively long), the network activation is rhythmic with the periodicity of a word duration and activation maxima on word‐final syllables. This is because word‐final syllables receive more excitation from earlier syllables with which they are associated than less predictable syllables that occur earlier in words. The network is also sensitive to information whose electrophysiological correlates were used to support the encoding of ordinal positions within words. Hebbian learning can thus explain rhythmic neural activity in statistical learning tasks without any memory representations of words. Learners might thus need to rely on cues beyond statistical associations to learn the words of their native language.
Research Highlights:
Statistical learning may be utilized to identify recurring units in continuous sequences (e.g., words in fluent speech) but may not generate explicit memory for words.
Exposure to statistically structured sequences leads to rhythmic activity with a period of the duration of the underlying units (e.g., words).
A memory‐less Hebbian network model can reproduce this rhythmic neural activity as well as putative encodings of ordinal positions observed in earlier research.
Direct tests are needed to establish whether statistical learning leads to declarative memories for words.
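The core effect, activation maxima on word-final syllables from memory-less pairwise association, can be reproduced in miniature. The sketch below is illustrative (window size, learning rate, and toy words are assumptions, not the published model's parameters): each syllable is associated with the syllables active in a short preceding window, and a syllable's "activation" on arrival is the summed weight from those predecessors.

```python
import random

def hebbian_position_profile(words, n_words=300, window=2, lr=0.1, seed=0):
    """Memory-less Hebbian sketch: mean activation per within-word position.
    Word-final syllables should peak, because both of their predecessors are
    within-word associates. Assumes all words have the same length."""
    random.seed(seed)
    stream = [syl for _ in range(n_words) for syl in random.choice(words)]
    weights = {}                             # (earlier, later) -> association strength
    wlen = len(words[0])
    acts = {pos: [] for pos in range(wlen)}  # activations grouped by word position
    for t, syl in enumerate(stream):
        context = stream[max(0, t - window):t]         # recently active syllables
        # activation = excitation received from the current context
        acts[t % wlen].append(sum(weights.get((c, syl), 0.0) for c in context))
        for c in context:                              # Hebbian update: co-activity strengthens links
            weights[(c, syl)] = weights.get((c, syl), 0.0) + lr
    return {pos: sum(a) / len(a) for pos, a in acts.items()}
```

Run on three random-ordered trisyllabic words, mean activation rises monotonically across word positions, i.e., the stream's activation peaks rhythmically on word-final syllables without any chunk ever being stored.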
Article
Statistical learning relies on detecting the frequency of co-occurrences of items and has been proposed to be crucial for a variety of learning problems, notably to learn and memorize words from fluent speech. Endress and Johnson (2021) (hereafter EJ) recently showed that such results can be explained based on simple memory-less correlational learning mechanisms such as Hebbian Learning. Tovar and Westermann (2022) (hereafter TW) reproduced these results with a different Hebbian model. We show that the main differences between the models are whether temporal decay acts on both the connection weights and the activations (in TW) or only on the activations (in EJ), and whether interference affects weights (in TW) or activations (in EJ). Given that weights and activations are linked through the Hebbian learning rule, the networks behave similarly. However, in contrast to TW, we do not believe that neurophysiological data are relevant to adjudicate between abstract psychological models with little biological detail. Taken together, both models show that different memory-less correlational learning mechanisms provide a parsimonious account of Statistical Learning results. They are consistent with evidence that Statistical Learning might not allow learners to learn and retain words, and Statistical Learning might support predictive processing instead.
Article
Infants’ ability to detect statistical regularities between visual objects has been demonstrated in previous studies (e.g., Kirkham et al., Cognition, 83, 2002, B35). The extent to which infants extract and learn the actual values of the transitional probabilities (TPs) between these objects nevertheless remains an open question. In three experiments providing identical learning conditions but contrasting different types of sequences at test, we examined 8-month-old infants’ ability to discriminate between familiar sequences involving high or low values of TPs, and new sequences that involved null TPs. Results showed that infants discriminate between these three types of sequences, supporting the existence of a statistical learning mechanism by which infants extract fine-grained statistical information from a stream of visual stimuli. Interestingly, the expression of this statistical knowledge varied between experiments and specifically depended on the nature of the first two test trials. We argue that the predictability of this early test arrangement—namely whether the first two test items were either predictable or unexpected based on the habituation phase—determined infants’ looking behaviors.
Article
Full-text available
Much research has documented infants' sensitivity to statistical regularities in auditory and visual inputs; however, the manner in which infants process and represent statistically defined information remains unclear. Two types of models have been proposed to account for this sensitivity: statistical models, which posit that learners represent statistical relations between elements in the input; and chunking models, which posit that learners represent statistically-coherent units of information from the input. Here, we evaluated the fit of these two types of models to behavioral data that we obtained from 8-month-old infants across four visual sequence-learning experiments. Experiments examined infants' representations of two types of structures about which statistical and chunking models make contrasting predictions: illusory sequences (Experiment 1) and embedded sequences (Experiments 2-4). In all four experiments, infants discriminated between high probability sequences and low probability part-sequences, providing strong evidence of learning. Critically, infants also discriminated between high probability sequences and statistically-matched sequences (illusory sequences in Experiment 1, embedded sequences in Experiments 2-3), suggesting that infants learned coherent chunks of elements. Experiment 4 examined the temporal nature of chunking, and demonstrated that the fate of embedded chunks depends on amount of exposure. These studies contribute important new data on infants' visual statistical learning ability, and suggest that the representations that result from infants' visual statistical learning are best captured by chunking models.
Article
Full-text available
Research over the past 2 decades has demonstrated that infants are equipped with remarkable computational abilities that allow them to find words in continuous speech. Infants can encode information about the transitional probability (TP) between syllables to segment words from artificial and natural languages. As previous research has tested infants immediately after familiarization, infants' ability to retain sequential statistics beyond the immediate familiarization context remains unknown. Here, we examine infants' memory for statistically defined words 10 min after familiarization with an Italian corpus. Eight-month-old English-learning infants were familiarized with Italian sentences that contained 4 embedded target words, 2 with high internal TP (HTP, TP = 1.0) and 2 with low TP (LTP, TP = .33), and were tested on their ability to discriminate HTP from LTP words using the Headturn Preference Procedure. When tested after a 10-min delay, infants failed to discriminate HTP from LTP words, suggesting that memory for statistical information likely decays over even short delays (Experiment 1). Experiments 2-4 were designed to test whether experience with isolated words selectively reinforces memory for statistically defined (i.e., HTP) words. When 8-month-olds were given additional experience with isolated tokens of both HTP and LTP words immediately after familiarization, they looked significantly longer on HTP than LTP test trials 10 min later. Although initial representations of statistically defined words may be fragile, our results suggest that experience with isolated words may reinforce the output of statistical learning by helping infants create more robust memories for words with strong versus weak co-occurrence statistics.
Article
Full-text available
Working memory (WM) is thought to have a fixed and limited capacity. However, the origins of these capacity limitations are debated, and generally attributed to active, attentional processes. Here, we show that the existence of interference among items in memory mathematically guarantees fixed and limited capacity limits under very general conditions, irrespective of any processing assumptions. Assuming that interference (a) increases with the number of interfering items and (b) brings memory performance to chance levels for large numbers of interfering items, capacity limits are a simple function of the relative influence of memorization and interference. In contrast, we show that time-based memory limitations do not lead to fixed memory capacity limitations that are independent of the timing properties of an experiment. We show that interference can mimic both slot-like and continuous resource-like memory limitations, suggesting that these types of memory performance might not be as different as commonly believed. We speculate that slot-like WM limitations might arise from crowding-like phenomena in memory when participants have to retrieve items. Further, based on earlier research on parallel attention and enumeration, we suggest that crowding-like phenomena might be a common reason for the 3 major cognitive capacity limitations. As suggested by Miller (1956) and Cowan (2001), these capacity limitations might arise because of a common reason, even though they likely rely on distinct processes.
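The abstract's central claim, that interference alone guarantees a fixed capacity, can be illustrated numerically. The functional form below is one simple choice consistent with the argument (recall probability falls with the number of interfering items and reaches chance for large numbers), not the paper's exact equations:

```python
def expected_recall(n_items, memorization=3.0, interference=1.0):
    """Toy interference model: with interference from the other n-1 items, each
    item is recalled with probability m / (m + i*(n-1)), so expected recall
    n*p saturates at the fixed capacity m/i. Parameter values are illustrative."""
    if n_items == 0:
        return 0.0
    p = memorization / (memorization + interference * (n_items - 1))
    return n_items * p
```

Expected recall grows with set size at first but asymptotes at memorization/interference = 3 items, a fixed capacity that emerges with no assumptions about slots, resources, or attentional processes.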
Article
Full-text available
Language learners encounter numerous opportunities to learn regularities, but need to decide which of these regularities to learn, because some are not productive in their native language. Here, we present an account of rule learning based on perceptual and memory primitives (Endress, Dehaene-Lambertz, & Mehler, Cognition, 105(3), 577-614, 2007; Endress, Nespor, & Mehler, Trends in Cognitive Sciences, 13(8), 348-353, 2009), suggesting that learners preferentially learn regularities that are more salient to them, and that the pattern of salience reflects the frequency of language features across languages. We contrast this view with previous artificial grammar learning research, which suggests that infants "choose" the regularities they learn based on rational, Bayesian criteria (Frank & Tenenbaum, Cognition, 120(3), 360-371, 2013; Gerken, Cognition, 98(3), B67-B74, 2006, and Cognition, 115(2), 362-366, 2010). In our experiments, adult participants listened to syllable strings starting with a syllable reduplication and always ending with the same "affix" syllable, or to syllable strings starting with this "affix" syllable and ending with the "reduplication". Both affixation and reduplication are frequently used for morphological marking across languages. We find three crucial results. First, participants learned both regularities simultaneously. Second, affixation regularities seemed easier to learn than reduplication regularities. Third, regularities in sequence offsets were easier to learn than regularities at sequence onsets. We show that these results are inconsistent with previous Bayesian rule learning models, but mesh well with the perceptual or memory primitives view. Further, we show that the pattern of salience revealed in our experiments reflects the distribution of regularities across languages. Ease of acquisition might thus be one determinant of the frequency of regularities across languages.
Article
Full-text available
In an artificial grammar learning task, amnesic patients classified test items as well as normal subjects did. Item similarity did not affect grammaticality judgments when similar and nonsimilar test items were balanced for the frequency with which bigrams and trigrams (chunks) that appeared in the training set also appeared in the test items. Amnesic patients performed like normal subjects. The results suggest that concrete information about letter chunks can influence grammaticality judgments and that this information is acquired implicitly. The similarity of whole test items to training items does not appear to affect grammaticality judgments.
Article
Full-text available
Visual working memory (VWM) is an online memory buffer that is typically assumed to be immune to source memory confusions. Accordingly, the few studies that have investigated the role of proactive interference (PI) in VWM tasks found only a modest PI effect at best. In contrast, a recent study has found a substantial PI effect in that performance in a VWM task was markedly improved when all memory items were unique compared to the more standard condition in which only a limited set of objects was used. The goal of the present study was to reconcile this discrepancy between the findings, and to scrutinize the extent to which PI is involved in VWM tasks. Experiments 1-2 showed that the robust advantage in using unique memory items can also be found in a within-subject design and is largely independent of set size, encoding duration, or intertrial interval. Importantly, however, PI was found mainly when all items were presented at the same location, and the effect was greatly diminished when the items were presented, either simultaneously (Experiment 3) or sequentially (Experiments 4-5), at distinct locations. These results indicate that PI is spatially specific and that without the assistance of spatial information VWM is not protected from PI. Thus, these findings imply that spatial information plays a key role in VWM, and underscore the notion that VWM is more vulnerable to interference than is typically assumed.
Article
Full-text available
Adding an affix to transform a word is common across the world's languages, with the edges of words more likely to carry out such a function. However, detecting affixation patterns is also observed in learning tasks outside the domain of language, suggesting that the underlying mechanism from which affixation patterns have arisen may not be language or even human specific. We addressed whether a songbird, the zebra finch, is able to discriminate between, and generalize, affixation-like patterns. Zebra finches were trained and tested in a Go/Nogo paradigm to discriminate artificial song element sequences resembling prefixed and suffixed 'words.' The 'stems' of the 'words' consisted of different combinations of a triplet of song elements, to which a fourth element was added as either a 'prefix' or a 'suffix.' After training, the birds were tested with novel stems, consisting of either rearranged familiar element types or novel element types. The birds were able to generalize the affixation patterns to novel stems with both familiar and novel element types. Hence, the discrimination resulting from the training was not based on memorization of individual stimuli, but on a shared property among Go or Nogo stimuli, i.e., affixation patterns. Remarkably, birds trained with suffixation as Go pattern showed clear evidence of using both prefix and suffix, while those trained with the prefix as the Go stimulus used primarily the prefix. This finding illustrates that an asymmetry in attending to different affixations is not restricted to human languages.
Article
Full-text available
Compared to humans, non-human primates have very little control over their vocal production. Nonetheless, some primates produce various call combinations, which may partially offset their lack of acoustic flexibility. A relevant example is male Campbell's monkeys (Cercopithecus campbelli), which give one call type ('Krak') to leopards, while the suffixed version of the same call stem ('Krak-oo') is given to unspecific danger. To test whether recipients attend to this suffixation pattern, we carried out a playback experiment in which we broadcast naturally and artificially modified suffixed and unsuffixed 'Krak' calls of male Campbell's monkeys to 42 wild groups of Diana monkeys (Cercopithecus diana diana). The two species form mixed-species groups and respond to each other's vocalizations. We analysed the vocal response of male and female Diana monkeys and overall found significantly stronger vocal responses to unsuffixed (leopard) than suffixed (unspecific danger) calls. Although the acoustic structure of the 'Krak' stem of the calls has some additional effects, subject responses were mainly determined by the presence or the absence of the suffix. This study indicates that suffixation is an evolved function in primate communication in contexts where adaptive responses are particularly important.
Article
Full-text available
In immediate serial recall, participants are asked to recall novel sequences of items in the correct order. Theories of the representations and processes required for this task differ in how order information is maintained; some have argued that order is represented through item-to-item associations, while others have argued that each item is coded for its position in a sequence, with position being defined either by distance from the start of the sequence, or by distance from both the start and the end of the sequence. Previous researchers have used error analyses to adjudicate between these different proposals. However, these previous attempts have not allowed researchers to examine the full set of alternative proposals. In the current study, we analyzed errors produced in 2 immediate serial recall experiments that differ in the modality of input (visual vs. aural presentation of words) and the modality of output (typed vs. spoken responses), using new analysis methods that allow for a greater number of alternative hypotheses to be considered. We find evidence that sequence positions are represented relative to both the start and the end of the sequence, and show a contribution of the end-based representation beyond the final item in the sequence. We also find limited evidence for item-to-item associations, suggesting that both a start-end positional scheme and item-to-item associations play a role in representing item order in immediate serial recall.
Article
Full-text available
To achieve language proficiency, infants must find the building blocks of speech and master the rules governing their legal combinations. However, these problems are linked: words are also built according to rules. Here, we explored early morphosyntactic sensitivity by testing when and how infants could find either words or within-word structure in artificial speech snippets embodying properties of morphological constructions. We show that 12-month-olds use statistical relationships between syllables to extract words from continuous streams, but find word-internal regularities only if the streams are segmented. Seven-month-olds fail both tasks. Thus, 12-month-old infants possess the resources to analyze the internal composition of words if the speech contains segmentation information. However, 7-month-old infants may not possess them, although they can track several statistical relations. This developmental difference suggests that morphosyntactic sensitivity may require computational resources extending beyond the detection of simple statistics.
Article
Full-text available
Conducted 2 analyses from a study by D. Newtson et al (1976) of the relation between the segmentation of 7 ongoing behavior sequences into their component actions and the movement in those sequences. The 1st analysis confirmed that action-unit boundaries consist of stimulus points depicting distinctive changes relative to the previously used action-unit boundary, rather than consisting of distinctive, action-defining states. The 2nd analysis tested more rigorously the notion that distinctive changes form the objective basis of behavior units by examining the transitions between stimulus points within action units and transitions to and from action-unit boundaries; results of this analysis also support a distinctive-change interpretation. Results of previous studies of action perception are reviewed, and a preliminary hypothesis as to the nature of behavior perception processes is presented and discussed.
Article
Full-text available
Linguists have suggested that non-manual and manual markers are used in sign languages to indicate prosodic and syntactic boundaries. However, little is known about how native signers interpret non-manual and manual cues with respect to sentence boundaries. Six native signers of British Sign Language (BSL) were asked to mark sentence boundaries in two narratives: one presented in BSL and one in Swedish Sign Language (SSL). For comparative analysis, non-signers undertook the same tasks. Results indicated that both native signers and non-signers were able to use visual cues effectively in segmentation and that their decisions were not dependent on knowledge of the signed language. Signed narratives contain visible cues to their prosodic structure which are available to signers and non-signers alike.
Article
Full-text available
Transitional Probability (TP) computations are regarded as a powerful learning mechanism that is functional early in development and has been proposed as an initial bootstrapping device for speech segmentation. However, a recent study casts doubt on the robustness of early statistical word-learning. Johnson and Tyler (2010) showed that when 8-month-olds are presented with artificial languages where TPs between syllables are reliable cues to word boundaries but that contain words of varying length, infants fail to show word segmentation. Given previous evidence that familiar words facilitate segmentation (Bortfeld, Morgan, Golinkoff, & Rathbun, 2005), we investigated the conditions under which 8-month-old French-learning infants can succeed in segmenting an artificial language. We found that infants can use TPs to segment a language of uniform length words (Experiment 1) and a language of nonuniform length words containing the familiar word “maman” (/mamã/, mommy in French; Experiment 2), but not a similar language of nonuniform length words containing the pseudo-word /mãma/ (Experiment 3). We interpret these findings as evidence that 8-month-olds can use familiar words and TPs in combination to segment fluent speech, providing initial evidence for 8-month-olds' ability to combine top-down and bottom-up speech segmentation procedures.
Article
Full-text available
The acoustic variation in language presents learners with a substantial challenge. To learn by tracking statistical regularities in speech, infants must recognize words across tokens that differ based on characteristics such as the speaker’s voice, affect, or the sentence context. Previous statistical learning studies have not investigated how these types of non-phonemic surface form variation affect learning. The present experiments used tasks tailored to two distinct developmental levels to investigate the robustness of statistical learning to variation. Experiment 1 examined statistical word segmentation in 11-month-olds and found that infants can recognize statistically segmented words across a change in the speaker’s voice from segmentation to testing. The direction of infants’ preferences suggests that recognizing words across a voice change is more difficult than recognizing them in a consistent voice. Experiment 2 tested whether 17-month-olds can generalize the output of statistical learning across variation to support word learning. The infants were successful in their generalization; they associated referents with statistically defined words despite a change in voice from segmentation to label learning. Infants’ learning patterns also indicate that they formed representations of across-word syllable sequences during segmentation. Thus, low probability sequences can act as object labels in some conditions. The findings of these experiments suggest that the units that emerge during statistical learning are not perceptually constrained, but rather are robust to naturalistic acoustic variation.
Article
Full-text available
Tested implications for attribution processes of variation in the unit of perception in 2 experiments with college freshmen males (n = 20). In Exp I Ss viewed a 5-min videotaped behavior sequence. Ss were instructed to segment the behavior into as fine units of action or as gross units of action as were natural and meaningful to them. Results indicate that in comparison to gross-unit Ss, fine-unit Ss were more confident in their impressions, made more dispositional attributions, and tended to have more differentiated impressions. In Exp II Ss viewed either of 2 comparable sequences of problem-solving behavior; in 1, an unexpected action was inserted. Following the unexpected act, Ss employed more units of perception/min than controls who did not view it. It is concluded that the unit of perception varies according to situational constraints and that attribution theories assuming constant units are seriously in error. Implications of unit variation for the interpretation of attribution research are discussed.
Article
Full-text available
This paper deals with two distinct but inextricably connected sets of questions in the area of sentence phonology. The first concerns the organisation of sentence phonology and the nature of the phonological representation(s) of the sentence, and the second the relation between syntactic structure and phonological representation.
Article
Full-text available
One of the infant's first tasks in language acquisition is to discover the words embedded in a mostly continuous speech stream. This learning problem might be solved by using distributional cues to word boundaries—for example, by computing the transitional probabilities between sounds in the language input and using the relative strengths of these probabilities to hypothesize word boundaries. The learner might be further aided by language-specific prosodic cues correlated with word boundaries. As a first step in testing these hypotheses, we briefly exposed adults to an artificial language in which the only cues available for word segmentation were the transitional probabilities between syllables. Subjects were able to learn the words of this language. Furthermore, the addition of certain prosodic cues served to enhance performance. These results suggest that distributional cues may play an important role in the initial word segmentation of language learners.
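The TP computation this abstract describes can be made concrete with a short sketch. In this toy example (the syllable inventory, word list, and boundary threshold are all invented for illustration and are not taken from the study), forward TPs are high within words and lower across word boundaries, so placing a boundary wherever the TP dips recovers the words:

```python
from collections import Counter

def transitional_probabilities(stream):
    """Forward TP(b|a) = count(ab) / count(a), over adjacent syllable pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(stream, tps, threshold):
    """Hypothesize a word boundary wherever the forward TP dips below threshold."""
    words, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tps[(a, b)] < threshold:
            words.append(tuple(current))
            current = []
        current.append(b)
    words.append(tuple(current))
    return words

# Toy language: three trisyllabic words concatenated in varied order, so
# within-word TPs are exactly 1.0 and across-word TPs are at most 2/3.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
order = [0, 1, 2, 0, 2, 1, 1, 0, 2, 1, 2, 0]
stream = [syl for i in order for syl in words[i]]

tps = transitional_probabilities(stream)
print(segment(stream, tps, threshold=1.0))  # recovers the 12 word tokens
```

In real familiarization streams the TP contrast is rarely this clean, which is one reason the papers reviewed here ask whether additional cues (prosody, utterance boundaries, familiar words) are needed on top of the distributional statistics.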
Article
Full-text available
Very young children successfully acquire the vocabulary of their native language despite their limited information processing abilities. One partial explanation for children's success at the inductive problem word learning presents is that children are constrained in the kinds of hypotheses they consider as potential meanings of novel words. Three such constraints are discussed: (1) the whole-object assumption which leads children to infer that terms refer to objects as a whole rather than to their parts, substance, color, or other properties; (2) the taxonomic assumption which leads children to extend words to objects or entities of like kind; and (3) the mutual exclusivity assumption which leads children to avoid two labels for the same object. Recent evidence is reviewed suggesting that all three constraints are available to babies by the time of the naming explosion. Given the importance of word learning, children might be expected to recruit whatever sources of information they can to narrow down a word's meaning, including information provided by grammatical form class and the pragmatics of the situation. Word-learning constraints interact with these other sources of information but are also argued to be an especially useful source of information for children who have not yet mastered grammatical form class in that constraints should function as an entering wedge into language acquisition.
Article
Learning often requires splitting continuous signals into recurring units, such as the discrete words constituting fluent speech; these units then need to be encoded in memory. A prominent candidate mechanism involves statistical learning of co-occurrence statistics like transitional probabilities (TPs), reflecting the idea that items from the same unit (e.g., syllables within a word) predict each other better than items from different units. TP computations are surprisingly flexible and sophisticated. Humans are sensitive to forward and backward TPs, compute TPs between adjacent items and longer-distance items, and even recognize TPs in novel units. We explain these hallmarks of statistical learning with a simple model with tunable, Hebbian excitatory connections and inhibitory interactions controlling the overall activation. With weak forgetting, activations are long-lasting, yielding associations among all items; with strong forgetting, no associations ensue as activations do not outlast stimuli; with intermediate forgetting, the network reproduces the hallmarks above. Forgetting thus is a key determinant of these sophisticated learning abilities. Further, in line with earlier dissociations between statistical learning and memory encoding, our model reproduces the hallmarks of statistical learning in the absence of a memory store in which items could be placed.
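The role of forgetting described in this abstract can be illustrated with a deliberately minimal Hebbian sketch. The units, decay rule, and update rule below are simplified illustrations, not the authors' actual model: with full forgetting (decay 0) no associations form, while with partial forgetting, items that follow each other closely become more strongly associated than items further apart, which is the raw material for TP-like sensitivity.

```python
import itertools

def hebbian_associations(stream, decay):
    """Toy Hebbian learner: each presented item activates its unit fully,
    residual activations decay by `decay` per step, and every pair of units
    strengthens its (symmetric) weight by the product of their activations."""
    units = sorted(set(stream))
    act = {u: 0.0 for u in units}
    weights = {pair: 0.0 for pair in itertools.combinations(units, 2)}
    for item in stream:
        for u in act:                      # forgetting: residual activation fades
            act[u] *= decay
        act[item] = 1.0                    # current item is fully active
        for a, b in weights:               # Hebbian co-activation update
            weights[(a, b)] += act[a] * act[b]
    return weights

stream = ["A", "B", "C", "D"] * 5

# Strong forgetting: no residual co-activation, so nothing is associated.
assert all(w == 0.0 for w in hebbian_associations(stream, decay=0.0).values())

# Intermediate forgetting: adjacent items (A, B) associate more strongly
# than items two positions apart (A, C).
w = hebbian_associations(stream, decay=0.5)
assert w[("A", "B")] > w[("A", "C")]
```

With decay near 1.0 the same code associates everything with everything, mirroring the abstract's point that intermediate forgetting is what yields graded, distance-sensitive associations.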
Article
We asked whether 11- and 14-month-old infants' abstract rule learning, an early form of analogical reasoning, is susceptible to processing constraints imposed by limits in attention and memory for sequence position. We examined 11- and 14-month-old infants' learning and generalization of abstract repetition rules ("repetition anywhere," Experiment 1 or "medial repetition," Experiment 2) and ordering of specific items (edge positions, Experiment 3) in 4-item sequences. Infants were habituated to sequences containing repetition- and/or position-based structure and then tested with "familiar" vs. "novel" (random) sequences composed of new items. Eleven-month-olds (N = 40) failed to learn abstract repetition rules, but 14-month-olds (N = 40) learned rules under both conditions. In Experiment 3, 11-month-olds (N = 20) learned item edge positions in sequences identical to those in Experiment 2. We conclude that infant sequence learning is constrained by item position in similar ways as in adults.
Article
Learners often need to extract recurring items from continuous sequences, in both vision and audition. The best-known example is probably found in word-learning, where listeners have to determine where words start and end in fluent speech. This could be achieved through universal and experience-independent statistical mechanisms, for example by relying on Transitional Probabilities (TPs). Further, these mechanisms might allow learners to store items in memory. However, previous investigations have yielded conflicting evidence as to whether a sensitivity to TPs is diagnostic of the memorization of recurring items. Here, we address this issue in the visual modality. Participants were familiarized with a continuous sequence of visual items (i.e., arbitrary or everyday symbols), and then had to choose between (i) high-TP items that appeared in the sequence, (ii) high-TP items that did not appear in the sequence, and (iii) low-TP items that appeared in the sequence. Items matched in TPs but differing in (chunk) frequency were much harder to discriminate than items differing in TPs (with no significant sensitivity to chunk frequency), and learners preferred unattested high-TP items over attested low-TP items. Contrary to previous claims, these results cannot be explained on the basis of the similarity of the test items. Learners thus weigh within-item TPs higher than the frequency of the chunks, even when the TP differences are relatively subtle. We argue that these results are problematic for distributional clustering mechanisms that analyze continuous sequences, and provide supporting computational results. We suggest that the role of TPs might not be to memorize items per se, but rather to prepare learners to memorize recurring items once they are presented in subsequent learning situations with richer cues.
Article
To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even without rich language-specific knowledge. However, it is not clear how strong and reliable the distributional cues are that humans might use to segment speech. We investigate cross-linguistic viability of different statistical learning strategies by analyzing child-directed speech corpora from nine languages and by modeling possible statistics-based speech segmentations. We show that languages vary as to which statistical segmentation strategies are most successful. The variability of the results can be partially explained by systematic differences between languages, such as rhythmical differences. The results confirm previous findings that different statistical learning strategies are successful in different languages and suggest that infants may have to primarily rely on non-statistical cues when they begin their process of speech segmentation.
Article
Much of what we know about the development of listeners’ word segmentation strategies originates from the artificial language-learning literature. However, many artificial speech streams designed to study word segmentation lack a salient cue found in all natural languages: utterance boundaries. In this study, participants listened to a speech-stream containing one of three sets of word boundary cues: transitional probabilities between syllables (TP Condition), silences marking utterance boundaries (UB Condition), or a combination of both cues (TP + UB Condition). Recognition of the trained words and rule words (words not in language, but conforming to its phonotactic structure) was tested. Participants performed equally well in the TP + UB and UB Conditions, scoring above chance on both trained and rule words. Performance in the TP condition, however, was at chance. Our results suggest that attention to UBs is a particularly effective strategy for finding words in speech, possibly providing a language-general solution to the word segmentation problem.
Article
We review recent artificial language learning studies, especially those following Endress and Bonatti (Endress AD, Bonatti LL. Rapid learning of syllable classes from a perceptually continuous speech stream. Cognition 2007, 105:247–299), suggesting that humans can deploy a variety of learning mechanisms to acquire artificial languages. Several experiments provide evidence for multiple learning mechanisms that can be deployed in fluent speech: one mechanism encodes the positions of syllables within words and can be used to extract generalizations, while the other registers co‐occurrence statistics of syllables and can be used to break a continuum into its components. We review dissociations between these mechanisms and their potential role in language acquisition. We then turn to recent criticisms of the multiple mechanisms hypothesis and show that they are inconsistent with the available data. Our results suggest that artificial and natural language learning is best understood by dissecting the underlying specialized learning abilities, and that these data provide a rare opportunity to link important language phenomena to basic psychological mechanisms. WIREs Cogn Sci 2016, 7:19–35. doi: 10.1002/wcs.1376
Conference Paper
Much research has documented learners’ ability to segment auditory and visual input into its component units. Two types of models have been designed to account for this phenomenon: statistical models, in which learners represent statistical relations between elements, and chunking models, in which learners represent statistically coherent units of information. In a series of three experiments, we investigated how adults’ performance on a visual sequence-learning task aligned with the predictions of these two types of models. Experiments 1 and 2 examined learning of embedded items and Experiment 3 examined learning of illusory items. The pattern of results obtained was most consistent with the competitive chunking model of Servan-Schreiber and Anderson (1990). Implications for theories and models of statistical learning are discussed.
Article
Within the first year of life, infants learn to segment words from fluent speech. Previous research has shown that infants at 0;7·5 can segment consonant-initial words, yet the ability to segment vowel-initial words does not emerge until the age of 1;1–1;4 (0;11 in some restricted cases). In five experiments, we show that infants aged 0;11 but not 0;8 are able to segment vowel-initial words that immediately follow the function word the [ði], while ruling out a bottom-up, phonotactic account of these results. Thus, function words facilitate the segmentation of vowel-initial words that appear sentence-medially for infants aged 0;11.
Article
Research on the influence of multimodal information on infants' learning is inconclusive. While one line of research finds that multimodal input has a negative effect on learning, another finds positive effects. The present study aims to shed some new light on this discussion by studying the influence of multimodal information and accompanying stimulus complexity on the learning process. We assessed the influence of multimodal input on the trial-by-trial learning of 8- and 11-month-old infants. Using an anticipatory eye movement paradigm, we measured how infants learn to anticipate the correct stimulus–location associations when exposed to visual-only, auditory-only (unimodal), or auditory and visual (multimodal) information. Our results show that infants in both the multimodal and visual-only conditions learned the stimulus–location associations. Although infants in the visual-only condition appeared to learn in fewer trials, infants in the multimodal condition showed better anticipating behavior: as a group, they had a higher chance of anticipating correctly on more consecutive trials than infants in the visual-only condition. These findings suggest that effects of multimodal information on infant learning operate chiefly through effects on infants' attention.
Article
Saffran, Newport, and Aslin (1996b) showed that adults were able to segment into words an artificial language that included no pauses or other prosodic cues for word boundaries. We propose an account of their results that requires only limited computational abilities and memory capacity. In this account, parsing emerges as a natural consequence of the on-line attentional processing of the input, thanks to basic laws of memory and associative learning. Our account was implemented in a computer program, PARSER. Simulations revealed that PARSER extracted the words of the language well before exhausting the material presented to participants in the Saffran et al. experiments. In addition, PARSER was able to simulate the results obtained under attention-disturbing conditions (Saffran, Newport, Aslin, Tunick, & Barrueco, 1997) and those collected from 8-month-old infants (Saffran, Aslin, and Newport, 1996a). Finally, the good performance of PARSER was not limited to the trisyllabic words used by Saffran et al., but also extended to a language composed of one- to five-syllable words.
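PARSER's core loop can be caricatured in a few lines. This is only a sketch under strong simplifying assumptions: the attention-span, decay, and reinforcement parameters below are invented, and Perruchet and Vinter's interference and percept-shaping mechanisms are omitted. The idea it preserves is that the learner repeatedly perceives one to three units, where a unit is a primitive syllable or an already-strong chunk, reinforces the resulting percept, and lets all other stored chunks decay.

```python
import random

def parser_sketch(stream, n_steps=500, decay=0.05, gain=1.0, threshold=1.0, seed=1):
    """Toy PARSER-style chunker (a sketch, not the full Perruchet & Vinter model).
    Repeatedly reads 1-3 units from a looped stream; a unit is either a single
    syllable or a stored chunk whose weight has reached `threshold`. The percept
    is reinforced; every other chunk decays a little (forgetting)."""
    rng = random.Random(seed)
    lexicon = {}          # chunk (tuple of syllables) -> weight
    pos = 0
    for _ in range(n_steps):
        percept = []
        for _ in range(rng.randint(1, 3)):      # attention span: 1-3 units
            unit = None
            for length in (3, 2):               # prefer the longest strong chunk
                cand = tuple(stream[pos:pos + length])
                if len(cand) == length and lexicon.get(cand, 0.0) >= threshold:
                    unit = cand
                    break
            if unit is None:                    # fall back to a single syllable
                unit = (stream[pos],)
            percept.extend(unit)
            pos += len(unit)
            if pos >= len(stream):
                pos = 0                         # loop the familiarization stream
        chunk = tuple(percept)
        for c in lexicon:
            lexicon[c] = max(0.0, lexicon[c] - decay)   # decay of unused chunks
        lexicon[chunk] = lexicon.get(chunk, 0.0) + gain  # reinforce the percept
    return lexicon
```

With enough passes over a Saffran-style stream, chunks corresponding to statistically coherent words tend to accumulate weight while boundary-spanning chunks decay, though the exact outcome depends on the parameters above; the published PARSER simulations use a richer rule set than this sketch.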
Article
What is learned during mastery of a serial task: associations between adjacent and remote items, associations between an item and its ordinal position, or both? A clear answer to this question is lacking in the literature on human serial memory because it is difficult to control for a “naive” subject's linguistic competence and extensive experience with serial tasks. In this article, we present evidence that rhesus monkeys encode the ordinal positions of items of an arbitrary list when there is no requirement to do so. First, monkeys learned four nonverbal lists (1–4), each containing four novel items (photographs of natural objects). The monkeys then learned four 4-item lists that were derived exclusively and exhaustively from Lists 1 through 4, one item from each list. On two derived lists, each item's original ordinal position was maintained. Those lists were acquired with virtually no errors. The two remaining derived lists, on which the original ordinal position of each item was changed, were as difficult to learn as novel lists. The immediate acquisition of lists on which ordinal position was maintained shows that knowledge of ordinal position can develop without the benefit of language, extensive list-learning experience, or explicit instruction to encode ordinal information.
Article
A recent report demonstrated that 8-month-olds can segment a continuous stream of speech syllables, containing no acoustic or prosodic cues to word boundaries, into wordlike units after only 2 min of listening experience (Saffran, Aslin, & Newport, 1996). Thus, a powerful learning mechanism capable of extracting statistical information from fluent speech is available early in development. The present study extends these results by documenting the particular type of statistical computation—transitional (conditional) probability—used by infants to solve this word-segmentation task. An artificial language corpus, consisting of a continuous stream of trisyllabic nonsense words, was presented to 8-month-olds for 3 min. A postfamiliarization test compared the infants' responses to words versus part-words (trisyllabic sequences spanning word boundaries). The corpus was constructed so that test words and part-words were matched in frequency, but differed in their transitional probabilities. Infants showed reliable discrimination of words from part-words, thereby demonstrating rapid segmentation of continuous speech into words on the basis of transitional probabilities of syllable pairs.
Article
In recent years, Bayesian learning models have been applied to an increasing variety of domains. While such models have been criticized on theoretical grounds, the underlying assumptions and predictions are rarely made concrete and tested experimentally. Here, I use Frank and Tenenbaum's (2011) Bayesian model of rule-learning as a case study to spell out the underlying assumptions, and to confront them with the empirical results Frank and Tenenbaum (2011) propose to simulate, as well as with novel experiments. While rule-learning is arguably well suited to rational Bayesian approaches, I show that their models are neither psychologically plausible nor ideal observer models. Further, I show that their central assumption is unfounded: humans do not always preferentially learn more specific rules, but, at least in some situations, those rules that happen to be more salient. Even when granting the unsupported assumptions, I show that all of the experiments modeled by Frank and Tenenbaum (2011) either contradict their models, or have a large number of more plausible interpretations. I provide an alternative account of the experimental data based on simple psychological mechanisms, and show that this account both describes the data better, and is easier to falsify. I conclude that, despite the recent surge in Bayesian models of cognitive phenomena, psychological phenomena are best understood by developing and testing psychological theories rather than models that can be fit to virtually any data.
Article
Previous research with artificial language learning paradigms has shown that infants are sensitive to statistical cues to word boundaries (Saffran, Aslin & Newport, 1996) and that they can use these cues to extract word-like units (Saffran, 2001). However, it is unknown whether infants use statistical information to construct a receptive lexicon when acquiring their native language. In order to investigate this issue, we rely on the fact that besides real words a statistical algorithm extracts sound sequences that are highly frequent in infant-directed speech but constitute nonwords. In three experiments, we use a preferential listening paradigm to test French-learning 11-month-old infants' recognition of highly frequent disyllabic sequences from their native language. In Experiments 1 and 2, we use nonword stimuli and find that infants listen longer to high-frequency than to low-frequency sequences. In Experiment 3, we compare high-frequency nonwords to real words in the same frequency range, and find that infants show no preference. Thus, at 11 months, French-learning infants recognize highly frequent sound sequences from their native language and fail to differentiate between words and nonwords among these sequences. These results are evidence that they have used statistical information to extract word candidates from their input and stored them in a 'protolexicon', containing both words and nonwords.
Article
Does statistical learning (Saffran, Aslin, & Newport, 1996) offer a universal segmentation strategy for young language learners? Previous studies on large corpora of English and structurally similar languages have shown that statistical segmentation can be an effective strategy. However, many of the world's languages have richer morphological systems, with sometimes several affixes attached to a stem (e.g. Hungarian: iskoláinkban: iskolá-i-nk-ban school.pl.poss1pl.inessive 'in our schools'). In these languages, word boundaries and morpheme boundaries do not coincide. Does the internal structure of words affect segmentation? What word forms does segmentation yield in morphologically rich languages: complex word forms or separate stems and affixes? The present paper answers these questions by exploring different segmentation algorithms in infant-directed speech corpora from two typologically and structurally different languages, Hungarian and Italian. The results suggest that the morphological and syntactic type of a language has an impact on statistical segmentation, with different strategies working best in different languages. Specifically, the direction of segmentation seems to be sensitive to the affixation order of a language. Thus, backward probabilities are more effective in Hungarian, a heavily suffixing language, whereas forward probabilities are more informative in Italian, which has fewer suffixes and a large number of phrase-initial function words. The consequences of these findings for potential segmentation and word learning strategies are discussed.
Article
Word-segmentation, that is, the extraction of words from fluent speech, is one of the first problems language learners have to master. It is generally believed that statistical processes, in particular those tracking "transitional probabilities" (TPs), are important to word-segmentation. However, there is evidence that word forms are stored in memory formats differing from those that can be constructed from TPs, i.e. in terms of the positions of phonemes and syllables within words. In line with this view, we show that TP-based processes leave learners no more familiar with items heard 600 times than with "phantom-words" not heard at all if the phantom-words have the same statistical structure as the occurring items. Moreover, participants are more familiar with phantom-words than with frequent syllable combinations. In contrast, minimal prosody-like perceptual cues allow learners to recognize actual items. TPs may well signal co-occurring syllables; this, however, does not seem to lead to the extraction of word-like units. We review other, in particular prosodic, cues to word-boundaries which may allow the construction of positional memories while not requiring language-specific knowledge, and suggest that their contributions to word-segmentation need to be reassessed.
Article
The subjects' ability to segment foreign speech was examined. Naturalness judgments regarding three syntactically defined pauses [between constituents (noun and verb phrases), words, or syllables] were obtained using a paired-presentation, forced-choice paradigm. It was hypothesized that segmentation skill developed through exposure to lexical and syntactic markers. The existence and effect of such markers was investigated by assigning subjects to various exposure conditions. Results indicated that lexical and syntactic markers exist and can be utilized by subjects in segmenting speech. Contrary to previous research, however, exposure did not facilitate performance. All groups discriminated constituents from either words or syllables, and words from syllables. Results were interpreted as reflecting the interdependence of syntax and suprasegmental phonology. Results challenged the credibility of traditional associationist accounts of language acquisition and speech perception. Results were discussed in the context of Martin's theory of the rhythmic structure of speech.
Article
One way to understand something is to break it up into parts. New research indicates that segmenting ongoing activity into meaningful events is a core component of ongoing perception, with consequences for memory and learning. Behavioral and neuroimaging data suggest that event segmentation is automatic and that people spontaneously segment activity into hierarchically organized parts and sub-parts. This segmentation depends on the bottom-up processing of sensory features such as movement, and on the top-down processing of conceptual features such as actors' goals. How people segment activity affects what they remember later; as a result, those who identify appropriate event boundaries during perception tend to remember more and learn more proficiently.
Article
Fluent speech contains few pauses between adjacent words. Cues such as stress, phonotactic constraints, and the statistical structure of the input aid infants in discovering word boundaries. None of the many available segmentation cues is foolproof. So, we used the headturn preference procedure to investigate infants' integration of multiple cues. We also explored whether infants find speech cues produced by coarticulation useful in word segmentation. Using natural speech syllables, we replicated Saffran, Aslin, et al.'s (1996) study demonstrating that 8-month-olds can segment a continuous stream of speech based on statistical cues alone. Next, we added conflicting segmentation cues. Experiment 2 pitted stress against statistics, whereas Experiment 3 pitted coarticulation against statistics. In both cases, 8-month-olds weighed speech cues more heavily than statistical cues. This observation was verified in Experiment 4, which indicated that greater complexity of the familiarization sequence does not necessarily lead to familiarity effects.
Article
Theoretical explanations of the spacing effect fall into two classes: those that attribute the advantage of two spaced presentations over two massed presentations to better consolidation of the first presentation, and those that attribute the advantage to better encoding of the second presentation. This paper reports an experimental test of the two classes of theory. Rather than manipulate spacing, the experiment varied the information processing difficulty of the activity interpolated between two presentations of an item. Consolidation-type theories imply decreasing consolidation with increasing difficulty of the interpolated activity. In fact, recall performance following two presentations separated by a difficult task was found to be slightly but consistently better than performance following two presentations separated by an easy task. The outcome thus favors encoding-type theories.