Finding hierarchical structure in binary sequences: evidence from Lindenmayer grammar learning
Samuel Schmid,ᵃ Douglas Saddy,ᵇ Julie Franckᵃ
ᵃFaculty of Psychology, University of Geneva, Geneva, Switzerland
ᵇCentre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom
Abstract
In this article, we explore the extraction of recursive nested structure in the processing of binary
sequences. Our aim was to determine whether the brain learns the higher order regularities of a
highly simplified input where only sequential order information marks the hierarchical structure. To
this end, we implemented sequences generated by the Fibonacci grammar in a serial reaction time
task. This deterministic grammar generates aperiodic but self-similar sequences. The combination
of these two properties allowed us to evaluate hierarchical learning while controlling for the use of
low-level strategies like detecting recurring patterns. The deterministic aspect of the grammar
allowed us to predict precisely which points in the sequence should be subject to anticipation.
Results showed that participants’ pattern of anticipation could not be accounted for by “flat”
statistical learning processes and was consistent with them anticipating upcoming points based on
hierarchical assumptions. We also found that participants were sensitive to the structure
constituency, suggesting that they organized the signal into embedded constituents. We
hypothesized that the participants built this structure by merging recursively deterministic
transitions.
Keywords: Hierarchical representations, L-systems, Artificial grammar learning, Serial reaction
times, Self-similarity, Fibonacci, Recursion, Nested structure
1. Introduction
How does the brain extract hierarchical structure from a sequentially presented input? This question
lies at the core of multiple domains of cognitive psychology and neuroscience. The most prominent
is probably language processing where most linguistic theories assume that the sequences that
humans produce and remember cannot be reduced to mere associations of consecutive items but
must be mentally represented as recursively nested structures (Chomsky, 1957; Lashley, 1951;
Simon, 1962). Nested tree structures are representations generated by symbolic rules; when such rules are embedded within one another, they allow recursion, such that the same element can appear at multiple levels.
There is a plethora of evidence that nested structures are represented and used by adults in sentence
processing (e.g., Lewis & Phillips, 2015) as well as in other cognitive domains like mathematical
expressions (Maruyama et al., 2012; Monti et al., 2012; Nakai & Sakai, 2014), motor action (Hunt
& Aslin, 2001; Martins et al., 2019), musical melody (Koelsch, 2005) and rhythm (Fitch & Martins,
2014; Kotz et al., 2018). Nevertheless, the experimental demonstration of the learning of nested
structures in sequence processing has proven difficult and the field of artificial grammar learning
(AGL) has produced very few empirical studies showing conclusive evidence (Kovács & Endress,
2014; Fitch, 2014; Honing & Zuidema, 2014; Levelt, 2019).
This difficulty comes from the fact that in the test cases classically used, a sequence can be
processed without necessarily building a nested structure as other, possibly simpler ways of
representing it can give rise to similar learning performance. Dehaene et al. (2015) proposed a
taxonomy of the different types of internal representations that can be generated from a sequence. In
particular, they distinguish between two kinds of hierarchical representations: nested representations
and algebraic patterns. Algebraic patterns refer to a type of representation where the input is coded
as sequential abstract relationships or categories, thus allowing generalization to new exemplars
irrespective of their specific identity. For example, the pseudowords "duduba" and "pipiro" share
the algebraic pattern AAB that can be coded as a repetition followed by an alternation. Marcus et al.
(1999) showed that at seven months, children were able to generalize this pattern to new unseen
pseudowords, suggesting that they had a representation of the AAB rule. Algebraic patterns are
hierarchical in the sense that they consist of variables that can take different values. Nevertheless,
patterns, even though abstract, are insufficient to account for complex structural dependencies that
characterize natural languages, like the subject-verb agreement dependency. For example, in the
sentence “[The cats [the car avoided] ran away]” the plural subject (cats) agrees with the verb (ran)
irrespective of the intervention of the relative clause (the car avoided). Long-distance dependencies
in natural language are impossible to express with a system that only captures local order relations
because arbitrarily large materials can intervene between the subject and the verb. In other words,
the nesting of constituents where (cats) and (ran) are directly linked is necessary to account for
long-distance dependencies.
Many AGL attempts to study the learning of nested structures have focused on the ability to learn
and generalize center-embedding (Bahlmann & Friederici, 2006; de Vries et al., 2012; Lai &
Poletiek, 2011, 2013; Mueller et al., 2010). Center-embedding is the nesting of an arbitrary number
of phrases into higher order phrases (e.g., [The cat [the dog chased] ran away]). Context-free
grammars (CFGs) represent the minimal level in the Chomsky hierarchy capable of generating center-embedding, because their unbounded memory allows the binding of an unlimited number of constituents (Chomsky & Lightfoot, 2002). A well-studied instance of a CFG is the aⁿbⁿ grammar that generates strings like AB, A[AB]B, A[A[AB]B]B, etc. In order to assess whether recursion can be induced by participants after exposure to sequences generated from this grammar, the test contrast is provided by strings generated from a finite state grammar (FSG) like (ab)ⁿ. FSGs cannot generate center-embedding because they have no memory; transitions are determined by the current state and the input only. They are therefore unable to describe nested structures. Fitch and Hauser (2004) compared, in a habituation/discrimination task, the ability of humans and cotton-top tamarins to discriminate between the aⁿbⁿ grammar and the (ab)ⁿ grammar. The authors found that humans were able to notice the change from one grammar to the other while cotton-top tamarins were not able to discriminate (ab)ⁿ from aⁿbⁿ after training on aⁿbⁿ. The results were interpreted as evidence that humans possess a unique ability to induce the hierarchical structure needed to process CFGs, while cotton-top tamarins are limited to the processing of less complex grammars.
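To make the contrast concrete, the two string sets can be generated in a couple of lines (a minimal sketch in Python, ours rather than anything from the original studies):

```python
def anbn(n):
    """Context-free a^n b^n grammar, e.g. n=3 -> 'aaabbb'."""
    return "a" * n + "b" * n

def abn(n):
    """Finite-state (ab)^n grammar, e.g. n=3 -> 'ababab'."""
    return "ab" * n
```

Both sets contain equal numbers of As and Bs; only aⁿbⁿ requires keeping count of the As in order to pair each with a B.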
However, the conclusion that participants can represent aⁿbⁿ as a nested structure has been challenged. Perruchet and Rey (2005) noted that it was not necessary to pair As and Bs to discriminate between the two kinds of test strings; a simpler strategy based on counting and detection of repetition could also explain performance. They showed that participants were unable to pair As and Bs in structures involving mirror recursion (center-embedding with systematic pairing of As and Bs, generating strings such as A₃[A₂[A₁B₁]B₂]B₃). Although later studies
reported successful learning of mirror recursion under specific conditions (Bahlmann & Friederici,
2006; de Vries et al., 2008, 2012), the authors of these studies all acknowledged that the processing
of surface distinctions could also account for performance. This comes from the fact that the
ungrammatical test strings necessarily differ in their surface expression from the grammatical
string: the correct rejection of an ungrammatical string can therefore also be due to the
representation of those surface properties.
Recent work has used fractal stimuli to explore hierarchical processing in the visual modality
(Martins et al., 2015; Martins et al., 2014; Martins et al., 2019), the auditory modality (Martins et
al., 2017; Martins et al., 2020), and in the motor domain (Martins et al., 2019). In this series of
studies, participants were performing a completion task on periodic fractals. For example, in
Martins et al. (2017) participants were first exposed to three auditory stimuli that were generated
either by the application of a recursive rule or by the application of an iterative rule. Participants
were then asked to choose between two stimuli the one that followed the rule at the higher
hierarchical level. The authors found that participants were able to select the correct continuation
when presented along with different foils. This suggests that participants were sensitive to the rules
and able to apply them to new hierarchical levels. However, these results do not demonstrate that
the rules were embedded because it was sufficient to apply the rule only to the highest hierarchical
level to solve the task: it was not necessary to apply the rule simultaneously at all the hierarchical
levels.
One way to avoid the pitfall of ungrammatical strings is to use a grammar that generates sequences
in which the learning of one regularity is conditioned by the learning of another, lower-level
regularity. This makes it possible to evaluate the depth of learning by comparing which regularities
the learner has identified. However, if a grammar contains its own test, it becomes complicated to
use the classical AGL paradigm. Indeed, asking participants to choose between two strings from the
same grammar that have the same surface characteristics would be equivalent to measuring which
particular arrangement is more frequent since both were present in the exposure phase. A procedure
that provides a direct measure of the learning performance throughout the whole task rather than at
the end of an exposure session (whose length is difficult to estimate a priori and may vary across
participants) seems more appropriate. The serial reaction time (SRT) paradigm (Nissen & Bullemer,
1987) permits such on-line monitoring of the participants’ learning performance. In the SRT task,
participants respond as quickly as possible to successively presented stimuli, usually by pressing
response keys. Each response triggers the presentation of the next stimulus, to which participants
respond anew. Learning typically manifests as a reduction in reaction times and is expected to take
place when a given trial is subject to anticipation.
Only a few studies have made use of this paradigm to explore the learning of hierarchical structure
and for most of them, the kind of knowledge developed by participants involves algebraic patterns and
not nested structures. Koch and Hoffmann (2000) were the first to report evidence suggesting
sensitivity to higher order properties of sequences in SRT. Participants were presented with
sequences consisting of 6 different digits. The sequences were periodic and 24 digits in length. The
participants' task was to respond to the digit presented on the screen with one of the six response
keys. The authors manipulated the relational structure of the sequences. In the third experiment, the
highly structured sequences were composed of four pairs of triplets (three-element groups) that followed two relational patterns. The first two pairs corresponded to a mirror relationship between an ascending and a descending order (e.g., 123-321) and the last two pairs corresponded to a transposition (e.g., 123-
234). The unstructured sequences were created by the permutation of the triplets in such a way to
break the relational patterns while keeping the statistical distribution identical (e.g., 123-345; 234-
321). The results showed a greater decrease in reaction times for participants in the structured than
in the unstructured condition, suggesting that they were sensitive to the sequences’ higher order
relational structure. However, an algebraic rule like “two mirror relations followed by two
transposition relations” is actually sufficient to account for the results, since the relational patterns
were not embedded in multiple levels.
In a slightly different task, the discrete sequence production (DSP) task, Verwey and Wright (2014)
trained participants by repeatedly presenting them with short sequences of six elements. The
elements were presented successively and each was associated with the location of an illuminated
square. The participants' task was to respond to the illuminated location with one of the six
associated buttons. Each presentation of a sequence was separated from the next by a clearly
perceptible interval. At each presentation of the sequence, the serial position of one of the six
elements was random. Thus, during the training phase, the sequences seen by the participants at
each presentation deviated by one element from the true sequence which never appeared during
training. In a later phase of testing, the authors presented the participants with the sequence without
deviations (i.e., the true but never seen sequence) as well as an unfamiliar sequence (i.e., a sequence
where the order of elements never matched the training phase). The results showed that participants
were faster in the no-deviation sequence than in the unfamiliar sequence, although they did not
practice either during the training phase. This suggests that during the training phase, participants
extracted probabilities related to the order of appearance (i.e., the probability that an element
appears in position 1, position 2, etc.) and combined that information into a representation capturing
the underlying pattern of the sequence. As in the study of Koch and Hoffmann (2000), those results demonstrate learning of an algebraic pattern, not of nested structures.
To our knowledge, only one SRT study has reported results suggesting the use of nested structures, namely that of Hunt and Aslin (2001). These authors presented probabilistic sequences in a
visual SRT task. The sequences were presented by illuminating buttons occupying different spatial
positions. In their Experiment 3, the sequence consisted of 4 pairs of elements where the transitional
probability from the first to the second element was 1, so the second element of a pair could always
be anticipated with certainty by the participants. On the other hand, the transition between pairs was
governed by the following probabilities: pairs A and B were each followed in 50% of the cases by
pair C and in the remaining 50% by pair D. Pairs C and D were each followed in 25% of the cases
by pair A and in 25% of the cases by pair B. Pair C was followed in 50% of the cases by pair D and
pair D in 50% of the cases by pair C. An additional restriction was that when pairs C and D were
contingent, the next pair had to be either A or B (thus prohibiting alternating CDC or DCD). The
authors observed that some participants became sensitive to the cumulative probability of the two
most frequent pairs. When pairs C and D were contingent, reaction times for the second element of
the pair in position 2 were faster than those for the second item of the same pair when it was in
position 1. Since the transitional probability was always 1 for the second element of a pair, the
effect can be explained only if participants have acquired the knowledge that the transition between
elements of a pair is embedded in the transition between pairs. This embedding of transition seems
more in line with a nested representation than a representation of an algebraic pattern; however, this
interpretation has some limitations. First, only 3 participants out of 10 showed the effect. Second,
the alternation CDC and DCD being prohibited, the transition following CD or DC was at chance
level (50% A and 50% B). Thus, the design of the materials prevented determining if participants
nested more than one relation, that is, if the transitions between pairs were themselves embedded
into transitions between multiple pairs. Nevertheless, the results suggest that transitional
information is sufficient to bootstrap the construction of nested representations.
In a recent study, Planton et al. (2021) went further and explored whether a simple form of temporal
sequence could give rise to nested representations. One of the simplest forms of temporal sequences
are binary sequences, and unlike more complex sequences like music or natural language, they have
the advantage of allowing maximal control of the input presented to the participants. This apparent
simplicity however preserves the possibility of creating highly complex sequences, which can be
expressed as nested tree structures. The authors presented short binary sequences in a violation
detection task. After an exposure phase, altered sequences that deviated by one item from the initial
sequences were presented to the participants. The participants' task was to report as quickly as
possible if they detected a violation. In order to vary the complexity of the sequences, the authors
developed a formal language containing a limited number of primitive instructions that could
generate any binary sequence. This allowed them to characterize each binary sequence in terms of
Kolmogorov complexity. Kolmogorov complexity is a theoretical measure where the complexity of
a sequence is equal to the size of the shortest computer program that can generate it. Thus, the
complexity of a sequence was defined by the minimal number of primitive instructions needed to
generate it in the proposed language. The more the complexity of a sequence increases, the more its
most compressed representation requires the use of instruction nesting. The authors therefore
wanted to know if the participants' sequence representations were compressed in a similar way. To
separate the part of the performance explained by this compression process and the part that can be
attributed to the learning of transitional probabilities, the authors also measured in each test
sequence the Shannon surprise induced by the deviant stimuli. Shannon surprise (Shannon, 1948)
measures the degree of uncertainty of observing an item given the history of previous items and
thus reflects statistical learning. Since surprise is independent of complexity (it varies with the
position of the deviant within a sequence and is insensitive to complexity, which characterizes a sequence as a whole), if participants process only the transitional probabilities of the
sequences, the degree of surprise of the deviant stimuli should be the only predictor of performance.
Conversely, the use of compression by participants should result in a significant portion of the
variance being explained by the degree of complexity of the sequences. The results showed that
both surprise and complexity were significant predictors of performance suggesting that
compression occurred along with statistical learning. This finding demonstrates that statistical
learning is insufficient to fully account for sequence processing: even when processing sequences as
simple as binary sequences, participants recode the sequence using a recursive compression
algorithm. However, this study did not assess the degree of compression of the participants. Indeed,
sensitivity to complexity, demonstrated by slower violation detection times in the most complex
sequences, does not imply that participants have compressed the sequence to the maximum, nor that
the primitive instructions of their formal language correspond to the mental operations of the
participants. Our study aims to go further by trying to characterize more precisely the mechanism
used by the participants to compress the signal.
1.1. Present study
The purpose of the present study is to evaluate, with the SRT paradigm, if participants represent
binary sequences of events as nested structures. In theory, recursive compression algorithms allow
an infinite number of hierarchical levels. This is obviously not the case for the human brain whose
processing capacity is finite, limiting the number of hierarchical levels it can represent.
Nevertheless, this limit cannot be defined a priori and can vary from one participant to another.
Thus, predefining the hierarchical structure of a sequence and setting a maximum
number of levels does not allow for finely evaluating the hierarchical depth reached by the
participants. We avoided this problem by using aperiodic and self-similar sequences. A sequence
with these two properties combined has several advantages. First, the self-similar character of the
sequences does not limit a priori the hierarchical depth, which is theoretically infinite¹. Second, the
aperiodic character of the sequences means that no matter how deep the hierarchical representations
are, they will necessarily be incomplete and will only explain part of the signal. Thus, the part not
explained by the hierarchical structure corresponds to the maximum hierarchical level reached. In
this way, it is not necessary to compare performance between grammatical and ungrammatical
stimuli because the learning is evaluated within the sequence. Crucially, the linear distribution of
points in the grammar is aperiodic, meaning that there is no linear function that predicts when a point will occur. This prevents the use of low-level strategies like detecting recurring patterns.
The sequences we will use are generated by a grammar derived from the Lindenmayer formalism
(L-systems). These grammars show interesting properties: there is no distinction between
rewriteable and non-rewriteable symbols, and rewrite rules apply simultaneously to all symbols
rather than sequentially from left to right in a string (Lindenmayer, 1968; Vitányi, 1978). Because L-systems do not distinguish rewriteable from non-rewriteable symbols, rule systems are simplified, but still produce complex structural patterns. One instantiation of L-systems used in AGL paradigms is the so-called Fibonacci grammar, which consists of two rewrite rules (Saddy, 2009; Geambasu et al., 2016; Shirley, 2014):
0 → 1
1 → 01
The interpretation of such a formalism is very simple: every instance of [0] in a sequence must be
‘rewritten as’ [1], and every instance of [1] in the same sequence must be rewritten as [01].
Applying these rules over and over again generates longer and longer sequences of symbols, each of
which corresponds to a ‘generation’ of the grammar. The name of this grammar comes from the fact
that each generation is the concatenation of the two previous ones; as a result, the number of
symbols in each generation actually follows the Fibonacci sequence (Fig. 1C). This results in an
asymmetry in the distribution of 0s and 1s with more 1s than 0s: the ratio of the number of 1s to 0s
approximates the golden ratio (1.618). Two transitions are possible in those sequences (from 0 to the next symbol and from 1 to the next symbol) and the probability of those transitions is also asymmetric. The transition from 0 to 1 is deterministic: 0 is always followed by 1. The transition from 1 to the next symbol is probabilistic: 1 is followed by 0 in 61.8% of the cases and by 1 in 38.2% of the cases. Thus, a point that appears after a 0 is non-ambiguous whereas a point that appears after a 1 is ambiguous.

¹ Note that the hierarchical depth can of course only be infinite for an infinite chain. In the present study, the presented sequences were 233 units long and had potentially up to 12 hierarchical levels, which is presumably well beyond the processing capacity of the cognitive system.
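The derivation and these distributional properties are easy to verify computationally; the following Python sketch is our illustration (we count the axiom 0 as generation 1, which makes generation 13 come out at 233 symbols, as in Fig. 1):

```python
def generation(n):
    """n-th generation of the Fibonacci grammar (0 -> 1, 1 -> 01)."""
    s = "0"                                    # generation 1: the axiom
    for _ in range(n - 1):                     # rules apply to all symbols at once
        s = "".join("1" if c == "0" else "01" for c in s)
    return s

g = generation(13)
print(len(g))                                  # 233, a Fibonacci number
print(g.count("1") / g.count("0"))             # ~1.618, the golden ratio

# transitions: deterministic after 0, probabilistic after 1
after0 = [g[i + 1] for i in range(len(g) - 1) if g[i] == "0"]
after1 = [g[i + 1] for i in range(len(g) - 1) if g[i] == "1"]
print(after0.count("1") / len(after0))         # 1.0: 0 is always followed by 1
print(after1.count("0") / len(after1))         # ~.62, approaching .618 in longer generations
```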
The most important property of this grammar with respect to our research question is its self-
similarity. Self-similarity implies that the transitions mentioned above are found not only in the
relations between points but also in the relations between groups of points (Fig. 1A right panel).
Those groups of points correspond to previous generations and are as such natural constituents.
Thus, any generation can be seen as a multiple embedding of constituents that reflects the
hierarchical structure of the grammar. To access this constituent structure, the processing
mechanism may start by merging the elements linked by a deterministic transition, and then use the
output of this process, i.e., the higher order constituents, to detect the deterministic transitions at the
next hierarchical level. This process of recursive combination would progressively transform the
representation of the sequence into a complex hierarchical structure of embedded constituents (Fig.
1A left panel).
This leads to an interesting observation: points that follow a probabilistic transition can appear
inside a constituent that follows a deterministic transition. Therefore, the detection of higher-order
deterministic transitions serves to disambiguate some of the ambiguous points at a lower level.
Crucially, the higher the hierarchical structure is, the more ambiguous points will be disambiguated.
Nevertheless, due to the aperiodicity of the string, there will always remain a subset of unresolved
ambiguous points that can lead to new embedding, no matter the depth of the hierarchy. Thus, each
hierarchical level corresponds to a specific learning pattern of points: points that are still ambiguous
at this level and points that are disambiguated at this level and lower levels. The maximum level
reached corresponds to the last level where this pattern is observed.
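As an illustration of this merging mechanism (our sketch, not the authors' implementation), one step of merging every deterministic 0-to-1 transition into a single unit recovers the previous generation, with [01] and [1] as the natural constituents:

```python
def merge_deterministic(s):
    """One step of recursive merging: each 0 and the 1 that deterministically
    follows it form one unit [01]; every remaining 1 stays a unit [1].
    Reading [01] as 1 and [1] as 0 yields the previous generation."""
    units, i = [], 0
    while i < len(s):
        if s[i] == "0":            # deterministic transition: merge [0][1]
            units.append("01")
            i += 2
        else:                      # a lone 1
            units.append("1")
            i += 1
    return units

s = "0110110101101"                            # a Fibonacci-grammar string
level1 = merge_deterministic(s)                # ['01','1','01','1','01','01','1','01']
parent = "".join("1" if u == "01" else "0" for u in level1)
print(parent)                                  # '10101101', the previous generation
```

Applying merge_deterministic to the parent string again yields the level-2 constituents, and so on recursively.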
Fig 1. (A) Left panel: depiction of the first three hierarchical levels of generation 8 of the Fibonacci grammar. Ambiguous points at each level are highlighted in red and non-ambiguous points in green. To form a new hierarchical level, units that span a deterministic transition are combined (illustrated by the arrows). The result is a new representation of the string consisting of units that correspond to natural higher-order constituents of the grammar (illustrated by the brackets). At each level, constituents spanning a deterministic transition can be combined to form an embedded hierarchy. Right panel: transition probabilities between constituents at each level. (B) Vectors of disambiguated points (green) and non-disambiguated points (red) for each hierarchical level of generation 8 of the Fibonacci grammar. In the present study, we used generation 13 of the Fibonacci grammar, which consists of 233 symbols. We did not illustrate this generation due to space limitations, but the rationale is identical. (C) Derivation of the Fibonacci grammar for the first 6 generations. The right column shows the number of symbols at each generation, which maps onto the Fibonacci sequence. Arrows and circles highlight the hierarchical constituency of the grammar. (D) Structural contexts at levels 1, 2 and 3. Green bars point to the constituents in non-ambiguous structural contexts at each level and red bars point to the same constituents in ambiguous structural contexts. Arrows illustrate the fact that, with the exception of the first unit, points that occur inside constituents have the same transitional probability regardless of whether the constituent is in an ambiguous or a non-ambiguous structural context. (E) Transitional probabilities for disambiguated and non-disambiguated points at each level given the sub-sequence that precedes them: the transitional probability of a disambiguated point given the preceding sub-sequence is 1, whereas that of a non-disambiguated point is .38.

Structural processing in the Fibonacci grammar has already been explored via the classical AGL paradigm (Geambaşu et al., 2016, 2020). However, these studies have run into the inherent problem of the habituation/discrimination paradigm, as the non-grammatical test strings used did not allow for conclusions about what was actually learned by the participants. Two other studies (Vender et al., 2019, 2020) explored the Fibonacci grammar by way of an SRT task: a sequence of blue and red
dots generated by the Fibonacci grammar was presented to the participants whose task was to press
the left or right button corresponding to the color of each dot. Sequences of dots were implemented
in a Simon task: dots appeared on the left or on the right side of the screen, such that the colored dot sometimes appeared on the side opposite to the corresponding key. Such incongruent trials occurred every sixth trial. The Simon task was introduced to make the task less repetitive for participants. In the 2020 study, the authors added a final block within which the order of appearance of stimuli followed an alternative grammar called Skip, which has similar surface properties to Fib: 0 is always followed by 1 (p(1|0) = 1), the sub-sequence 11 is always followed by 0 (p(0|11) = 1), and the first-order transitional probabilities are relatively similar (p(0|1) = .73 and p(1|1) = .27); Skip nevertheless differs from Fib from a formal point of view. The authors proposed that within the Fibonacci
grammar, the identification of certain points, called k-points, would allow the reconstruction of the
local hierarchical structure of the sequence due to their specific structural status. Indeed, the
distance between two k-points exactly mirrors the transitional probability of the minimal units of
the sequence (see Krivochen et al., 2018 for a detailed explanation). Linearly, k-points are the last 1
of the 3-gram [011] and correspond to the constituent [1] of level 1 (shown in Fig. 1-A left panel)
whose transitional probability is p(1|1) = .38. In Skip, although the surface expression of the k-
points is present (Skip has the 3-gram [011]), their identification would not allow the reconstruction
of the local hierarchical structure because the distance between them does not mirror the statistical
distribution of minimal units. In other words, in contrast to Fib, the self-similarity of Skip does not
allow the local statistical regularities to be extended to a higher hierarchical level. Vender et al. (2020)
found faster processing for the last 1 of the 3-gram [011] in Fib blocks than in the Skip block. They
interpreted this as evidence that participants had granted a special status to k-points, suggesting that
they partially reconstructed the hierarchical structure of the Fibonacci grammar.
However, a more detailed analysis of the sequences generated by the Skip grammar shows an
inversion of the second order transitional probabilities. In Skip, k-points have a second order
conditional probability of p(1|01) = .36 while in Fib it is equal to p(1|01) = .62. Thus, the slower
processing observed for the last 1 of the 3-gram [011] in Skip block could also be explained by
participants becoming sensitive to the fact that 01 is more frequently followed by 0 than by 1. The
effect can therefore also be explained by “flat” statistical learning processes. Moreover, the Simon task introduces a factor that occurs periodically; the Fibonacci grammar being aperiodic, incongruent
trials are not distributed evenly in the sequence, which makes the impact of this factor difficult to
evaluate.
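These first- and second-order statistics can be estimated directly from the sequences; a short Python sketch (ours) of the quantities at stake:

```python
def generation(n):
    s = "0"                                    # generation 1: the axiom
    for _ in range(n - 1):
        s = "".join("1" if c == "0" else "01" for c in s)
    return s

def conditional(seq, context, symbol):
    """Estimate p(symbol | context) over all occurrences of context."""
    k = len(context)
    idx = [i for i in range(len(seq) - k) if seq[i:i + k] == context]
    return sum(seq[i + k] == symbol for i in idx) / len(idx)

g = generation(13)
k_points = [i + 2 for i in range(len(g) - 2) if g[i:i + 3] == "011"]  # last 1 of [011]
print(len(k_points))                           # number of k-points in the sequence
print(conditional(g, "0", "1"))                # 1.0: first order, deterministic
print(conditional(g, "01", "1"))               # ~.62 in Fib; .36 is reported for Skip
```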
In the present study, we implemented Fibonacci sequences in an SRT task, thus avoiding the need to
create non-grammatical Fib-strings (like in Geambaşu et al., 2016, 2020). In contrast to Vender et
al. (2019, 2020), dots were presented in the center of the screen, to avoid the interfering congruency
factor introduced by the Simon task. Importantly, we developed new analyses, substantially
different from those conducted in these 4 papers, which allowed us to more finely tease apart
hierarchical learning from the tracking of surface regularities. We carried out two analyses to assess
hierarchical processing. The first analysis (Processing of hierarchical structure) explored whether
specific points were anticipated based on the hierarchical structure of the grammar. To this end, we
compared reaction times and accuracy for points disambiguated at a particular hierarchical level to
points not disambiguated at the same level. For clarity, disambiguated points will refer to points that
can be anticipated at a certain hierarchical level and non-disambiguated points will refer to points
that cannot be anticipated at the same hierarchical level (Fig 1B). Hierarchical processing should
result in a larger decrease in reaction times and better accuracy for disambiguated points compared
to non-disambiguated points. We do not have any prior expectation with respect to how many levels
the participants might reach. We will therefore evaluate each level successively until the effects
disappear at the group level (see Fig. 1A left panel for levels descriptions). In order to control for
frequency effects that could be due to the asymmetry of the sequence (1s being more frequent than
0s), we compared, for each hierarchical level, only 1s to 1s and 0s to 0s. Anticipating the results, we
found evidence of learning at levels 1, 2 and 3 but not at level 4 (which is why this level is not
presented in Fig. 1A and Fig. 1B).
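One way to make this level-by-level labeling explicit is sketched below (our reconstruction of the scheme depicted in Fig. 1B, not the authors' code): a point immediately following a 0 is disambiguated at level 0; at each higher level, units are merged across deterministic transitions, and the first point of any constituent that follows a deterministic transition becomes newly disambiguated.

```python
def disambiguation_level(s, max_level=4):
    """For each position of a Fibonacci-grammar string, return the lowest
    hierarchical level at which the point becomes predictable with
    certainty (None = still ambiguous at max_level)."""
    level = [None] * len(s)
    for i in range(1, len(s)):                 # level 0: 0 -> 1 is deterministic
        if s[i - 1] == "0":
            level[i] = 0
    cur, start = s, list(range(len(s)))        # start[i]: original position of unit i
    for lvl in range(1, max_level + 1):
        nxt, nxt_start, i = [], [], 0
        while i < len(cur):                    # merge across deterministic transitions
            if cur[i] == "0":                  # [0][1] -> parent symbol 1
                nxt.append("1"); nxt_start.append(start[i]); i += 2
            else:                              # [1] -> parent symbol 0
                nxt.append("0"); nxt_start.append(start[i]); i += 1
        # a constituent preceded by a parent-level 0 follows a deterministic
        # transition, so its first point, ambiguous below, is resolved here
        for j in range(1, len(nxt)):
            if nxt[j - 1] == "0" and level[nxt_start[j]] is None:
                level[nxt_start[j]] = lvl
        cur, start = "".join(nxt), nxt_start
    return level

print(disambiguation_level("0110110101101"))   # labels for a short example string
```

Contrasting points with level[i] == n against points still unresolved at n then instantiates the comparison between disambiguated and non-disambiguated points at each level.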
The second analysis (Processing of hierarchical constituency) explored whether participants were
sensitive to the constituent structure of the Fibonacci grammar. If we examine closely the
constituents of each level, we see that the first position is always occupied by either a
disambiguated or a non-disambiguated point (Fig. 1A left panel) whereas the following positions
are composed of points disambiguated at the previous levels. Crucially, the remaining positions of
the constituent following a deterministic transition and of the constituent following a low-probability transition (Fig. 1A right panel) are occupied by points disambiguated at the same
levels. In other words, a point disambiguated at level n can appear at level n+1 in either a
constituent that follows a deterministic transition or in a constituent that follows a probabilistic
transition, while the composition of the constituents is identical (except for the point in the first
position). Thus, the same disambiguated point appears higher in the hierarchy subsumed in a
different structural context. We refer to the condition where a disambiguated point appears at a
higher level inside a constituent that follows a deterministic transition as a non-ambiguous
structural context and to the condition where a disambiguated point appears at a higher level in a
constituent following a probabilistic transition as an ambiguous structural context (Fig. 1D). If the
system is sensitive to the hierarchical constituency of the sequence, disambiguated points appearing
at the upper level in a non-ambiguous structural context should be processed faster than the same
disambiguated points appearing in an ambiguous structural context.
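The same merging machinery can be used to tag each constituent with the structural context it occurs in (again our sketch, under the assumption that a constituent's context is given by the parent-level symbol that precedes it):

```python
def constituent_contexts(s, level):
    """After `level` merging steps, return each constituent's span in the
    original string, its parent-level symbol, and whether it follows a
    deterministic (non-ambiguous) or probabilistic (ambiguous) transition."""
    cur = s
    span = [(i, i + 1) for i in range(len(s))]
    for _ in range(level):                     # repeat the merging step
        nxt, nxt_span, i = [], [], 0
        while i < len(cur):
            if cur[i] == "0":                  # merge [0][1] into one unit
                nxt.append("1")
                nxt_span.append((span[i][0], span[i + 1][1]))
                i += 2
            else:
                nxt.append("0")
                nxt_span.append(span[i])
                i += 1
        cur, span = "".join(nxt), nxt_span
    tags = []
    for j in range(1, len(cur)):               # the first constituent has no context
        context = "non-ambiguous" if cur[j - 1] == "0" else "ambiguous"
        tags.append((span[j], cur[j], context))
    return tags

# e.g. constituent_contexts(s, 2): the '1'-type constituents at level 2 are the
# [101] constituents, which occur in both structural contexts
```

Comparing identical constituents across the two tags, minus their first unit, then instantiates the structural-context analysis.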
2. Methods
2.1. Participants
One hundred seventy-four students (33 men and 141 women; mean age 22.8 years old) participated
in the experiment. They were recruited either from an introductory psycholinguistics course at the University of Geneva or through announcements at the University of Geneva. All participants
reported normal or corrected-to-normal vision.
2.2. Materials
The training sequence was composed of two elements and had a length of 50. The order was
pseudo-randomized and elements had the same frequency. The training sequence included multiple
non-grammatical sub-sequences such as 00 or 111. The longest Fib-grammatical sub-sequence had a
length of 4. In the experimental blocks, the sequence consisted of generation 12 of the Fibonacci
grammar which has 233 symbols. Each block corresponded to the full generation.
2.3. Design and procedure
Each trial consisted of a blue or red circle, 100 px in diameter, presented at the center of the screen. The circle disappeared after the response of the participant, or after 1200 ms if no response was given. The response-to-stimulus interval was 500 ms. Participants were asked to press as quickly
as possible the button corresponding to the color of the circle (X=blue, N=red). Keys X and N were
chosen because they had a similar position on QWERTZ and AZERTY keyboards. No information
related to the grammar was given. The experiment started with a training block that was identical
for all the participants. During the training block, when participants made an error, the experiment stopped and a message appeared to remind them of the color-key association; the experiment resumed after 3000 ms. In the experimental blocks, no message appeared when they
made an error. After the training block, participants did 5 experimental blocks of 233 trials.
The experiment was conducted online on the website Testable (https://www.testable.org/) (Rezlescu
et al., 2020). Participants were asked to perform the experiment in a quiet environment where they
could not be disturbed. Instructions were displayed on the screen and participants had to click on a
button to start the experiment. The experiment lasted approximately 25 minutes.
2.4. Data analyses
Four participants were removed due to technical failures. We also removed participants whose error rate was more than 3 SD above the mean error rate in at least one block. This led to the removal of 11 additional participants. Due to an error in the experiment code, the data of the training block were not recorded. Reaction times and accuracy were both modelled as dependent variables. We removed from the analysis all the trials in which participants did not respond within 1200 ms (699 trials). For the
analysis of reaction times, only trials with a correct answer were included. Homoscedasticity and
normality were checked by visual inspection of residual plots. Data from the remaining 159
participants were analyzed with linear mixed-effects models as implemented in the lme4 package
for R (Bates et al., 2014; R Development Core Team, 2021).
For the analysis Processing of hierarchical structure, models included two fixed-effect factors and
their interaction: Time, Ambiguity, and Time*Ambiguity. Time was treated as a continuous variable
with a value of 0 for trials of the 1st experimental block, and of 1, 2, 3 and 4 for trials of the 2nd, 3rd, 4th and 5th blocks. This factor allowed us to model the evolution of performance throughout the
experiment. Ambiguity is a discrete variable contrasting disambiguated and non-disambiguated
points and operationalized differently depending on the level at which its effect is explored (it is
labeled Ambiguity leveln according to the level at which it has been operationalized). We entered as
fixed effects the factors Ambiguity leveln (Disambiguated vs. Non-disambiguated), Time, and the
interaction Ambiguity*Time. The modality “Non-disambiguated” of the factor Ambiguity leveln was
always set as the intercept of the models. The factor Block is a discrete variable corresponding to
the 5 blocks of the experiment. As random effects, the models had intercepts for both Participants
and Participants*Block, allowing us to attribute a different intercept for each participant for each
block. P-values were calculated by way of the Satterthwaites’s approximation to degrees of freedom
with the lmerTest package (Kuznetsova et al., 2015).
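For readers who want to reproduce this model specification, a rough analogue of the reaction-time model can be sketched in Python (ours; the authors fitted the models with lme4 in R, statsmodels provides no Satterthwaite approximation, and the data frame, file name and column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format data: one row per correct trial, with columns
# rt, ambiguity ('Disambiguated' / 'Non-disambiguated'), time (0..4),
# participant and block
df = pd.read_csv("srt_data.csv")

# fixed effects: Ambiguity, Time and their interaction, with
# 'Non-disambiguated' as the reference level; random intercepts for
# participants plus a participant-by-block variance component,
# approximating lmer(rt ~ ambiguity * time + (1 | participant)
#                       + (1 | participant:block))
model = smf.mixedlm(
    "rt ~ C(ambiguity, Treatment('Non-disambiguated')) * time",
    df,
    groups=df["participant"],
    vc_formula={"block": "0 + C(block)"},
)
print(model.fit().summary())
```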
For the analysis Processing of hierarchical constituency, models included two fixed-effect factors
and their interaction: Time, Structural context, and Time*Structural context. Structural context is a
discrete variable contrasting disambiguated points that appeared at the next level in constituents that
either followed a deterministic transition (Non-ambiguous) or a probabilistic transition
(Ambiguous). This variable is operationalized differently depending on the level at which its effect
is explored (it is labeled Structural context leveln according to the level at which it has been
operationalized). Since at each level, the first unit of a constituent is either a disambiguated or a
non-disambiguated point, we excluded those first units when we computed the mean RTs and
accuracy of the constituents (Fig. 1D). The same mixed models were run as in the previous analysis,
including Structural context and Time as fixed factors and the modality “Ambiguous” of the factor
Structural context leveln was always set as the intercept. We excluded from the RT analyses all the
constituents in which there was at least one error. At level 2, the constituent of interest ([101]) contains 3 points; for analyzing accuracy, we computed the mean number of correct answers for the last two points of [101] (i.e., the two disambiguated points that appeared in both structural contexts) and divided it by 2 in order to have a value ranging from 0 to 1 (we did not consider the first point of [101] because it could be either a disambiguated or a non-disambiguated point depending on the structural context). At level 2, the accuracy value for the constituent was thus either 1 (no error), 0.5 (1 error) or 0 (2 errors). For analyzing accuracy of structural context at level 3, we followed the same logic but with the constituent [01101]. We computed the mean number of correct answers for the four disambiguated points that appeared in both structural contexts and divided it by 4 in order to have a value ranging from 0 to 1. At level 3, the accuracy value for the constituent could be either 1 (no error), 0.75 (1 error), 0.5 (2 errors), 0.25 (3 errors), or 0 (4 errors).
We first explored if participants were sensitive to the surface statistical properties of the sequence,
corresponding to level 0, and then if they were able to detect the higher-order deterministic
transitions at levels 1-4 (see Fig. 1B). We then explored if participants were sensitive to the
constituent structure of the grammar by comparing, at each level, disambiguated points occurring in
different structural contexts (see Fig. 1D). Finally, we analyzed performance at the individual level
to more finely explore the effect of structural context at level 3 found at the group level.
3. Results
3.1. Processing of hierarchical structure
3.1.1. Processing of surface statistical regularities (Level 0)
Analyses of reaction times showed a main effect of Time (β = -21.47, SE = 0.85, t = -25.12, p < .001) with a mean reduction of reaction times of 86 ms from block 1 to block 5. There was also a main effect of Ambiguity level0 (β = -56.82, SE = 0.68, t = -83.33, p < .001) with disambiguated points being faster than non-disambiguated ones by 57 ms. The interaction Ambiguity level0*Time was also significant (β = -14.25, SE = 0.48, t = -29.71, p < .001) with a more important reduction over time for disambiguated points (Mblock1–block5 = -106 ms) than non-disambiguated points (Mblock1–block5 = -49 ms) (Mblock1–block5 indicates the mean difference between blocks 1 and 5). Results are shown in Fig. 2.
Concerning accuracy, we found a main effect of Time (β = -0.06, SE = 0.01, z = -5.137, p < .001) with a mean reduction of accuracy of 1% from block 1 to block 5. There was also a main effect of Ambiguity level0 (β = 2.26, SE = 0.03, z = 58.173, p < .001) with higher accuracy for disambiguated points (M = 0.98) than for non-disambiguated points (M = 0.90). The effect of Time significantly interacted with Ambiguity level0 (β = 0.23, SE = 0.03, z = 8.484, p < .001) with accuracy increasing for disambiguated points over time (Mblock1–block5 = 0.006) and decreasing for non-disambiguated points (Mblock1–block5 = -0.037). Results are shown in Table 1.
3.1.2. Processing of hierarchical regularities (Levels 1-4)
Hierarchical processing at level 1
Analyses of reaction times showed a main effect of Time (β = -18.29, SE = 0.88, t = -20.80, p < .001) with a mean reduction of reaction times of 73 ms from block 1 to block 5. There was also a main effect of Ambiguity level1 (β = -55.92, SE = 0.88, t = -63.24, p < .001) with disambiguated points being faster than non-disambiguated ones by 56 ms. The interaction Ambiguity level1*Time was also significant (β = -15.22, SE = 0.62, t = -25.504, p < .001) with a more important reduction over time for disambiguated points (Mblock1–block5 = -95 ms) than non-disambiguated points (Mblock1–block5 = -34 ms). Results are shown in Fig. 2.
Concerning accuracy, we found a main effect of Time (β = -0.054, SE = 0.01, z = -4.563, p < .001) with a mean reduction of accuracy of 1.2% from block 1 to block 5. There was also a main effect of Ambiguity level1 (β = 1.35, SE = 0.03, z = 42.059, p < .001) with accuracy higher for disambiguated points (M = 0.96) than for non-disambiguated points (M = 0.88). The effect of Time significantly interacted with Ambiguity level1 (β = 0.20, SE = 0.02, z = 8.829, p < .001) with accuracy increasing for disambiguated points over time (Mblock1–block5 = 0.01) and decreasing for non-disambiguated points (Mblock1–block5 = -0.05). Results are shown in Table 1.
Table 1
Mean Proportion (M) and Standard Deviation (SD) of Correct Responses for Disambiguated and Non-
Disambiguated Points by Hierarchical Levels and Blocks
Block 1 Block 2 Block 3 Block 4 Block 5
M SD M SD M SD M SD M SD
Level 0 Disambiguated 0.98 0.13 0.99 0.09 0.99 0.11 0.99 0.10 0.99 0.09
Non-disambiguated 0.93 0.26 0.91 0.28 0.90 0.30 0.89 0.31 0.89 0.31
Level 1 Disambiguated 0.96 0.20 0.96 0.19 0.96 0.19 0.97 0.18 0.97 0.17
Non-disambiguated 0.91 0.28 0.89 0.32 0.87 0.34 0.86 0.35 0.86 0.35
Level 2 Disambiguated 0.92 0.27 0.91 0.28 0.90 0.30 0.90 0.30 0.90 0.24
Non-disambiguated 0.94 0.23 0.90 0.30 0.89 0.31 0.88 0.32 0.88 0.33
Level 3 Disambiguated 0.91 0.28 0.89 0.32 0.87 0.33 0.87 0.34 0.87 0.34
Non-disambiguated 0.91 0.28 0.89 0.32 0.88 0.34 0.84 0.36 0.85 0.36
Hierarchical processing at level 2
Analyses of reaction times showed a main effect of Time (β = -12.26, SE = 0.85, t = -14.422, p < .001) with a mean reduction of reaction times of 49 ms from block 1 to block 5. There was also a main effect of Ambiguity level2 (β = -7.37, SE = 1.01, t = -7.272, p < .001) with disambiguated points being faster than non-disambiguated ones by 7 ms. The interaction Ambiguity level2*Time was also significant (β = -7.51, SE = 0.71, t = -10.518, p < .001) with a more important reduction over time for disambiguated points (Mblock1–block5 = -61 ms) than non-disambiguated points (Mblock1–block5 = -31 ms). Results are shown in Fig. 3.
Concerning accuracy, we found a main effect of Time (β = -0.10, SE = 0.01, z = -7.834, p < .001) with a mean reduction of accuracy of 3.5% from block 1 to block 5. There was also a main effect of Ambiguity level2 (β = 0.07, SE = 0.03, z = 2.209, p = .027) with accuracy higher for disambiguated points (M = 0.91) than for non-disambiguated points (M = 0.89). The effect of Time significantly interacted with Ambiguity level2 (β = 0.09, SE = 0.02, z = 3.850, p < .001) with accuracy decreasing less for disambiguated points over time (Mblock1–block5 = -0.02) than for non-disambiguated points (Mblock1–block5 = -0.06). Results are shown in Table 1.
Hierarchical processing at level 3
Analyses of reaction times showed a main effect of Time (β = -8.55, SE = 0.87, t = -9.877, p < .001) with a mean reduction of reaction times of 34 ms from block 1 to block 5. There was also a main effect of Ambiguity level3 (β = -12.29, SE = 1.43, t = -8.622, p < .001) with disambiguated points being faster than non-disambiguated ones by 12 ms. The interaction Ambiguity level3*Time was also significant (β = -2.84, SE = 1.00, t = -2.830, p = .004) with a more important reduction over time for disambiguated points (Mblock1–block5 = -38 ms) than non-disambiguated points (Mblock1–block5 = -27 ms). Results are shown in Fig. 3.
Fig. 2. Mean RT (ms) for Disambiguated and Non-disambiguated points of Hierarchical Levels 0 and 1 by Block. Error bars denote the 95% confidence interval.
Concerning accuracy, we found a main effect of Time (β = -0.13, SE = 0.01, z = -8.568, p < .001) with a mean reduction of accuracy of 5.2% from block 1 to block 5. There was also a main effect of Ambiguity level3 (β = 0.09, SE = 0.04, z = 2.200, p = .028) with accuracy higher for disambiguated points (M = 0.88) than for non-disambiguated points (M = 0.87). The interaction Time*Ambiguity level3 was marginally significant (β = 0.05, SE = 0.03, z = 1.655, p = .096) with accuracy decreasing less for disambiguated points over time (Mblock1–block5 = -0.04) than for non-disambiguated points (Mblock1–block5 = -0.06). Results are shown in Table 1.
Hierarchical processing at level 4
Analyses of reaction times showed a main effect of Time (β = -7.47, SE = 0.82, t = -9.127, p < .001) with a mean reduction of reaction times of 30 ms from block 1 to block 5. There was no main effect of Ambiguity level4 (β = 0.27, SE = 1.52, t = 0.182, p = .855) and the interaction Ambiguity level4*Time was also not significant (β = 1.676, SE = 1.07, t = 1.563, p = .118).

Concerning accuracy, we found a main effect of Time (β = -0.16, SE = 0.02, z = -7.837, p < .001) with a mean reduction of accuracy of 6% from block 1 to block 5. There was no main effect of Ambiguity level4 (β = -0.02, SE = 0.05, z = -0.298, p = .765) and the interaction Ambiguity level4*Time was also not significant (β = 0.024, SE = 0.04, z = 0.613, p = .540).
Fig. 3. Mean RT (ms) for Disambiguated and Non-disambiguated points of Hierarchical Levels 2 and 3 by Block. Error bars denote the 95% confidence interval.
3.2. Processing of hierarchical constituency
The results above suggest that participants were sensitive to the higher-order regularities of the sequence up to the third level; we thus restricted the analysis of the structure constituency to levels 1, 2 and 3.
Hierarchical constituency at level 1
Analyses of reaction times showed a main effect of Time (β = -26.49, SE = 0.99, t = -26.655, p < .001) with a mean reduction of reaction times of 106 ms from block 1 to block 5. There was also a main effect of Structural context level1 (β = 4.92, SE = 0.86, t = 5.709, p < .001) with points in an ambiguous structural context faster than points in a non-ambiguous structural context by 4.9 ms. The interaction Structural context level1*Time was not significant (β = -0.85, SE = 0.61, t = -1.394, p = .163).
Concerning accuracy, we found a main effect of Time (β = 0.15, SE = 0.03, z = 4.753, p < .001) with accuracy increasing by 0.7% from block 1 to block 5. There was no main effect of Structural context level1 (β = -0.02, SE = 0.08, z = -0.275, p = .783). However, the interaction Structural context level1*Time was significant (β = 0.14, SE = 0.05, z = 2.572, p = .010) with accuracy increasing more for points in a non-ambiguous structural context (Mblock1–block5 = 0.009) than for points in an ambiguous structural context (Mblock1–block5 = 0.004). Results are shown in Table 2.
Hierarchical constituency at level 2
Analyses of reaction times showed a main effect of Time (β = -25.24, SE = 0.93, t = -27.121, p < .001) with a mean reduction of reaction times of 101 ms from block 1 to block 5. There was no main effect of Structural context level2 (β = -1.19, SE = 0.82, t = -1.453, p = .146). The interaction Structural context level2*Time was also not significant (β = -0.18, SE = 0.58, t = -0.311, p = .756).
Concerning accuracy, we found a main effect of Time (β = 0.002, SE = 0.0005, t = 3.788, p < .001) with accuracy increasing by 0.8% from block 1 to block 5. There was a marginally significant main effect of Structural context level2 (β = -0.002, SE = 0.001, t = -1.710, p = .087) with accuracy slightly better for points in an ambiguous structural context (M = 0.98) than for points in a non-ambiguous structural context (M = 0.97). The interaction Structural context level2*Time was not significant (β = 0.0008, SE = 0.0008, t = 0.934, p = .350). Results are shown in Table 2.
Table 2
Mean Proportion (M) and Standard Deviation (SD) of Correct Responses for Ambiguous and Non-ambiguous
Structural Context by Hierarchical Levels and Blocks
Block 1 Block 2 Block 3 Block 4 Block 5
M SD M SD M SD M SD M SD
Level 1 Non-ambiguous Structural Context 0.98 0.13 0.99 0.10 0.99 0.09 0.99 0.09 0.99 0.09
Ambiguous Structural Context 0.99 0.11 0.99 0.09 0.99 0.11 0.99 0.10 0.99 0.10
Level 2 Non-ambiguous Structural Context 0.97 0.13 0.97 0.12 0.97 0.12 0.98 0.12 0.98 0.12
Ambiguous Structural Context 0.97 0.12 0.97 0.11 0.97 0.11 0.98 0.11 0.98 0.11
Level 3 Non-ambiguous Structural Context 0.95 0.11 0.96 0.11 0.96 0.12 0.96 0.11 0.96 0.11
Ambiguous Structural Context 0.95 0.11 0.96 0.11 0.96 0.11 0.95 0.12 0.95 0.12
Hierarchical constituency at level 3
Analyses of reaction times showed a main effect of Time (β = -22.93, SE = 0.91, t = -25.18, p < .001) with a mean reduction of reaction times of 92 ms from block 1 to block 5. There was also a main effect of Structural context level3 (β = -4.01, SE = 0.89, t = -4.50, p < .001) with points in a non-ambiguous structural context faster than points in an ambiguous structural context by 4 ms. The interaction Structural context level3*Time was significant (β = -1.62, SE = 0.63, t = -2.573, p = .010) with a more important reduction over time for points in a non-ambiguous structural context (Mblock1–block5 = -94 ms) than for points in an ambiguous structural context (Mblock1–block5 = -88 ms). Fig. 4 shows the results plotted for each disambiguated point of the ambiguous and non-ambiguous structural contexts.
With respect to accuracy, we found no main effect of Time (β = 0.0003, SE = 0.0006, t = -0.608, p = .544). There was a significant main effect of Structural context level3 (β = 0.003, SE = 0.001, t = 2.236, p = .025) with accuracy better for points in a non-ambiguous structural context (M = 0.96) than for points in an ambiguous structural context (M = 0.95). The interaction Structural context level3*Time was significant (β = 0.002, SE = 0.0009, t = 2.384, p = .017) with accuracy increasing for points in a non-ambiguous structural context over time (Mblock1–block5 = 0.004) and decreasing for points in an ambiguous structural context (Mblock1–block5 = -0.005). Results are shown in Table 2.
Fig. 4. Mean RT (ms) of disambiguated points occurring in Ambiguous (dashed lines) and Non-ambiguous
(solid lines) Structural Contexts at Level 3 by Position and Blocks. The position number indicates the
serial order in the constituent [01101], from left to right. Error bars denote the 95% confidence interval.
4. Discussion
The aim of the present study was to evaluate if binary sequences can be processed as nested
structures. To do so, we created aperiodic self-similar sequences from the Fibonacci grammar, and
tested adult participants’ learning of their properties in an SRT task. The transitions within these
sequences can be considered from a hierarchical point of view. Sequences being self-similar,
transitions between units at level n are identical to transitions between constituents at level n+1. At
each level, the transitions are either probabilistic or deterministic. Crucially, the probabilistic
transitions at level n are embedded in deterministic transitions at level n+1. It is thus possible to
reduce the number of probabilistic transitions by recursively embedding deterministic transitions.
This recursive structure allows us to predict precisely which unit can be anticipated if the
underlying hierarchical structure of the sequence is processed.
We hypothesized that hierarchical processing would result in a progressive construction of the
underlying, nested structure. This should be reflected by (a) a progressive ability to anticipate
specific points in the sequence that are ambiguous at level n, but disambiguated at level n+1, and
(b) a better anticipation for disambiguated points appearing at level n+1 in a constituent following a
deterministic transition (non-ambiguous structural context) compared to the same disambiguated
points occurring at level n+1 in a constituent following a probabilistic transition (ambiguous
structural context).
In line with the first prediction, we found that for levels 0, 1, 2 and 3, disambiguated points showed
a steeper reduction of RTs through exposure than their non-disambiguated counterparts. Accuracy
was also systematically better for disambiguated points, thus excluding the possibility that the
reduction of RTs was due to a speed-accuracy trade-off. We found no sign of anticipation at level 4.
This suggests that participants were able to build the structure up to the third hierarchical level.
Concerning the second prediction, we found that accuracy increased significantly more in non-ambiguous structural contexts than in ambiguous structural contexts at level 1, suggesting progressive learning of the constituent structure at level 1. Results also showed that points occurring in an ambiguous structural context were overall faster than the same points appearing in a non-ambiguous structural context. However, this effect was present from the beginning of the experiment, i.e., it did not interact with time, which suggests that it does not reflect learning. In fact, it is plausible that transitional probabilities masked the effect of structural context: the points measured at this level correspond to the simplest regularity of the grammar (the remaining elements of level 1 correspond to disambiguated points of level 0, whose transitional probability is p(1|0) = 1). Since reaction times for these points show the strongest decrease, a floor may have been reached, making the effect of structural context undetectable. At level 2, performance was marginally better for points in an ambiguous structural context. However, this effect did not interact with time either, suggesting again that it does not reflect learning. Level 3 showed the predicted effect of structural context in both RTs and accuracy, with a significant reduction of RTs and a significant accuracy increase for the non-ambiguous structural context compared to the ambiguous structural context. Taken together, these results suggest a sensitivity to the constituent structure of the grammar.
This sensitivity to constituent structure also addresses potential alternative explanations based on linear precedence for the better anticipation of disambiguated points compared to non-disambiguated points found in the first analysis. While it is true that disambiguated points are systematically preceded by a specific sub-sequence that never precedes non-disambiguated points
of the same level, transitions between sub-sequences of identical length and their following non-disambiguated points are probabilistic (Fig. 1E). Thus, while disambiguated points could potentially be anticipated based on linear precedence, non-disambiguated points should be similarly accessible. Moreover, it is worth noting that accounting for the anticipation of disambiguated points with linear precedence faces numerous challenges. First, such explanations would be very costly in terms of memory resources. The linear sub-sequences needed to anticipate the disambiguated points overlap (see Fig. 5), so the parser would need to track all the different patterns in parallel. Second, the sequence being binary, the patterns are distinguishable only by their positional order; the parser must therefore also be able to deal with the interference caused by the similarity of the patterns’ elements. Finally, the pattern allowing anticipation of disambiguated points would have to be held in memory for a relatively long time. In the present experiment, the pattern retention time would include the 500 ms response-to-stimulus interval (RSI) and the time to respond on each trial. If we assume a mean reaction time of 300 ms per trial, the patterns allowing anticipation of disambiguated points at levels 1, 2 and 3 would have to be held in memory for 1.6, 3.2 and 5.5 seconds, respectively. Thus, in order to account for the results, a linear precedence parser would have to overcome three requirements: overlapping patterns, interference caused by item similarity, and long retention time in working memory. The attentional cost induced by these constraints casts doubt on the idea that a simple pattern-recognition mechanism is a plausible candidate to account for the anticipation of disambiguated points.
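As a back-of-the-envelope check of these retention times, one can assume that each trial separating the onset of the predictive pattern from the disambiguated point costs the 500 ms RSI plus the 300 ms mean RT; the pattern spans below are our own guesses, chosen to match the reported figures.

RSI, MEAN_RT = 0.5, 0.3                  # seconds, as stated in the text
for level, n_trials in [(1, 2), (2, 4), (3, 7)]:   # assumed pattern spans
    print(f"level {level}: {n_trials * (RSI + MEAN_RT):.1f} s")
# level 1: 1.6 s; level 2: 3.2 s; level 3: 5.6 s (the text reports 5.5 s,
# so the exact level-3 span presumably differs slightly from our guess)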
In addition, the effects of hierarchical constituency found at levels 1 and 3 simply cannot be explained by linear precedence: if the better anticipation of disambiguated points were due to participants memorizing the sub-sequence preceding them, the structural context in which they occur should have no influence, given that in both ambiguous and non-ambiguous structural contexts, disambiguated points were preceded by exactly the same sub-sequence. In order to explain these effects of structural context, participants must organize the input hierarchically.
Fig. 5. Sub-sequences (blue) preceding disambiguated points (green) by hierarchical level. The linear sub-sequences necessary to anticipate the disambiguated points of each level overlap.
Figure 4 shows that the advantage for the non-ambiguous structural context was not driven by one
particular point but was distributed across all the points that appeared in that context. This last
finding is interesting as it tells us something about the type of hierarchical structure participants
built. We have suggested that the process by which participants anticipate higher-order regularities
would consist in the recursive combination of units linked through deterministic transitions.
However, such a mechanism does not necessarily need to represent a unit as embedded in multiple hierarchical levels; the parser could retain only a representation of the highest level’s constituents and anticipate those constituents as wholes. In that view, lower-level constituents are dissolved into higher-level constituents and become inaccessible once the latter are represented. In other words, the internal hierarchical structure of the constituents might dissolve as hierarchical building progresses. Such a hypothesis is embodied in several models of chunking in which there is no record of the sequential steps by which a chunk is formed (Perruchet & Vinter, 1998; French et al., 2011; Goldwater et al., 2009; McCauley & Christiansen, 2014; Robinet et al., 2011). For example, in PARSER (Perruchet & Vinter, 1998), the system chunks together units that
are present in the focus of attention. The span of this focus changes randomly at each trial
(encompassing 1, 2 or 3 units). Once a chunk is created, it is processed as a single unit in the focus
of attention. Thus, if a chunk reoccurs in the signal, it will occupy only one slot in the focus of
attention. This allows the model to chunk multiple chunks together if they are present at the same
time in the focus of attention. The activation value of a chunk decreases at each trial if it is not in
the focus of attention and increases each time the chunk is encountered. When multiple chunks in memory match the signal (that is, when the signal could fit chunks of different sizes), the activation value of the chunk with the best fit increases while the activation values of the chunks with a lower fit decrease. In this way, the small chunks that are created in the early phases of
learning have their activation values progressively tend to 0 as bigger chunks that embed them are
created. This results in a representation where only the biggest chunks that fit the signal are kept in
memory whereas the smaller chunks that allowed the creation of these bigger chunks are
progressively erased from memory. In this view, cognitive representations are limited to chunks
with no internal hierarchical structure.
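A minimal sketch of such a mechanism (ours; the parameter values are illustrative, not those of the published model) makes the key property explicit: once large chunks pass threshold, the small chunks they absorb decay away, leaving no trace of the internal structure.

import random

def parser_sketch(sequence, n_percepts=3000,
                  gain=1.0, decay=0.05, threshold=1.0):
    """PARSER-style chunking (after Perruchet & Vinter, 1998), simplified."""
    lexicon = {}                                   # chunk -> activation
    pos = 0
    for _ in range(n_percepts):
        if pos >= len(sequence):
            break
        units = []
        for _ in range(random.choice([1, 2, 3])):  # random attentional focus
            # A perceptual unit is the longest chunk above threshold that
            # matches the upcoming input, or a single symbol otherwise.
            unit = next((c for c in sorted(lexicon, key=len, reverse=True)
                         if lexicon[c] >= threshold
                         and sequence.startswith(c, pos)),
                        sequence[pos:pos + 1])
            if not unit:
                break
            units.append(unit)
            pos += len(unit)
        percept = "".join(units)
        if not percept:
            break
        lexicon[percept] = lexicon.get(percept, 0.0) + gain   # reinforcement
        for chunk in lexicon:                                  # forgetting
            if chunk != percept:
                lexicon[chunk] -= decay
    return {c: a for c, a in lexicon.items() if a >= threshold}

Applied to a long Fibonacci string (e.g., parser_sketch(generation(15)), reusing the generator sketched earlier), such a learner tends to retain only a few large chunks and no record of how they were built, which is precisely the kind of representation the sub-unit effect discussed below would predict.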
Evidence supporting this claim comes from the so-called sub-unit effect, whereby sub-units of a chunk become less accessible once the chunk is learned (Fiser & Aslin, 2005; Giroux & Rey, 2009; Orbán et al., 2008; Slone & Johnson, 2015, 2018). In SRT experiments, this manifests as relatively
slow RTs for the first unit of a chunk followed by an acceleration for the remaining units (Hunt &
Aslin, 2001; Sakai et al., 2003; Jiménez et al., 2011). In our experiment, if participants were
processing constituents as single units without internal structure, RTs should progressively diminish
through the constituent. This should be especially true for constituents appearing in non-ambiguous structural contexts at level 3. This constituent (01101) is composed of 5 points and 4 transitions: if it were processed as a single unit, the transition from one point to the next should result in a progressive reduction of RTs, and the transitional pattern should thus be ( - - - - ) (where “-” corresponds to a decrease in RTs from one unit to the next). In contrast to this prediction, the transitional pattern observed for this constituent in the last two blocks is ( - + - - ) (where “+” corresponds to an increase in RTs), i.e., there was a strong deceleration at the second transition. Crucially, this deceleration appears precisely at the border between two constituents at the lower level: the internal structure of [01101] is indeed [[01][101]]. The pattern of acceleration/deceleration therefore provides further evidence that participants represent the internal structure of the constituent [01101].
In order to make sure that the deceleration at the second transition observed at the group level was not driven by a subset of participants, we computed for each participant the direction of the 4 transitions of the constituent in the non-ambiguous structural context at level 3. We ran by-participant comparisons with 4 linear models (one for each transition). The factor Position had two modalities (before, after): “before” coded the point before the transition and “after” the point after the transition. Each model included the factor Participant and the interaction Participant × Position (the factor Position was entered only in the interaction term in order to compare the effect of position within, rather than across, individuals). In order to increase statistical power, we computed transitions over blocks 4 and 5 jointly (see supplementary materials for detailed results). Table 3 shows the number of participants by transition pattern. We see that 78% of the participants show a deceleration at the second transition, 22% show no variation in RTs, and critically none shows an acceleration. This shows that the transition pattern ( - + - - ) found at the group level is replicated at the individual level, and is therefore not due to a mix of different patterns across participants. We also see that the transitional pattern ( - - - - ), expected if chunks lost their internal structure, was found in no participant, suggesting that the constituent [01101] was never processed as a single unit. Crucially, 93% of the slow-downs occurred at the second and third transitions, that is, at the boundaries between lower-level constituents. This suggests that participants represent several hierarchical levels simultaneously: the pattern reflects the processing of the internal structure [[01][[1][01]]] of the constituent [01101]. This observation brings further support to our hypothesis that sequences are represented as recursive embeddings of constituents.
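The classification step can be sketched as follows (ours, in Python; the paper’s analysis fitted linear models with a Participant × Position interaction, which we approximate with one Welch t-test per participant and transition, and the long-format data layout is an assumption).

import pandas as pd
from scipy import stats

def classify_transitions(df, alpha=0.05):
    """Assign each participant a four-symbol transitional pattern."""
    rows = []
    for (pp, tr), g in df.groupby(["participant", "transition"]):
        before = g.loc[g["position"] == "before", "rt"]
        after = g.loc[g["position"] == "after", "rt"]
        t, p = stats.ttest_ind(after, before, equal_var=False)
        rows.append({"participant": pp, "transition": tr,
                     "sign": "=" if p >= alpha else ("+" if t > 0 else "-")})
    wide = pd.DataFrame(rows).pivot(index="participant",
                                    columns="transition", values="sign")
    return wide.apply(" ".join, axis=1)            # e.g. "- + - -"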
Table 3
Distribution of the statistical effects for the four transitional patterns in Blocks 4 and 5 combined for the
constituent [01101] in Non-ambiguous Structural Context.
Transitional pattern                                              N° of obs.
Transition 1      Transition 2      Transition 3      Transition 4
- + - - 30
- + = - 32
- + - = 31
- + = = 22
- = = = 20
- + - + 5
- = - + 3
- = - = 3
= = = = 3
- + + - 2
= = - = 2
- = + - 1
- = = - 1
- = = + 1
= + - = 1
= + = = 1
= = - + 1
- - - - 0
Note. The + sign indicates a significant increase in reaction times, the - sign a significant decrease, and the = sign no significant difference. Differences were considered significant at the p < .05 level.
Our results also confirm the finding of Planton et al. (2021) that even sequences as simple as binary sequences can be processed hierarchically. Our proposal that the parser relies on the statistical regularities of the signal to access higher-level constituents is also consistent with the results reported by these authors regarding the involvement of statistical learning: this component explained a significant part of the variance even in sequences with high Kolmogorov complexity. The idea that the degree of complexity of the input leads the system to recode the information has also been put forward to explain how the system induces rules from a set of exemplars (Pothos, 2010; Radulescu et al., 2019, 2021). In particular, Radulescu et al. (2019, 2021) proposed that the recoding of information into a more abstract format depends on the complexity of the signal and the finite encoding capabilities of the cognitive system. The degree of entropy of a
signal (i.e., its complexity) depends on the number of items that compose it as well as on the homogeneity of their distribution. The more homogeneous the distribution (i.e., the closer the items are to equiprobable) and the longer the signal, the higher the entropy. Radulescu et al. argue that rule induction arises when the entropy level exceeds the encoding capacity of the system. This upper limit on the amount of information that can be sent through the channel per unit of time forces the system to compress the information into a more abstract format in order to reduce the level of entropy. We suggest that the construction of a hierarchical structure can be seen as a way to reduce the entropic state of the parser: uncertainty is reduced as the hierarchical structure of the signal is built, in line with the proposal of Radulescu et al. (2019, 2021). However, the particularity of the Fibonacci grammar is that the statistical distribution of the constituents is identical at each level, due to the specific flavor of self-similarity of the Fibonacci grammar. An interesting line for future research would be to ask whether and how self-similarity plays a role in the compression of the input, since it is independent of the entropy of the signal. The rich world of L-systems allows precisely this kind of manipulation, that is, varying the degree of isomorphism of the self-similarity while keeping entropy constant.
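To illustrate this last point: the unigram entropy of the Fibonacci string is close to its 1-bit maximum, even though the string is fully deterministic, so its true entropy rate is zero. The short computation below (ours, reusing the rewrite rules assumed earlier) makes this gap explicit; the gap between flat distributional entropy and deterministic structure is exactly what a hierarchical recoding can exploit.

from math import log2

RULES = {"0": "1", "1": "01"}          # rewrite rules assumed earlier
s = "0"
for _ in range(20):
    s = "".join(RULES[c] for c in s)

p1 = s.count("1") / len(s)             # approaches 1/phi, ~0.618
H = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))
print(f"p(1) = {p1:.3f}, unigram entropy = {H:.3f} bits/symbol")  # ~0.959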
5. Supporting information
Appendix A. Supplementary material
Data associated with this article can be found electronically at https://osf.io/8n9he/?view_only=6f4b42d8e0d7429a984d9a8ff96ad4ba
6. References
Bahlmann, J., & Friederici, A. D. (2006). FMRI investigation of the processing of simple linear and
embedded hierarchical structures : An artificial grammar task. 126.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). Lme4 : Linear mixed-effects models using
Eigen and S4. R package version 1.1-7.
Chomsky, N. (1957). Logical structures in language. American Documentation (pre-1986), 8(4),
284.
Chomsky, N., & Lightfoot, D. W. (2002). Syntactic Structures. Walter de Gruyter.
Dehaene, S., Meyniel, F., Wacongne, C., Wang, L., & Pallier, C. (2015). The Neural Representation
of Sequences : From Transition Probabilities to Algebraic Patterns and Linguistic Trees.
Neuron, 88(1), 2-19. https://doi.org/10.1016/j.neuron.2015.09.019
de Vries, M. H., Monaghan, P., Knecht, S., & Zwitserlood, P. (2008). Syntactic structure and
artificial grammar learning : The learnability of embedded hierarchical structures. Cognition,
107(2), 763-774. https://doi.org/10.1016/j.cognition.2007.09.002
de Vries, M. H., Petersson, K. M., Geukes, S., Zwitserlood, P., & Christiansen, M. H. (2012).
Processing multiple non-adjacent dependencies : Evidence from sequence learning.
Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1598),
2065-2076. https://doi.org/10.1098/rstb.2011.0414
Fiser, J., & Aslin, R. N. (2005). Encoding Multielement Scenes : Statistical Learning of Visual
Feature Hierarchies. Journal of Experimental Psychology: General, 134(4), 521-537.
https://doi.org/10.1037/0096-3445.134.4.521
Fitch, W. T. (2014). Toward a computational framework for cognitive biology : Unifying
approaches from cognitive neuroscience and comparative cognition. Physics of Life
Reviews, 11(3), 329-364. https://doi.org/10.1016/j.plrev.2014.04.005
Fitch, W. T., & Hauser, M. D. (2004). Computational Constraints on Syntactic Processing in a
Nonhuman Primate. Science, 303(5656), 377-380. https://doi.org/10.1126/science.1089401
Fitch, W. T., & Martins, M. D. (2014). Hierarchical processing in music, language, and action :
Lashley revisited. Annals of the New York Academy of Sciences, 1316(1), 87-104.
https://doi.org/10.1111/nyas.12406
French, R. M., Addyman, C., & Mareschal, D. (2011). TRACX : A recognition-based connectionist
framework for sequence segmentation and chunk extraction. Psychological Review, 118(4),
614-636. https://doi.org/10.1037/a0025255
Geambaşu, A., Ravignani, A., & Levelt, C. C. (2016). Preliminary Experiments on Human
Sensitivity to Rhythmic Structure in a Grammar with Recursive Self-Similarity. Frontiers in
Neuroscience, 10. https://doi.org/10.3389/fnins.2016.00281
Geambaşu, A., Toron, L., Ravignani, A., & Levelt, C. C. (2020). Rhythmic Recursion? Human
Sensitivity to a Lindenmayer Grammar with Self-similar Structure in a Musical Task. Music
& Science, 3, 205920432094661. https://doi.org/10.1177/2059204320946615
Giroux, I., & Rey, A. (2009). Lexical and Sublexical Units in Speech Perception. Cognitive Science,
33(2), 260-272. https://doi.org/10.1111/j.1551-6709.2009.01012.x
Goldwater, S., Griffiths, T. L., & Johnson, M. (2009). A Bayesian framework for word
segmentation : Exploring the effects of context. Cognition, 112(1), 21-54.
https://doi.org/10.1016/j.cognition.2009.03.008
Honing, H., & Zuidema, W. (2014). Decomposing dendrophilia. Physics of Life Reviews, 11(3),
375-376. https://doi.org/10.1016/j.plrev.2014.06.020
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task : Access to
separable statistical cues by individual learners. Journal of Experimental Psychology:
General, 130(4), 658-680. https://doi.org/10.1037/0096-3445.130.4.658
Jiménez, L., Méndez, A., Pasquali, A., Abrahamse, E., & Verwey, W. (2011). Chunking by colors :
Assessing discrete learning in a continuous serial reaction-time task. Acta Psychologica,
137(3), 318-329. https://doi.org/10.1016/j.actpsy.2011.03.013
Koch, I., & Hoffmann, J. (2000). Patterns, chunks, and hierarchies in serial reaction-time tasks.
Psychological Research, 63(1), 22-35. https://doi.org/10.1007/PL00008165
Koelsch, S. (2005). Neural substrates of processing syntax and semantics in music. Current Opinion
in Neurobiology, 15(2), 207-212. https://doi.org/10.1016/j.conb.2005.03.005
Kotz, S. A., Ravignani, A., & Fitch, W. T. (2018). The Evolution of Rhythm Processing. Trends in
Cognitive Sciences, 22(10), 896-910. https://doi.org/10.1016/j.tics.2018.08.002
Kovács, Á. M., & Endress, A. D. (2014). Hierarchical Processing in Seven-Month-Old Infants.
Infancy, 19(4), 409-425. https://doi.org/10.1111/infa.12052
Krivochen, D., Phillips, B., & Saddy, J. (2018). Classifying points in Lindenmayer systems :
Transition probabilities and structure reconstruction (v. 1.1).
https://doi.org/10.13140/RG.2.2.25719.88484
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2015). Package ‘lmerTest’. R package version, 2(0), 734.
Lai, J., & Poletiek, F. H. (2011). The impact of adjacent-dependencies and staged-input on the
learnability of center-embedded hierarchical structures. Cognition, 118(2), 265-273.
https://doi.org/10.1016/j.cognition.2010.11.011
Lai, J., & Poletiek, F. H. (2013). How “small” is “starting small” for learning hierarchical centre-
embedded structures? Journal of Cognitive Psychology, 25(4), 423-435.
https://doi.org/10.1080/20445911.2013.779247
Lashley, K. S. (1951). The problem of serial order in behavior (Vol. 21). Bobbs-Merrill Oxford,
United Kingdom.
Levelt, W. J. M. (2019). On Empirical Methodology, Constraints, and Hierarchy in Artificial
Grammar Learning. Topics in Cognitive Science, 0(0). https://doi.org/10.1111/tops.12441
Lewis, S., & Phillips, C. (2015). Aligning Grammatical Theories and Language Processing Models.
Journal of Psycholinguistic Research, 44(1), 27-46. https://doi.org/10.1007/s10936-014-
9329-z
Lindenmayer, A. (1968). Mathematical models for cellular interactions in development II. Simple
and branching filaments with two-sided inputs. Journal of Theoretical Biology, 18(3),
300-315. https://doi.org/10.1016/0022-5193(68)90080-5
Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule Learning by Seven-Month-Old
Infants. Science, 283(5398), 77-80. https://doi.org/10.1126/science.283.5398.77
Martins, M. D., Gingras, B., Puig-Waldmueller, E., & Fitch, W. T. (2017). Cognitive representation
of “musical fractals” : Processing hierarchy and recursion in the auditory domain. Cognition,
161, 31-45. https://doi.org/10.1016/j.cognition.2017.01.001
Martins, M. de J. D., Muršič, Z., Oh, J., & Fitch, W. T. (2015). Representing visual recursion does
not require verbal or motor resources. Cognitive Psychology, 77, 20-41.
https://doi.org/10.1016/j.cogpsych.2015.01.004
Martins, M. J. D., Bianco, R., Sammler, D., & Villringer, A. (2019). Recursion in action : An fMRI
study on the generation of new hierarchical levels in motor sequences. Human Brain
Mapping, 40(9), 2623-2638. https://doi.org/10.1002/hbm.24549
Martins, M. J. D., Fischmeister, F. Ph. S., Gingras, B., Bianco, R., Puig-Waldmueller, E., Villringer,
A., Fitch, W. T., & Beisteiner, R. (2020). Recursive music elucidates neural mechanisms
supporting the generation and detection of melodic hierarchies. Brain Structure and
Function, 225(7), 1997-2015. https://doi.org/10.1007/s00429-020-02105-7
Martins, M. J. D., Krause, C., Neville, D. A., Pino, D., Villringer, A., & Obrig, H. (2019). Recursive
hierarchical embedding in vision is impaired by posterior middle temporal gyrus lesions.
Brain, 142(10), 3217-3229. https://doi.org/10.1093/brain/awz242
Martins, M. J., Fischmeister, F. P., Puig-Waldmüller, E., Oh, J., Geißler, A., Robinson, S., Fitch, W.
T., & Beisteiner, R. (2014). Fractal image perception provides novel insights into
hierarchical cognition. NeuroImage, 96, 300-308.
https://doi.org/10.1016/j.neuroimage.2014.03.064
Maruyama, M., Pallier, C., Jobert, A., Sigman, M., & Dehaene, S. (2012). The cortical
representation of simple mathematical expressions. NeuroImage, 61(4), 1444-1460.
https://doi.org/10.1016/j.neuroimage.2012.04.020
McCauley, S. M., & Christiansen, M. H. (2014). Acquiring formulaic language : A computational
model. The Mental Lexicon, 9(3), 419-436. https://doi.org/10.1075/ml.9.3.03mcc
Monti, M. M., Parsons, L. M., & Osherson, D. N. (2012). Thought Beyond Language : Neural
Dissociation of Algebra and Natural Language. Psychological Science, 23(8), 914-922.
https://doi.org/10.1177/0956797612437427
Mueller, J. L., Bahlmann, J., & Friederici, A. D. (2010). Learnability of Embedded Syntactic
Structures Depends on Prosodic Cues. Cognitive Science, 34(2), 338-349.
https://doi.org/10.1111/j.1551-6709.2009.01093.x
Nakai, T., & Sakai, K. L. (2014). Neural Mechanisms Underlying the Computation of Hierarchical
Tree Structures in Mathematics. PLOS ONE, 9(11), e111439.
https://doi.org/10.1371/journal.pone.0111439
Nissen, M. J., & Bullemer, P. (1987). Attentional requirements of learning : Evidence from
performance measures. Cognitive Psychology, 19(1), 1-32. https://doi.org/10.1016/0010-
0285(87)90002-8
Orbán, G., Fiser, J., Aslin, R. N., & Lengyel, M. (2008). Bayesian learning of visual chunks by
human observers. Proceedings of the National Academy of Sciences, 105(7), 2745-2750.
Perruchet, P., & Rey, A. (2005). Does the mastery of center-embedded linguistic structures
distinguish humans from nonhuman primates? Psychonomic Bulletin & Review, 12(2),
307-313. https://doi.org/10.3758/BF03196377
Perruchet, P., & Vinter, A. (1998). PARSER : A Model for Word Segmentation. Journal of Memory
and Language, 39(2), 246-263. https://doi.org/10.1006/jmla.1998.2576
Planton, S., Kerkoerle, T. van, Abbih, L., Maheu, M., Meyniel, F., Sigman, M., Wang, L., Figueira,
S., Romano, S., & Dehaene, S. (2021). A theory of memory for binary sequences : Evidence
for a mental compression algorithm in humans. PLOS Computational Biology, 17(1),
e1008598. https://doi.org/10.1371/journal.pcbi.1008598
Pothos, E. (2010). An entropy model for artificial grammar learning. Frontiers in Psychology, 1.
https://doi.org/10.3389/fpsyg.2010.00016
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Radulescu, S., Kotsolakou, A., Wijnen, F., Avrutin, S., & Grama, I. (2021). Fast but Not Furious.
When Sped Up Bit Rate of Information Drives Rule Induction. Frontiers in Psychology, 12,
4987. https://doi.org/10.3389/fpsyg.2021.661785
Radulescu, S., Wijnen, F., & Avrutin, S. (2019). Patterns Bit by Bit. An Entropy Model for Rule
Induction. Language Learning and Development, 1-32.
https://doi.org/10.1080/15475441.2019.1695620
Rezlescu, C., Danaila, I., Miron, A., & Amariei, C. (2020). Chapter 13 - More time for science :
Using Testable to create and share behavioral experiments faster, recruit better participants,
and engage students in hands-on research. In B. L. Parkin (Éd.), Progress in Brain Research
(Vol. 253, p. 243-262). Elsevier. https://doi.org/10.1016/bs.pbr.2020.06.005
Robinet, V., Lemaire, B., & Gordon, M. B. (2011). MDLChunker : A MDL-Based Cognitive Model
of Inductive Learning. Cognitive Science, 35(7), 1352-1389. https://doi.org/10.1111/j.1551-
6709.2011.01188.x
Saddy, J. D. (2009). Perceiving and processing recursion in formal grammars. Recursion: Structural
Complexity in Language and Cognition Conference at the University of Massachusetts
(Amherst).
Sakai, K., Kitaguchi, K., & Hikosaka, O. (2003). Chunking during human visuomotor sequence
learning. Experimental Brain Research, 152(2), 229-242. https://doi.org/10.1007/s00221-
003-1548-8
Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical
journal, 27(3), 379-423.
Shirley, E. J. (2014). Representing and remembering Lindenmayer-grammars [Ph.D., University of
Reading]. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.658878
Simon, H. A. (1962). An information processing theory of intellectual development. Monographs of
the Society for Research in Child Development, 150-161.
Slone, L., & Johnson, S. P. (2015). Statistical and Chunking Processes in Adults’ Visual Sequence
Learning. CogSci.
Slone, L. K., & Johnson, S. P. (2018). When learning goes beyond statistics : Infants represent
visual sequences in terms of chunks. Cognition, 178, 92-102.
https://doi.org/10.1016/j.cognition.2018.05.016
Vender, M., Krivochen, D. G., Compostella, A., Phillips, B., Delfitto, D., & Saddy, D. (2020).
Disentangling sequential from hierarchical learning in Artificial Grammar Learning :
Evidence from a modified Simon Task. PLOS ONE, 15(5), e0232687.
https://doi.org/10.1371/journal.pone.0232687
Vender, M., Krivochen, D. G., Phillips, B., Saddy, D., & Delfitto, D. (2019). Implicit Learning,
Bilingualism, and Dyslexia : Insights From a Study Assessing AGL With a Modified Simon
Task. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01647
Verwey, W. B., & Wright, D. L. (2014). Learning a keying sequence you never executed : Evidence
for independent associative and motor chunk learning. Acta Psychologica, 151, 24-31.
https://doi.org/10.1016/j.actpsy.2014.05.017
Vitányi, P. M. B., & Walker, A. (1978). Stable string languages of lindenmayer systems. Information
and Control, 37(2), 134-149. https://doi.org/10.1016/S0019-9958(78)90483-7