Psychological Science, 2019, Vol. 30(3), 386–395
© The Author(s) 2019
DOI: 10.1177/0956797618823540
www.psychologicalscience.org/PS

Research Article

Understanding Dyslexia Through Personalized Large-Scale Computational Models

Conrad Perry (1), Marco Zorzi (2,3,4), and Johannes C. Ziegler (5)

(1) Faculty of Health, Arts and Design, Swinburne University of Technology; (2) Department of General Psychology, University of Padova; (3) Padova Neuroscience Center, University of Padova; (4) Fondazione Ospedale San Camillo IRCCS, Venice Lido, Italy; (5) Laboratoire de Psychologie Cognitive, Centre National de la Recherche Scientifique, Aix-Marseille University

Corresponding Author: Conrad Perry, Faculty of Life and Social Sciences (Psychology), Swinburne University of Technology, John Street, Hawthorn, Victoria, 3122, Australia. E-mail: ConradPerry@gmail.com

Abstract

Learning to read is foundational for literacy development, yet many children in primary school fail to become efficient readers despite normal intelligence and schooling. This condition, referred to as developmental dyslexia, has been hypothesized to occur because of deficits in vision, attention, auditory and temporal processes, and phonology and language. Here, we used a developmentally plausible computational model of reading acquisition to investigate how the core deficits of dyslexia determined individual learning outcomes for 622 children (388 with dyslexia). We found that individual learning trajectories could be simulated on the basis of three component skills related to orthography, phonology, and vocabulary. In contrast, single-deficit models captured the means but not the distribution of reading scores, and a model with noise added to all representations could not even capture the means. These results show that heterogeneity and individual differences in dyslexia profiles can be simulated only with a personalized computational model that allows for multiple deficits.

Keywords: dyslexia, computer simulation, reading

Received 1/10/18; Revision accepted 10/5/18
Learning to read is foundational for literacy development, yet a large percentage of children in primary school (~5%–17%) fail to become efficient and autonomous readers despite normal intelligence and schooling, a condition referred to as developmental dyslexia (Snowling, 2000). Research on developmental dyslexia has documented deficits in vision (Stein & Walsh, 1997), attention (Vidyasagar & Pammer, 2010), auditory and temporal processes (Vandermosten et al., 2010), and phonology and language (Hulme, Nash, Gooch, Lervåg, & Snowling, 2015; Snowling, 2001). It remains a challenge to link the various deficits to the precise learning mechanisms that cause atypical reading development.
Computational models provide a unique tool for understanding how deficits in component skills affect the mechanisms or representations that underlie reading development. Harm and Seidenberg (1999) were the first to use a computational modeling approach to understand developmental dyslexia. They assumed, in line with mainstream theories of reading acquisition (Ziegler & Goswami, 2005), that learning to read consisted of mapping an orthographic code onto a preexisting phonological system, modeled with an attractor neural network that learned phonological structure from phonetic input. Then, following the dominant view of dyslexia as being caused by a core phonological deficit (Vellutino, Fletcher, Snowling, & Scanlon, 2004), they impaired the phonological network to create impoverished representations and trained the model to map orthography onto them. A mild phonological
impairment resulted in impaired nonword reading (e.g., blorf) but not irregular word reading (e.g., aisle, yacht, pint), a moderate impairment resulted in strong deficits in nonword reading but smaller deficits in irregular word reading, and a severe deficit caused very strong deficits in both nonword and irregular word reading. These simulations provided a proof of concept that one can impair a model such that it reflects impaired reading performance. However, they did not investigate how the size of the phonological deficit for any given child would affect his or her reading outcomes. Moreover, they did not investigate how different types of impairments, including nonphonological deficits, affect reading outcomes. This issue is of great importance because it has become increasingly clear that the causes of developmental dyslexia are multifactorial (Menghini et al., 2010).
In the present research, we went a major step further. First, we implemented a developmentally plausible computational model of reading acquisition that learns to read in the same way children do, that is, through a combination of explicit teaching (i.e., direct instruction), phonological decoding, and self-teaching (Share, 1995). Second, we used real data from one of the biggest dyslexia samples (Peterson, Pennington, & Olson, 2013; 622 children, 388 of whom have dyslexia) to set up 622 individual models, in which the efficiency of key mechanisms and representations was set up using individual measures in tasks that tap these component skills. Third, we simulated the real reading performance of these 622 children using exactly the same words that the children read. Fourth, we investigated whether a multideficit model was superior to three alternative models that represent different major theories of developmental dyslexia: the core phonological-deficit model (Vellutino et al., 2004), a visual-deficit model (Stein, 2014), and a noisy computation model (Hancock, Pugh, & Hoeft, 2017). Finally, we investigated how changing the efficiency of a given component skill affects individual learning outcomes for word and nonword reading.
Model Description and Method
The model is presented in Figure 1a. The basic architecture was taken from the connectionist dual-process model (Perry, Ziegler, & Zorzi, 2007, 2010, 2013), but new dynamics and mechanisms were introduced to capture reading acquisition within a realistic learning environment. It is assumed that the phonological lexicon is largely in place prior to reading, although its size can vary from one child to another. The grapheme–phoneme mapping system (i.e., the decoding network) is initially taught with a small number of grapheme–phoneme correspondences (e.g., b → /b/) in a supervised fashion using a simple associative-learning rule (for these correspondences, see the Supplemental Material available online). This process reflects the explicit teaching of grapheme–phoneme correspondences, as it occurs during early reading instruction (e.g., see the statutory requirements of the National Curriculum in England; U.K. Department for Education, 2013; Hulme, Bowyer-Crane, Carroll, Duff, & Snowling, 2012).
From there on, however, learning becomes unsupervised, and most of the correspondences are picked up via implicit statistical-learning procedures. That is, when presented with a new word, the initially rudimentary decoding network generates a phoneme sequence that potentially activates entries in the phonological lexicon. If the correct word is in the phonological lexicon and passes a critical threshold, it is selected, and a representation is set up in the orthographic lexicon (i.e., orthographic learning), which is connected to its phonological representation. Importantly, the internally generated phonological representation is then used as a teaching signal (i.e., self-teaching) to improve the decoding network. That is, every successful decoding of a new word provides the child (and the network) with an opportunity to set up an orthographic representation and improve the decoding network without an external teacher or teaching signal. Indeed, we showed in previous simulations that 80% of words from an English corpus of more than 32,000 words can be learned through decoding alone (Ziegler, Perry, & Zorzi, 2014). The remaining 20% are too irregular (e.g., yacht, aisle, chef) to be learned through decoding.
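In outline, a single learning event works roughly as in the following sketch. The object and method names are hypothetical stand-ins rather than the actual CDP-based implementation; only the .15 decoding threshold comes from the Supplemental Material.

```python
# Minimal sketch of the decoding / orthographic-learning / self-teaching
# loop described above; names are illustrative, not the published code.
def learning_event(word, decoding_net, phon_lexicon, orth_lexicon,
                   threshold=0.15):  # decoding threshold from the Supplemental Material
    # The (initially rudimentary) decoding network generates a phoneme
    # sequence from the printed word.
    phonemes = decoding_net.decode(word.letters)
    # The generated phonology may activate an entry in the phonological
    # lexicon; selection requires passing a critical threshold.
    entry, activation = phon_lexicon.best_match(phonemes)
    if entry == word.phonology and activation > threshold:
        # Orthographic learning: set up an entry in the orthographic
        # lexicon, connected to its phonological representation.
        orth_lexicon.add(word.letters, entry)
        # Self-teaching: the internally generated phonology serves as
        # the teaching signal that refines the decoding network.
        decoding_net.train(word.letters, target=phonemes)
        return True   # decoded and (potentially) lexicalized
    return False      # left for direct instruction (see below)
```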
To simulate irregular word learning and reading (i.e., words that were not able to be learned via decoding), we added a mechanism that specifies how irregular words would get into the orthographic lexicon. The basic idea is that children learn these words via direct instruction (e.g., flash cards). Direct instruction of irregular words is explicitly listed as one of the statutory requirements up to Grade 4 in the National Curriculum in England (U.K. Department for Education, 2013). Direct instruction on irregular words is also achieved in the context of teaching word spelling. Thus, each time a word was not lexicalized via phonological decoding, we allowed for the possibility that it might be lexicalized via direct instruction. We made this a probabilistic process in which the chance that a word would enter the orthographic lexicon varied as a function of the orthographic ability of each child (see Simulation Methods in the Supplemental Material).
We used this computational model to investigate how deficits in the underlying components of the reading network can predict interindividual differences in reading performance. The general approach is outlined in Figure 1b. We used the data of all children included in the study by Peterson et al. (2013) and additional children tested by the same group, which included accuracy in reading aloud (on regular words, irregular words, and nonwords) as well as performance measures in other nonreading tasks for 622 English-speaking children, including 388 children with dyslexia. We selected three component tasks that map relatively directly onto processes and processing components of the model (i.e., orthographic lexicon, phonological lexicon, phonemes). Orthographic choice was taken as a measure of processing efficiency in the orthographic lexicon, phoneme deletion was taken as a measure of the efficiency of activating phonemes correctly, and vocabulary score was taken as a measure of the size of a child's phonological lexicon.
For each child, we used performance on these three tasks to create an individual model in which the parameterization of the model's components and processes was changed using a simple linear function based on the child's performance on the three component tasks. In particular, performance in the orthographic-choice task was used to parameterize the amount of noise in the orthographic lexicon and the probability that a word would be lexicalized if successfully decoded or learned through direct instruction. Performance in the phoneme-deletion task was used to parameterize the amount of noise in the decoding network during training,
Fig. 1. Schematics illustrating how a developmentally plausible computational model of reading development can be used to predict learning outcomes. After initial explicit teaching on a small set of grapheme–phoneme correspondences (GPCs), the decoding network (a) is able to decode words that have a preexisting representation in the phonological lexicon but no orthographic representation. If the decoding mechanism activates a word in the phonological lexicon, an orthographic entry is created, and the phonology is used as an internally generated teaching signal (red arrows) to refine and strengthen letter–sound connections, thereby improving the efficiency of the decoding network. In the individual-deficit simulation approach (b), the efficiency of various components of the reading network can be estimated individually for each child (N = 622) through performance on component tasks that map directly onto model components. The performance of each child in the three component tasks is used to individually set the parameters of the model in order to predict individual learning outcomes.
in which noise was used probabilistically to swap correct phonemes with phonetically similar ones (see Ziegler et al., 2014). Finally, the vocabulary score was used to set the size of the phonological lexicon, that is, how many words a child knows when he or she begins the task of learning to read. Importantly, model parameters were not optimized to fit the individual reading scores, thereby preventing overfitting (see Materials and Methods in the Supplemental Material).
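To illustrate the linear mapping, here is a minimal sketch. The parameter ranges follow Table S1 in the Supplemental Material, but the anchor scores, the direction of each mapping, and all names are our assumptions.

```python
def lerp(score, worst, best, param_at_worst, param_at_best):
    """Linearly interpolate a child's task score onto a parameter range,
    anchored at the worst- and best-scoring children in the sample."""
    t = (score - worst) / (best - worst)
    return param_at_worst + t * (param_at_best - param_at_worst)

# Hypothetical child with below-average z scores on the three tasks;
# worst/best are illustrative sample extremes.
orth_choice, phoneme_deletion, vocabulary = -1.5, -2.0, -1.0
worst, best = -4.0, 3.0

orth_noise   = lerp(orth_choice, worst, best, 0.16, 0.00)       # noise in orthographic lexicon
p_lexicalize = lerp(orth_choice, worst, best, 0.01, 1.00)       # probability of lexicalization
p_switch     = lerp(phoneme_deletion, worst, best, 0.78, 0.00)  # phoneme switching in training
vocab_param  = lerp(vocabulary, worst, best, 0.80, 0.00)        # 0 = full phonological lexicon
```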
A full learning simulation was performed for each individual model, and its performance after learning was assessed by presenting the same words and nonwords used by Peterson et al. (2013). This allowed a direct comparison between learning outcomes in the simulation and actual reading performance of the child that the simulation was meant to capture. It is important to point out that the three component tasks do not map directly onto the three word types (e.g., orthographic choice for irregular words, phoneme deletion for nonwords), but rather, they affect different aspects of processing in the model and thus the way activation is generated and combined in the model before a final output is produced.
Results
Overall reading performance (proportion of correct responses) averaged across the 622 simulations (model) and 622 children (human) is presented in Figure 2a. These data are further broken down for dyslexic and normally developing readers. As can be seen in Figure 2a, the overall means of the children and the predicted means of the model for the very same children are highly similar, for both the normally developing readers and the readers with dyslexia. That is, the model accurately simulated normal and impaired reading development on the basis of performance in three component tasks. To investigate how well the model captured interindividual differences in reading outcomes, we plotted the actual versus predicted reading performance for the 622 children on the three reading outcome measures (see Fig. 2b). The fit was very good, as indexed by r² values ranging from .63 to .72. That is, knowing a child's performance on only three component tasks of reading allows the model to predict his or her learning outcomes on regular words, irregular words, and nonwords with high accuracy.
In addition to examining the accuracy of the model, we examined its reliance on decoding versus direct instruction for word learning. This is an interesting analysis because a large number of studies have suggested that good readers are initially efficient decoders and poor readers tend to be poor decoders (e.g., Gentaz, Sprenger-Charolles, Theurel, & Colé, 2013; Juel, 1988). Poor readers are thus more reliant on direct instruction when learning to read than are good readers. The results of our simulations show that the predictions of the multideficit model are consistent with these findings. In particular, Figure 3a presents the proportion of words that entered the lexicon through decoding or direct instruction as a function of overall reading skill (the average performance of each child across all word types). Figure 3b complements the analysis by presenting the number of direct-instruction attempts as a function of overall reading skill. In the simulations of poor readers, only a small proportion of the words were learned through decoding compared with direct instruction, and there were far more attempts at direct instruction than in the simulations of good readers. Conversely, in the simulations of good readers, most of the words were learned via decoding.
The performance of the multideficit model in simulating the whole distribution of reading deficits in children with dyslexia was then compared with that of three alternative models: (a) a phonological-deficit model, which assumes deficits in activating correct phonemes (i.e., deficits in phonological awareness, phoneme discrimination, and categorical perception of phonemes); (b) a visual-deficit model, which assumes impoverished orthographic processing due to poor letter-position coding (e.g., letter reversals); and (c) a global-noise model, which assumes general processing inefficiency (set as a function of the child's overall level of performance) due to noisy computations (Hancock et al., 2017; Sperling, Lu, Manis, & Seidenberg, 2005). For all models, the vocabulary score was used to set the size of the individual phonological lexicon (with the same procedure used for the multideficit model). These simulations were designed to examine whether simpler models could account for the distribution of reading scores and to investigate how single deficits may affect different aspects of reading (for further details, see the Supplemental Material). The mean results appear in Figure 4.
As can be seen, the mean results from the multideficit, phonological-deficit, and visual-deficit models were very similar to the mean results found with the human data. Only the global-noise model was not parameterizable in such a way as to allow it to capture the mean results. Despite the similarities in the mean results across models, however, only the multideficit model captured the distribution of reading scores across word types, as can be seen in Figure 5, in which the data from all children with dyslexia are displayed (see also Fig. S2 in the Supplemental Material for the whole data set and Fig. S3 in the Supplemental Material for only the normally developing children). To quantitatively compare the predictive accuracy of the different
Fig. 2. Predicted versus actual reading performance. The bar graphs (a) show the proportion of correct responses for regular words, irregular words, and nonwords
by the multideficit model (MDM) and humans, separately for all children (N = 622), children with dyslexia (n = 388), and normally developing children (controls;
n = 234). Error bars show 95% confidence intervals. The scatterplots (b) show the relationship between predicted and actual individual reading scores for regular words,
irregular words, and nonwords for all children.
Fig. 3. The use of decoding versus direct instruction as a function of reading skill. Simulations
show (a) the proportion of words that were learned via self-generated decoding and via direct
instruction and (b) the number of direct instruction attempts. Both are plotted as a function
of the average reading performance of each child. Colored lines represent the individual data,
and the black overlaid lines are the results in deciles. The proportions of words in (a) do not
add up to 1.0 because they refer to a full-size phonological lexicon, which includes words that
were not learned by either decoding or direct instruction for most of the simulated individuals.
Fig. 4. Reading performance for regular words, irregular words, and nonwords for all children, dyslexic children, and control children, compared with performance of the multideficit model (MDM), the global-noise model (noise), the phonological-deficit model (phon), and the visual-deficit model (visual). Error bars show 95% confidence intervals.
[Figure 5 shows, for the dyslexic sample, bar graphs of predicted mean reading performance and scatterplots of predicted versus actual scores for regular words, irregular words, and nonwords, one row per model. Overall fits across the three word types: multideficit model, all BICs = −1,398, r² = .71; global-noise model, all BICs = −19, r² = .28; phonological-deficit model, all BICs = −1,189, r² = .64; visual-deficit model, all BICs = −1,074, r² = .59.]
Fig. 5. Predicted mean dyslexic reading performance (bar graphs) and the association between predicted and actual reading performance of individual dyslexics (scatterplots) for the multideficit, global-noise, phonological-deficit, and visual-deficit models. A Bayesian information criterion (BIC) difference of 10 corresponds to a posterior odds of about 150:1 (Raftery, 1995), and a larger negative value is an index of better fit. Error bars show 95% confidence intervals.
models, we calculated the residual sum of squares between the simulated data and the empirical data (i.e., scores for regular words, irregular words, and nonwords for each child) and computed the Bayesian information criterion (BIC) for each model.
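The article does not spell out the exact BIC variant; a standard Gaussian-error formulation based on the residual sum of squares would look like the following sketch (function name is ours).

```python
import numpy as np

def bic_from_rss(actual, predicted, k):
    """BIC under a Gaussian error model: n*ln(RSS/n) + k*ln(n), where k
    is the number of free parameters (four for the multideficit model,
    two for each single-deficit model)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    n = actual.size
    rss = np.sum((actual - predicted) ** 2)
    return n * np.log(rss / n) + k * np.log(n)
```

On this formulation, lower (more negative) values indicate better fit after the complexity penalty, which matches how the BIC scores are reported below.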
On the full set of data, although the multideficit model was penalized for its larger number of free parameters (four: orthographic noise, phoneme switching, lexicalization threshold, and vocabulary, versus two for the single-deficit models: one specific parameter plus vocabulary), it yielded a markedly lower BIC score (−2,630) than all alternative models (the global-noise, phonological-deficit, and visual-deficit models had scores of −316, −2,244, and −2,027, respectively); the size of the difference between BIC scores represents very strong evidence in favor of the multideficit model (a BIC difference of 10 corresponds to a posterior odds of about 150:1; Raftery, 1995). The same pattern was found when the comparison was restricted to the dyslexic children, with the multideficit model having the lowest BIC score (−1,398) compared with all alternative models (−19, −1,189, and −1,074 for the global-noise, phonological-deficit, and visual-deficit models, respectively), as well as when the comparison was restricted to the normally developing children (−1,288 for the multideficit model; −321, −1,101, and −985 for the global-noise, phonological-deficit, and visual-deficit models, respectively).
A potential problem with the model comparison is that a systematic search of the optimal parameter set for each model was computationally unfeasible despite our use of supercomputing facilities. However, there is no reason to believe that the alternative models were penalized with respect to the multideficit model, because it is much easier to find suitable values in a two-parameter space (for the single-deficit models) than in a four-parameter space (for the multideficit model). Our hand-search approach was to explore the parameter space of each model until the overall means in the simulations were close to the empirical means. As can be seen in Table S2 in the Supplemental Material, apart from the global-noise model, the fit of all models with respect to the overall means was indeed rather good. Thus, it is only when it comes to explaining the individual distributions (i.e., interindividual differences) that the phonological-deficit and visual-deficit models go off track. Further inspection of the results showed that the single-deficit models were worse because there were no parameters that could be changed to fix this (for further discussion, see the Supplemental Material).
Finally, we used the multideficit model as a tool to predict how an increase in the efficiency of a single component would change reading performance on regular words, irregular words, and nonwords. This was done by first selecting the 100 children with the worst average deficit scores (i.e., the most negative z scores averaged across the three types of deficit). Then, each deficit score of each child was increased in 0.2 z-score steps until it reached a level corresponding to unimpaired processing (for orthographic and phonological deficits) or a full vocabulary size. Thus, each z score was increased as much as it could be, and the other two z scores were held constant. Predicted reading scores were generated at each step using the same method as in the previous simulations. The results appear in Figure 6.
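Procedurally, the analysis amounts to the loop sketched below; run_model and ceilings are hypothetical stand-ins for the full parameterize-train-test pipeline described earlier.

```python
# Step one deficit z score up in 0.2 increments, holding the other two
# constant, and re-run the individual model at each step.
def improvement_curve(z_scores, component, ceilings, run_model):
    z = dict(z_scores)      # e.g., {"orth": -2.4, "phon": -3.1, "vocab": -1.8}
    curve = [run_model(z)]  # predicted accuracy at the child's starting point
    while z[component] < ceilings[component]:
        z[component] = min(z[component] + 0.2, ceilings[component])
        curve.append(run_model(z))  # accuracy for regular/irregular/nonword sets
    return curve
```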
The results of the simulations show that increasing vocabulary tends to be more beneficial for irregular word reading (i.e., sight word reading) than nonword reading (i.e., decoding), whereas increasing the efficiency of phonological processing shows the opposite pattern. Increasing orthographic efficiency helps all word types. However, Figure 6 shows important interindividual differences, which suggest that the choice of an optimal intervention depends on the initial conditions, that is, the individual starting point in the 3-D deficit space. The validity of these models' predictions should be tested in future empirical studies.
Conclusion
Our results show that large-scale simulations with a developmentally plausible computational model of reading acquisition allow us to predict learning outcomes for individual children and reading profiles of children with dyslexia on the basis of performance on three component tasks (orthographic choice, phoneme deletion, vocabulary). The multideficit model is superior to alternative single-deficit models in all respects, which suggests that future research needs to take into account the multidimensional nature of the deficits that cause dyslexia. This novel computational approach establishes causal relations between deficits and outcomes that can be used to make long-term predictions on learning outcomes for at-risk children. Importantly, the model can be used to predict how changing the efficiency of one component might change reading performance for an individual child. One limitation is that the present simulations were based on a cross-sectional sample of children rather than on data from a longitudinal study. In particular, it would be of great interest to validate the model's predictions of intervention outcomes in future intervention studies. Confirming the validity of the model's predictions would pave the way for developing personalized computer models to guide the design of individually tailored remediation strategies.
Fig. 6. Predicting learning outcomes as a function of improvements in orthography, phonology, and vocabulary. The scores of each child were normalized to start at 0, and the component scores were increased by 0.2 of a z score until they were at their maximum. Thus, the start of a line represents a child's initial state, and the end of a line represents how a child was predicted to perform when a single component score was increased as much as possible; the length of the line therefore represents the potential gain (in z scores) for a given child.
Action Editor
Erika E. Forbes served as action editor for this article.
Author Contributions
All the authors contributed equally to the study concept and design. C. Perry implemented the computational model and performed all simulations and statistical analyses under the supervision of M. Zorzi. All the authors interpreted the results of the simulations. C. Perry and J. C. Ziegler drafted the manuscript, and M. Zorzi provided critical revisions. All the authors approved the final manuscript for submission.
ORCID iDs
Marco Zorzi https://orcid.org/0000-0002-4651-6390
Johannes C. Ziegler https://orcid.org/0000-0002-2061-5729
Acknowledgments
This work was performed in part on the swinSTAR supercomputer at Swinburne University of Technology. We thank Robin Peterson, Bruce Pennington, and Richard Olson for many insightful comments and discussions and for providing the behavioral data.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest
with respect to the authorship or the publication of this article.
Funding
This research was supported by grants from the Australian Research Council (DP170101857); the European Research Council (210922-GENMOD); the Agence Nationale de la Recherche (ANR-13-APPR-0003); the Labex Brain and Language Research Institute (ANR-11-LABX-0036); the Institute of Convergence at the Institute for Language, Communication and the Brain (ANR-16-CONV-0002); the Excellence Initiative of Aix-Marseille University A*MIDEX (ANR-11-IDEX-0001-02); and the University of Padova (Strategic Grant NEURAT). Behavioral data were collected with support from the National Institutes of Health to the Colorado Learning Disabilities Research Center (Grant No. P50 HD027802).
Supplemental Material
Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/0956797618823540
Open Practices
All data from all simulations as well as an executable version of the model can be downloaded from C. Perry's website (https://sites.google.com/site/conradperryshome/). The original behavioral data were taken from the study of Peterson, Pennington, and Olson (2013) and are not owned by the authors of the current article. The design and analysis plans were not preregistered.
References
Gentaz, E., Sprenger-Charolles, L., Theurel, A., & Colé, P. (2013). Reading comprehension in a large cohort of French first graders from low socio-economic status families: A 7-month longitudinal study. PLOS ONE, 8(11), Article e78608. doi:10.1371/journal.pone.0078608

Hancock, R., Pugh, K. R., & Hoeft, F. (2017). Neural noise hypothesis of developmental dyslexia. Trends in Cognitive Sciences, 21, 434–448.

Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.

Hulme, C., Bowyer-Crane, C., Carroll, J. M., Duff, F. J., & Snowling, M. J. (2012). The causal role of phoneme awareness and letter-sound knowledge in learning to read: Combining intervention studies with mediation analyses. Psychological Science, 23, 572–577.

Hulme, C., Nash, H. M., Gooch, D., Lervåg, A., & Snowling, M. J. (2015). The foundations of literacy development in children at familial risk of dyslexia. Psychological Science, 26, 1877–1886.

Juel, C. (1988). Learning to read and write: A longitudinal study of 54 children from first through fourth grades. Journal of Educational Psychology, 80, 437–447.

Menghini, D., Finzi, A., Benassi, M., Bolzani, R., Facoetti, A., Giovagnoli, S., . . . Vicari, S. (2010). Different underlying neurocognitive deficits in developmental dyslexia: A comparative study. Neuropsychologia, 48, 863–872.

Perry, C., Ziegler, J. C., & Zorzi, M. (2007). Nested incremental modeling in the development of computational theories: The CDP+ model of reading aloud. Psychological Review, 114, 273–315.

Perry, C., Ziegler, J. C., & Zorzi, M. (2010). Beyond single syllables: Large-scale modeling of reading aloud with the connectionist dual process (CDP++) model. Cognitive Psychology, 61, 106–151.

Perry, C., Ziegler, J. C., & Zorzi, M. (2013). A computational and empirical investigation of graphemes in reading. Cognitive Science, 37, 800–828.

Peterson, R. L., Pennington, B. F., & Olson, R. K. (2013). Subtypes of developmental dyslexia: Testing the predictions of the dual-route and connectionist frameworks. Cognition, 126, 20–38.

Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163.

Share, D. L. (1995). Phonological recoding and self-teaching: Sine qua non of reading acquisition. Cognition, 55, 151–218.

Snowling, M. J. (2000). Dyslexia. Oxford, England: Blackwell.

Snowling, M. J. (2001). From language to reading and dyslexia. Dyslexia, 7, 37–46.

Sperling, A. J., Lu, Z. L., Manis, F. R., & Seidenberg, M. S. (2005). Deficits in perceptual noise exclusion in developmental dyslexia. Nature Neuroscience, 8, 862–863.

Stein, J. (2014). Dyslexia: The role of vision and visual attention. Current Developmental Disorders Reports, 1, 267–280.

Stein, J., & Walsh, V. (1997). To see but not to read; the magnocellular theory of dyslexia. Trends in Neurosciences, 20, 147–152.

U.K. Department for Education. (2013). National curriculum in England: Framework for key stages 1 to 4. Retrieved from https://www.gov.uk/government/publications/national-curriculum-in-england-framework-for-key-stages-1-to-4/the-national-curriculum-in-england-framework-for-key-stages-1-to-4

Vandermosten, M., Boets, B., Luts, H., Poelmans, H., Golestani, N., Wouters, J., & Ghesquière, P. (2010). Adults with dyslexia are impaired in categorizing speech and nonspeech sounds on the basis of temporal cues. Proceedings of the National Academy of Sciences, USA, 107, 10389–10394.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. M. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2–40.

Vidyasagar, T. R., & Pammer, K. (2010). Dyslexia: A deficit in visuo-spatial attention, not in phonological processing. Trends in Cognitive Sciences, 14, 57–63.

Ziegler, J. C., & Goswami, U. (2005). Reading acquisition, developmental dyslexia, and skilled reading across languages: A psycholinguistic grain size theory. Psychological Bulletin, 131, 3–29.

Ziegler, J. C., Perry, C., & Zorzi, M. (2014). Modelling reading development through phonological decoding and self-teaching: Implications for dyslexia. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1634), Article 20120397.
Supplemental Online Material
Understanding dyslexia through personalized large-scale
computational models
Conrad Perry (1,*), Marco Zorzi (2,3), & Johannes C. Ziegler (4)
1 Faculty of Health, Arts and Design, Swinburne University of Technology, Hawthorn,
Australia
2 Department of General Psychology and Padova Neuroscience Center, University of Padova,
Padova, Italy
3 Fondazione Ospedale San Camillo IRCCS, Venice-Lido, Italy
4 Aix-Marseille University, Centre National de la Recherche Scientifique, Laboratoire de
Psychologie Cognitive, Marseille, France
* Correspondence: conradperry@gmail.com
Materials and Methods
Behavioral data
All children (N = 622) who had full data on the critical component tasks (phonological
awareness, orthographic choice, vocabulary) and who had scores on regular, irregular, and
nonword reading were selected from a larger database (N = 1189) that was generously provided
by Bruce Pennington, Richard Olson, and Robin Peterson. The database included all of the children
documented in Peterson et al. (2013). Note that we used no selection criteria other than having complete data on all critical measures. Further information about the testing procedures
and diagnostic criteria can be found in the original study. The critical component and reading tasks
used were the following:
Phonological processing. This was assessed with a phoneme deletion test. The phoneme deletion test consisted of six practice and 40 test trials presented in two blocks and required subjects to repeat a nonword, then remove a specific phoneme (when done correctly, a real word resulted; e.g., "Say 'prot'. Now say 'prot' without the /r/"). Note that the database included two other tasks that we could have used to parameterize phonological processing: phonological choice (participants chose which of three nonwords sounds like a word) and Pig Latin (participants strip the first phoneme of a word, pronounce the word without the phoneme, and then add a second syllable with the onset of the first syllable plus the vowel /eɪ/). We did not use the phonological choice task because it taps whole-word phonological knowledge and it requires reading the nonwords aloud (prior to the phonological decision). Both the phoneme deletion and Pig Latin tasks provide a purer measure of phonological processing, but the latter is more complex (it requires a larger number of phonological operations) and is far less commonly used than phoneme deletion when examining the development of reading and reading disorders (e.g., Landerl et al., 2013; Ziegler et al., 2010).
Vocabulary. Vocabulary knowledge was measured with the Vocabulary subtest from the
Wechsler Intelligence Scale for Children—Revised.
Orthographic processing. This was assessed with an orthographic choice test (Olson, Forsberg,
Wise, & Rack, 1994). The orthographic choice test included 80 real word/pseudohomophone pairs
(e.g., easy–eazy, fue–few, salmon–sammon) presented in two blocks and required participants to
select the real word. Note that the database included another task that we could have used to
parameterize orthographic processing, that is, homophone choice (participants decide which of two possible homophones corresponds to a statement that details the meaning of only one of the
homophones). However, homophone choice examines whether participants know which spelling
corresponds to a given meaning, whereas orthographic choice only requires visual word
recognition. Therefore, orthographic choice offers a purer measure of orthographic processing than
homophone choice. Moreover, homophone choice relies on word meaning and is thus likely to
have more overlap with vocabulary measures.
Reading aloud measures. Nonword reading was assessed with a nonword reading test (Olson et al., 1994). The nonword reading test was presented in two blocks and included 85 items of varying difficulty levels (e.g., strale, lobsel). Regular and irregular word reading was assessed with the set of words used by Castles and Coltheart (1993), which included 30 irregularly spelled words (e.g., island, choir) and 30 regular words of varying difficulty.
Program Availability
A fully executable version of the model that runs under the Windows operating system as well
as the data generated in this paper can be found at the following websites:
https://sites.google.com/site/conradperryshome/
http://ccnl.psy.unipd.it/CDP.html
Simulation Methods
A brief description of the Connectionist Dual-Process Model
The core architecture of the models is taken from the Connectionist Dual-Process (CDP) model
(Perry, Ziegler, & Zorzi, 2007, 2010), as implemented in its latest version, CDP++.parser (Perry, Ziegler, & Zorzi, 2013), which is depicted in Figure S1. There are two relatively
separate processing pathways (“routes”) in the model. One is a lexical route that includes the
orthographic and phonological word forms. The other is a sublexical route that computes the
phonology of words without knowledge of the whole-word form. Both of these routes share letter
features and letter representations, as well as output nodes for phonemes and stress.
The basic function of the lexical route is to allow the whole word form of words to be stored
and recalled. In the orthographic lexicon, there is a single node for each spelling, and in the
phonological lexicon, there is a single node for each phonological word. At the letter level, the
orthographic form of words is simply represented as a contiguous set of letters, and at the letter
feature level, the visual patterns of the letters are represented. At the phoneme level, the
phonological form of words uses a representation that is structured in terms of its speech form,
with phonemes being organized into a syllabic template. This template has slots for phonemes that
are organized according to an onset-vowel-coda distinction. It allows three phonemes in the onset,
one in the vowel, and four in the coda for each of two possible syllables. Stress information is also
stored (i.e., whether the word has first or second syllable stress).
Figure S1. The Connectionist Dual-Process Model of Reading Aloud (CDP++.parser version; Perry, Ziegler, & Zorzi, 2013). Note: f = feature, l = letter, S = stress, o = onset, v = vowel, c = coda. Numbers correspond to the overall slot number within the Feature, Letter, and Stress nodes, or the particular slot within an onset, vowel, or coda grouping for other representations. The thick divisor in the Phoneme Output Buffer represents a syllable boundary. The thick dotted lines represent how self-teaching occurs (i.e., letters → sublexical decoding → output nodes → phonological lexicon → orthographic lexicon).
Processing dynamics of the nodes at the feature and letter levels, in the orthographic and phonological lexicons, and in the phoneme and stress output buffers are based on standard interactive-activation equations (McClelland & Rumelhart, 1981), in which all inputs into a given node are first summed and then transformed using a sigmoid function. This includes input from other nodes and, for the phonological and orthographic lexicons, inhibitory input from a frequency scaling parameter
that is proportional to the log frequency of the word. Injection of noise into these representations
(as in our dyslexia simulations) is done at the summing stage (i.e., before the nonlinear
transformation).
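A compact sketch of one such update, under our own naming (the published model follows the McClelland & Rumelhart, 1981, equations; the exact frequency term here is illustrative):

```python
import math
import random

def node_activation(inputs, freq_input=0.0, noise_param=0.0):
    """One update of a lexical node: inputs are summed (including a
    frequency-scaled term for lexicon nodes), noise for the dyslexia
    simulations is injected at the summing stage, and the result is
    passed through a sigmoid."""
    net = sum(inputs) + freq_input           # summed input from other nodes
    net += noise_param * random.gauss(0, 1)  # noise before the nonlinearity
    return 1.0 / (1.0 + math.exp(-net))      # sigmoid transformation
```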
The basic function of the sublexical route is to generate the phonology of letter strings without
lexical information. This is important as it represents one way the model can read words without
being previously exposed to their orthographic form. There are a number of steps in this process.
The first involves the graphemic parser of the model. The graphemic parser is a simple two-layer
network which learns to break letter strings into graphemes and then assign them to a slot in the
input layer of the two-layer associative (TLA) network. This layer consists of a syllabically
organized template where graphemes are organized according to an onset-vowel-coda structure
that is largely homologous with the phoneme organization described above. Since the parser has
no knowledge of the lexical form of a word, however, it can potentially parse words in ways that
are not similar to how they might be represented lexically. The graphemic representation of the
letter string is then propagated through the TLA network, where the activation of phoneme nodes
is computed in the standard way by dot product of input and weight patterns followed by a
nonlinear (sigmoid) transformation. Finally, these values are propagated to the phoneme output
nodes, where they are pooled with activation coming from the lexical route.
The model works differently depending on whether a word is in training mode or whether it is
being read aloud. When reading a word aloud, a string of letter features is first activated, and the
model iterates through the processes described above until activation criteria in the phoneme and
stress output buffers are satisfied. In learning mode, the graphemes and phonemes in a word are
aligned in the TLA network, and the TLA network is then trained. The training rule used by the
TLA network is the delta rule (formally equivalent to the Rescorla-Wagner learning rule; Sutton
& Barto, 1981), and since the network only has two layers, this means only linear relationships
between graphemes and phonemes can be learnt.
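A minimal sketch of one delta-rule update for such a two-layer network follows; the dimensions, learning rate, and use of a sigmoid output are illustrative assumptions based on the description above.

```python
import numpy as np

def delta_rule_step(W, x, target, lr=0.05):
    """W: phoneme-by-grapheme weight matrix; x: binary grapheme input
    vector; target: binary phoneme vector for the word's pronunciation."""
    y = 1.0 / (1.0 + np.exp(-(W @ x)))  # dot product followed by a sigmoid
    W += lr * np.outer(target - y, x)   # delta (Rescorla-Wagner) update
    return W
```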
One limitation of the graphemic parsing mechanism is that, in very rare circumstances, a
disyllabic word may be parsed into three orthographic syllables. This happened in the present study
for the word colonel (which was included in the Castles and Coltheart (1993) word set). This word
was therefore removed from the lexicon of the model and it was not used to calculate the
percentage of correct words in that set. Control simulations where this word was left in and could
be learnt via direct instruction produced virtually identical results.
New mechanisms
New learning dynamics and mechanisms were introduced to capture reading acquisition within
a realistic learning environment. These include:
1) The learning method described in Ziegler et al. (2014), whereby the model was first trained on a small set of grapheme-phoneme correspondences (listed in the Appendix), and words were then added to the orthographic lexicon if they were successfully decoded through the decoding network, that is, when the phonemes derived from letters were able to activate the correct word in the phonological lexicon of the model. When a word was successfully decoded and added to the orthographic lexicon, or if it was already in the orthographic lexicon, the decoding network was trained on that word.
2) A novel lexicalization method reflecting the probabilistic nature of lexicalization and
memory consolidation, as well as the fact that learning can occur via direct teaching and other
methods that do not necessarily need self-generated decoding.
2.1) Lexicalization was made probabilistic. In particular, rather than a word being lexicalized
every time it passed the activation threshold in the phonological lexicon of the model via decoding,
it was only lexicalized some proportion of the time. This proportion was linked to the orthographic
choice parameter: the better a child was at orthographic learning, as estimated by his or her
performance on the orthographic choice task, the higher the probability that the word entered the
orthographic lexicon. This assumption allows inter-individual differences in orthographic learning
to occur that do not depend on decoding.
2.2) Words were given a chance of being lexicalized by direct instruction if they did not reach the threshold for decoding or were not lexicalized after decoding. The probability that a given word would become a candidate for lexicalization via direct learning was simply a function of its frequency, that is, log(frequency of target word + 2) / log(frequency of highest-frequency word + 2). In practice, this means that words of very low frequency have about a 5% chance of being selected for direct learning after not being successfully decoded, as sketched below.
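A sketch of that selection step (the function name is ours; the formula is the one given above):

```python
import math
import random

def direct_instruction_candidate(freq, max_freq):
    """Probability that a word that was not successfully decoded becomes
    a candidate for lexicalization via direct instruction:
    log(freq + 2) / log(max_freq + 2)."""
    p = math.log(freq + 2) / math.log(max_freq + 2)
    return random.random() < p

# For a zero-frequency word against a corpus whose most frequent word
# has a count around 10**6, log(2)/log(10**6) is roughly .05, i.e., the
# ~5% chance mentioned above.
```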
3) A child-specific vocabulary, which in its full version included all words (N = 9663) of the
CELEX database (Baayen, Piepenbrock, & van Rijn, 1993) that had an age-of-acquisition rating
of 10 years or less (Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012) and only one or two
syllables. The word colonel was not included (see above). The number of word presentations (i.e.,
learning events) for a model with the full vocabulary was 57978, which is equivalent to 6 passes
(i.e., training epochs) through the full database (9663 × 6). For models with a smaller vocabulary,
the number of word presentations was reduced proportionately by keeping the number of training
epochs the same as for the full vocabulary model. On average, there were 6423 words in the
MDM’s simulations (across all 622 children), and thus the average number of word presentations
was 38538. The three alternative models all used the same vocabulary parameter as the MDM, and
thus the number of words used for each simulation of each child was very similar to the MDM.
The order of presentation of the words was random in the first epoch and the same random order
was used in successive epochs.
4) The presence of noise during learning, which implies that the results are non-deterministic.
Therefore, all simulations were run 10 times and the average of the results was taken for
subsequent analyses. Overall, the simulations required around 240 million word presentations /
learning events (i.e., 622 subjects × 38538 words × 10 repeats). Despite the systematic use of
supercomputing facilities, the computational burden was too large to run the model with a full
lexicon (the final simulations reported here took approximately 20,000 hours of computing time).
During learning, a reduced “runtime” lexicon was therefore compiled by taking the word/nonword
presented to the model and all words that were 1st or 2nd order phonological or orthographic
neighbors (Coltheart, 1978). For words differing in length, each letter or phoneme different was
counted as one neighbor different (i.e., dog and dogs were counted as 1 neighbor different). This
meant that, for a full vocabulary model, the “runtime” phonological lexicon included on average
71.28 (SD: 98.8) words (even when the orthographic lexicon had no words yet). During model
testing (i.e., after the learning phase), the same restriction was used but the “runtime” lexicon also
included all words that had the same first letter/phoneme as the word being tested, as well as any
words whose first phoneme matched the regularized first grapheme of the word (i.e., the phoneme based on simple spelling-sound translation rules; see Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001).
competitors that had the most common pronunciation of the grapheme used by the word (/w/, e.g.,
one, word, wart) as well as the phoneme used in the lexical form (i.e., /h/, e.g., hope). Thus, during
testing, a stimulus could activate on average 796.2 (SD: 412.1) words in the phonological lexicon
(for a full vocabulary model).
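A sketch of the difference count implied by this neighbor criterion (the helper name is ours, and the published implementation may differ in details such as alignment):

```python
def n_different(a, b):
    """Count positionwise letter/phoneme mismatches, with each unit of
    length difference counted as one extra mismatch."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))

# First-order neighbors differ by 1, second-order neighbors by 2:
assert n_different("dog", "dogs") == 1
assert n_different("dog", "fog") == 1
assert n_different("dog", "fogs") == 2
```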
Parameter settings
To set the parameter values, we used the method in Ziegler et al. (2008), whereby each
individual started with the same parameter set and these values were modified based on individual
performance in the subcomponent tasks. To get the distributions of parameters for each individual,
a high and a low value was chosen based on the child who scored the worst on a particular task
and the child who scored the best. All other children were then given a score between these two
values based on simple linear interpolation. For example, if the parameter values varied between
0 and 1 and the task scores went from 0 to 5, then the parameter value given for the child that
scored 2.5 on a task would simply be .5.
For each child, the individual set of model parameters was determined on the basis of
performance in the subcomponent tasks in the following way:
1) The orthographic choice task was used to index the level of noise in the orthographic lexicon.
This was computed for each word by taking the parameter of each child (based on his or her performance in the orthographic choice task) and multiplying it by a number sampled from a standard normal (Z) distribution. This product was then added to the net input of the interactive-activation equations (Perry et al., 2007) used for each lexical item.
2) The orthographic choice task was also used to index the probability at which a child
lexicalizes a word after either successfully decoding it or being successfully given it via direct
learning. This was determined exactly the same way as the level of noise in the orthographic
lexicon, with the child with the best score having a 100% chance of lexicalization and those with
a lower score having a lower chance.
3) The phoneme deletion task was used to set how much noise was generated in the decoding
network during learning for each participant. Based on this parameter, for a given word presented
to the model, phonemes that would be active in the output were turned off with a certain probability
and another phoneme in the same syllabic position was turned on (i.e., if a phoneme was switched
off in the first onset position, another phoneme was always turned on in the first onset position).
The replacement phoneme was not chosen purely randomly but on the basis of phonetic similarity (e.g., /p/ is more likely to be switched with /b/ than with /m/; see Ziegler et al., 2014), as sketched below.
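A sketch of this noisy-output step (the similarity table and all names are hypothetical; Ziegler et al., 2014, describe the actual confusion structure):

```python
import random

def noisy_phonemes(phonemes, p_switch, similarity):
    """Each active phoneme is switched off with probability p_switch and
    replaced, in the same syllabic slot, by a phonetically similar one.
    `similarity` maps a phoneme to {candidate: confusion weight}."""
    out = []
    for slot, ph in phonemes:  # e.g., ("onset1", "p")
        if random.random() < p_switch:
            candidates, weights = zip(*similarity[ph].items())
            ph = random.choices(candidates, weights=weights)[0]  # /p/ -> /b/ more often than /m/
        out.append((slot, ph))
    return out
```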
4) The vocabulary score was used to set how many words were in the phonological lexicon of each participant. The function used to determine whether a word should be in or out of the lexicon was weighted towards keeping high- over low-frequency words. This was done by first calculating
a value for each word based on its frequency (log [frequency of target word +2] / log [frequency
of highest frequency word +2]). For each word, a random number between one and zero was then
generated and multiplied by the vocabulary parameter. If the value of this number was less than
the value calculated from word frequency, the word was kept in the lexicon; otherwise it was not.
On average (i.e., across the 622 individual models), this meant that 66.5% of the words were in
the lexicon; the lexicon of the child with the smallest vocabulary contained only 36.1% of all
possible words.
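A sketch of that sampling rule (names are ours; as described, a vocabulary parameter of 0 keeps every word, and larger values prune low-frequency words first):

```python
import math
import random

def sample_vocabulary(words, freqs, max_freq, vocab_param):
    """Keep a word when a scaled random draw falls below its
    frequency-based value, so high-frequency words are kept
    preferentially as the lexicon shrinks."""
    lexicon = []
    for word, freq in zip(words, freqs):
        value = math.log(freq + 2) / math.log(max_freq + 2)
        if random.random() * vocab_param < value:
            lexicon.append(word)
    return lexicon
```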
The parameters that were manipulated in the MDM and in the alternative models (see below)
to simulate individual differences across the children are listed in Table S1. The parameter values
for the MDM were found by choosing an initial set by hand and then making minor modifications
to them so that they produced similar overall means to the children. All other parameters were
identical to those reported in Perry et al. (2013), with two exceptions: the Letter-to-Orthography
inhibition parameter was set lower (from -1.5 to -0.7, which meant incorrect lexical entries were
more likely to get activated) and the lexicon frequency scaling parameter was also set slightly
lower (from .15 to .10, which meant that the effect of word frequency on the resting activation
levels of word nodes was smaller). The models also used an identical threshold to identify when
successful decoding occurred (.15).
Table S1. Parameters that varied across the models

Parameter                  Multi-deficit   Global Noise   Phonological Deficit   Visual Deficit
Letter Noise                               0–0.008
Letter Switching                                                                 0–.15
Orthographic Noise         0–.16           0–0.008
Phonological Noise                         0–0.008
Phoneme Noise                              0–0.008
Phoneme Switching          0–.78                          0–.92
Lexicalization Threshold   .01–1           .7             .6                     .55
Vocabulary                 0–.80           0–.80          0–.80                  0–.80
During learning, the following parameter changes were made to all models so that items in the
phonological lexicon could be activated comparatively more easily: Phonological Lexicon to
Phoneme Buffer inhibition: 0; Phonological Lexicon to Phoneme Buffer excitation: 0;
Phonological Lexicon lateral inhibition: -.03; Phoneme Buffer to Phonological Lexicon Inhibition:
-.02.
Alternative Models
The multi-deficit model was compared with three simpler models. Two implemented single-
deficit hypotheses (a phonological deficit and a visual deficit model), and the third a hypothesis
examining the effect of using the same distribution of noise across all representations (a global
noise model). These differed in how and where noise was applied, and all used the same vocabulary
parameterization as the MDM. All three of the models used a fixed probability of lexicalization,
and an attempt was made to find a parameter set that caused the models to match the overall means of the actual data as closely as possible. This was done by starting with the MDM
parameter values and setting all of the parameters not associated with the specific model to zero.
The parameters left were then set to a point that produced results as similar as possible to the
overall means. These parameters were found using a hand search where the parameter that was
modified for each model was changed in conjunction with the lexicalization threshold parameter.
Vocabulary score was also used in all alternative models to set the size of the individual
phonological lexicon (as for the MDM model). The specific processing assumptions of the models
were:
1. Phonological deficit model. The critical parameter was the probability of a phoneme being
switched during learning, which was derived from the phoneme deletion scores of each child (in
the same way as in the multi-deficit model).
2. Visual deficit model. Letters at the letter level were switched with adjacent letters with a probability determined from the orthographic choice scores of each child. Switching was assessed for each letter position, starting from the first letter and excluding the last one. If switching occurred, the letter was swapped with the letter to its right (one way to implement this is sketched in code after this list).
3. Global noise model. Noise was added to each processing level (letter level, orthographic lexicon, phonological lexicon, and phoneme output buffer) whenever the model was run. The amount of noise was determined by a parameter that was set for each individual based on his or her mean performance on the regular words, irregular words, and nonwords.
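As an illustration of the visual deficit model's perturbation, the sketch below shows one way to implement the left-to-right adjacent-letter switching. It assumes that each switch is a transposition with the right-hand neighbor and that a single pass is made over the string; the function name is hypothetical.

```python
import random

def perturb_letters(word, p_switch):
    """Visual deficit sketch: scan letter positions left to right,
    excluding the last one; with probability p_switch, transpose the
    letter at the current position with the letter to its right."""
    letters = list(word)
    for i in range(len(letters) - 1):
        if random.random() < p_switch:
            letters[i], letters[i + 1] = letters[i + 1], letters[i]
    return "".join(letters)

# Example: with p_switch = .15 (the upper bound in Table S1),
# "trap" might occasionally surface as "rtap" or "tarp".
```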
Model Evaluation and Model Comparisons
As mentioned above, the predicted reading scores for each child were computed by averaging 10 simulation runs of the model (because of the nondeterministic nature of the learning process). The model scores were then compared with the actual child scores (i.e., individual dots in Figs. 5, S2, and S3). Note that there was no fitting procedure to minimize the prediction error on the distribution of reading scores (this would be computationally infeasible given the stochastic nature of the model). The parameters described above (e.g., probability of lexicalization, size of the phonological lexicon; see Table S1) were simply set to vary within a range that allowed the model to produce mean scores similar to the mean across all participants (see also above). This implies that the model predictions for the full distribution of reading scores are not tied to the dataset and are not influenced by specific cases (i.e., overfitting is not possible). The same procedure was used for the alternative models, and, as can be seen in the results in the main text (Fig. 4) and Table S2, the mean results are very similar to the actual results for all but the global noise model.
Table S2. Mean overall percentage correct scores for the three word types and summed squared error (SSE) differences between the model scores and the human data

                    Percentage Correct               Summed Squared Error (SSE)
Dataset        Regular  Irregular  Nonword     Regular  Irregular  Nonword  Total SSE
Human Data      88.66     72.75     67.09          –         –         –         –
MDM             88.82     71.79     64.79       0.03      0.93      5.32      6.27
Visual Def      86.01     70.02     66.66       7.00      7.45      0.19     14.63
Phon Def        89.16     71.27     65.25       0.26      2.18      3.41      5.84
Global Noise    86.60     54.69     77.66       4.24    326.15    111.71    442.11

Note: Phon = Phonological; Def = Deficit
Despite the small differences in overall means, inspection of Figure 5, Figure S2, and Figure S3 shows that all of the single-deficit models produce distributions of data that differ considerably from those found in the actual data, and these differences cannot be fixed by any simple modification of the parameters. In particular, with the phonological deficit model, the distribution of irregular-word scores is too narrow compared with the actual data. This is because the phoneme-switching parameter affects nonwords more than irregular words: if this parameter is increased to widen the distribution of irregular-word scores, nonword performance drops below the overall mean. There is also a subtler difference with the nonwords, where the model produces a more sigmoidal function than the MDM when the correct function should be approximately linear. This, too, cannot simply be fixed, because alternative values of the phoneme-switching parameter decrease the fit to the overall means. With the MDM, by contrast, nonwords are also affected by orthographic noise, which makes the distribution of simulated scores more similar to the distribution observed in the human data.
With the visual deficit model, the distribution is too restricted for all word types. This cannot be fixed by increasing the range of noise, because doing so causes the model's performance to drop too low. With the global noise model, no set of parameters could be found that allowed the model to display a pattern of means similar to that of the children. The reason is that injecting the same level of noise into all representations causes a much larger detriment to irregular-word performance than to nonword performance, a pattern also observed by Nickels, Biedermann, Coltheart, Saunders, and Tree (2008) in simulations with the dual-route cascaded model of Coltheart, Rastle, Perry, Langdon, and Ziegler (2001). With nonwords, noise increases the competition between alternative phonemes, but this does not necessarily cause poor performance; for example, both /zu:d/ and /zʊd/ are reasonable pronunciations of zood. With irregular words, in contrast, increased competition from (incorrect) phonemes may prevent the correct phoneme from becoming the most active (due to lateral inhibition), thereby leading to a word error.
In terms of model comparisons, we provide r2 as well as BIC scores. The latter were computed as BIC = n + n ln(2π) + n ln(RSS/n) + (ln n)(p + 1), where n is the sample size, p is the number of free parameters, and RSS is the residual sum of squares (i.e., the sum of the squared prediction errors). Note that this formula is often used without the two initial terms (here, however, it is identical to the one used in the base package of the R software). The MDM was considered to have 4 free parameters (phoneme switching, orthographic noise, lexicalization threshold, and vocabulary; see Table S1). The three alternative models were considered to have 2 free parameters: vocabulary and one model-specific parameter.
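For reference, the BIC computation might be sketched as follows. The function name and the assumption that residuals are (human - model) score differences are ours, but the formula is the one given above (equivalent to R's stats::BIC for a Gaussian model, where the +1 counts the error variance as a parameter).

```python
import numpy as np

def bic(human, model, p):
    """BIC with the Gaussian constant terms retained:
    n + n*ln(2*pi) + n*ln(RSS/n) + ln(n)*(p + 1)."""
    residuals = np.asarray(human, dtype=float) - np.asarray(model, dtype=float)
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    return n + n * np.log(2 * np.pi) + n * np.log(rss / n) + np.log(n) * (p + 1)
```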
References
Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX Lexical Database (CD-
ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
Castles, A., & Coltheart, M. (1993). Varieties of developmental dyslexia. Cognition, 47(2),
149-180.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies
of information processing (pp. 151-216). London: Academic Press.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. C. (2001). DRC: A dual route
cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-
256.
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990.
Landerl, K., Ramus, F., Moll, K., Lyytinen, H., Leppänen, P. H., Lohvansuu, K., . . . Schulte-Körne, G. (2013). Predictors of developmental dyslexia in European orthographies with varying complexity. Journal of Child Psychology and Psychiatry, 54(6), 686-694.
McClelland, J. L., & Rumelhart, D. E. (1981). An Interactive Activation model of context
effects in letter perception: 1. An account of basic findings. Psychological Review, 88(5), 375-407.
Nickels, L., Biedermann, B., Coltheart, M., Saunders, S., & Tree, J. J. (2008). Computational
modelling of phonological dyslexia: how does the DRC model fare? Cognitive Neuropsychology,
25(2), 165-193.
Olson, R. K., Forsberg, H., Wise, B., & Rack, J. (1994). Measurement of word recognition, orthographic, and phonological skills. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 243-277). Baltimore, MD: Paul H. Brookes Publishing.
Perry, C., Ziegler, J. C., & Zorzi, M. (2007). Nested incremental modeling in the development
of computational theories: the CDP+ model of reading aloud. Psychological Review, 114(2), 273-
315.
Perry, C., Ziegler, J. C., & Zorzi, M. (2010). Beyond single syllables: Large-scale modeling of
reading aloud with the Connectionist Dual Process (CDP++) model. Cognitive Psychology, 61(2),
106-151.
Perry, C., Ziegler, J. C., & Zorzi, M. (2013). A Computational and Empirical Investigation of
Graphemes in Reading. Cognitive Science, 37(5), 800-828.
Peterson, R. L., Pennington, B. F., & Olson, R. K. (2013). Subtypes of developmental dyslexia:
Testing the predictions of the dual-route and connectionist frameworks. Cognition, 126(1), 20-38.
Ziegler, J. C., Bertrand, D., Tóth, D., Csépe, V., Reis, A., Faísca, L., . . . Blomert, L. (2010).
Orthographic depth and its impact on universal predictors of reading: A cross-language
investigation. Psychological Science, 21(4), 551–559.
Ziegler, J. C., Castel, C., Pech-Georgel, C., George, F., Alario, F. X., & Perry, C. (2008).
Developmental dyslexia and the dual route model of reading: Simulating individual differences
and subtypes. Cognition, 107, 151–178.
Ziegler, J. C., Perry, C., & Zorzi, M. (2014). Modelling reading development through
phonological decoding and self-teaching: implications for dyslexia. Philosophical Transactions of
the Royal Society B: Biological Sciences, 369 (1634), 20120397.
Appendix. Grapheme-phoneme correspondences used for initial explicit teaching

Grapheme   Phoneme      Grapheme   Phoneme
a          {            nn         n
ae         1            o          Q
ai         1            oa         5
au         9            oe         5
augh       $            oi         4
ay         1            oo         u
b          b            ou         6
c          k            ow         6
ch         J            oy         4
ck         k            p          p
d          d            ph         f
e          E            pp         p
ea         i            r          r
ee         i            rr         r
ei         1            s          s
eigh       1            sh         S
eu         u            ss         s
ew         u            t          t
ey         1            tch        J
f          f            th         T
ff         f            tsch       J
g          g            tt         t
gn         n            u          V
h          h            ue         u
i          I            ui         u
ie         2            uy         2
j          _            v          v
k          k            w          w
kn         n            wh         w
l          l            wr         r
m          m            y          2
n          n            z          z
ng         N

Note: Phonemes are in the format of the CELEX database
Supplementary Figures

Fig. S2. Predicted versus actual reading performance for all children (mean scores in the leftmost column) with the multi-deficit, global noise, phonological deficit, and visual deficit models. BIC = Bayesian Information Criterion.

[Figure: grid of scatterplots of Proportion Correct (Model) against Proportion Correct (Human), with a leftmost column for mean scores over all items ("All") and columns for regular, irregular, and nonword items, and rows for the multi-deficit, global noise, visual deficit, and phonological deficit models. Fit statistics as extracted (panel-to-statistic mapping not recoverable from the extraction): All panels: BIC = -316, r2 = .28; BIC = -2630, r2 = .71; BIC = -2244, r2 = .63; BIC = -2027, r2 = .59. Word-type panels: BIC = -1152, r2 = .67; BIC = -1136, r2 = .72; BIC = -513, r2 = .63; BIC = -1009, r2 = .56; BIC = 251, r2 = .47; BIC = -74, r2 = .49; BIC = -1075, r2 = .56; BIC = -897, r2 = .62; BIC = -426, r2 = .59; BIC = -992, r2 = .50; BIC = -938, r2 = .64; BIC = -310, r2 = .47.]
Fig. S3. Predicted versus actual reading performance for the normally developing children (mean scores in the leftmost column) with the multi-deficit, global noise, phonological deficit, and visual deficit models. BIC = Bayesian Information Criterion.

[Figure: grid of scatterplots of Proportion Correct (Model) against Proportion Correct (Human), with a leftmost column for mean scores over all items ("All") and columns for regular, irregular, and nonword items, and rows for the multi-deficit, global noise, visual deficit, and phonological deficit models. Fit statistics as extracted (panel-to-statistic mapping not recoverable from the extraction): All panels: BIC = -321, r2 = .23; BIC = -1288, r2 = .64; BIC = -1101, r2 = .58; BIC = -985, r2 = .54. Word-type panels: BIC = -640, r2 = .47; BIC = -512, r2 = .67; BIC = -255, r2 = .49; BIC = -428, r2 = .33; BIC = 80, r2 = .23; BIC = -207, r2 = .27; BIC = -590, r2 = .41; BIC = -415, r2 = .63; BIC = -213, r2 = .43; BIC = -513, r2 = .37; BIC = -405, r2 = .64; BIC = -171, r2 = .33.]