Getting to the bottom of orthographic depth
Xenia Schmalz, Eva Marinus, Max Coltheart, & Anne Castles
Abstract Orthographic depth has been studied intensively as
one of the sources of cross-linguistic differences in reading,
and yet there has been little detailed analysis of what is meant
by orthographic depth. Here we propose that orthographic
depth is a conglomerate of two separate constructs: the com-
plexity of print-to-speech correspondences and the unpredict-
ability of the derivation of the pronunciations of words on the
basis of their orthography. We show that on a linguistic level,
these two concepts can be dissociated. Furthermore, we make
different predictions about how the two concepts would affect
skilled reading and reading acquisition. We argue that refining
the definition of orthographic depth opens up new research
questions. Addressing these can provide insights into the spe-
cific mechanisms by which language-level orthographic prop-
erties affect cognitive processes underlying reading.
Keywords Reading · Orthographic depth · Cross-linguistic
What is orthographic depth?
In the study of reading, it is important to establish to what
extent findings from reading in one language can be general-
ized to another, and what particular experimental results are
specific to the particular orthography used in the studies
(Frost, 2012; Share, 2008). In recent decades, cross-linguistic
research in reading has focussed particularly on
the concept called orthographic depth as a source of cross-
linguistic orthographic differences in reading behavior. Broad-
ly speaking, orthographic depth refers to the reliability of
print-to-speech correspondences. English is considered to be
a deep orthography, as there are often different pronunciations
for the same spelling patterns (e.g., "tough" vs. "though";
Ziegler, Stone, & Jacobs, 1997). Hence, it has often been
contrasted with "shallow" orthographies with more reliable
correspondences, such as Serbo-Croatian (Frost, Katz, &
Bentin, 1987; Turvey, Feldman, & Lukatela, 1984), German
(Frith, Wimmer, & Landerl, 1998; Landerl, Wimmer, & Frith,
1997; Wimmer & Goswami, 1994; Ziegler, Perry, Jacobs, &
Braun, 2001), and many others (see Katz & Frost, 1992;
Ziegler & Goswami, 2005 for reviews).
Orthographic depth is relevant for a broad range of issues,
including reading development, developmental and acquired
reading disorders, and theoretical accounts of reading. All
aspects of reading are intrinsically linked to the characteristics
of the orthography; therefore, establishing what orthographic
characteristics affect reading processes, and the cognitive
mechanisms via which this occurs, is important for practical
and theoretical reasons. For example, research on reading ac-
quisition has consistently shown that achieving reading accu-
racy is a slower process for children learning to read in deep
compared to shallow orthographies (e.g., Frith et al., 1998;
Landerl, 2000; Seymour, Aro, & Erskine, 2003; Wimmer &
Goswami, 1994). To account for these findings, theories of
reading acquisition often consider the role of orthographic
depth, and the challenges that it poses for young readers
(Goswami, 1999; Liberman, Liberman, Mattingly, &
Shankweiler, 1980; Ziegler & Goswami, 2005). The very
mechanisms that underlie reading acquisition might be impor-
tant to different degrees depending on orthographic depth:
numerous studies have shown differences between orthogra-
phies in the strength of various predictors of reading ability.
*Xenia Schmalz
Cognitive Science, Macquarie University, Sydney, NSW, Australia
DPSS, Università degli Studi di Padova, Padova, Italy
Specifically, phonological awareness appears to be a stronger
predictor of reading ability for deep orthographies, as it is
needed to make sense of the complicated print-to-speech con-
version system. Conversely, there is some evidence that rapid
automatised naming is a stronger predictor of reading abilities
in shallow orthographies, as it is important for developing
fluency, an aspect with which poor readers of shallow lan-
guages tend to struggle (e.g., Caravolas et al., 2012; Moll
et al., 2014; Vaessen et al., 2010; Ziegler et al., 2010).
Furthermore, behavioral studies suggest that the symp-
toms associated with developmental dyslexia differ as a
function of orthographic depth (Landerl et al., 1997;
Landerl et al., 2013; Wimmer, 1996; but see Ziegler, Perry,
Ma-Wyatt, Ladner, & Schulte-Körne, 2003). The phenotype
of dyslexia has been shown to depend on the depth
of the orthography: in deep orthographies, dyslexia is
characterized by inaccurate reading, while in shallow or-
thographies high accuracy can be achieved, but a slow-
ness in reading persists (Wimmer, 1993). Although work
in English has established the presence of various sub-
types of dyslexia (Castles & Coltheart, 1993), it has been
questioned whether these can be applied to more shallow
languages, where the hurdles associated with developing
sound reading skills are different (Bergmann & Wimmer,
2008; Wimmer, Mayringer, & Landerl, 2000). These be-
havioral findings on dyslexia and orthographic depth are
supplemented by neuroimaging data, which have shown
cross-linguistic differences in the brain activation patterns
during reading in dyslexic compared to control readers
(for a recent review, see Richlan, 2014).
In addition, the concept of orthographic depth touches
on issues that are central to debates in the reading literature
in general, such as the extent to which reading processes are
universal or language-specific (Dehaene, 2009; Frost,
2012; Share, 2008). Previous research suggests that the
cognitive processes underlying skilled reading are depen-
dent on orthographic depth (Frost, 1994; Frost et al., 1987;
Schmalz et al., 2014; Ziegler et al., 2001). Determining
whether any aspects of the reading process are universal,
and which aspects depend on the characteristics of the or-
thography, has been recently argued to be an essential and
inevitable step in creating models of reading (Frost, 2012).
More specifically, and relating directly to the concept of
orthographic depth, the majority of reading research is
based on English. As has been argued elsewhere, this poses
a threat to the generalizability of this research, especially
since English is considered to be an outlier on the ortho-
graphic depth scale compared to other orthographies
(Share, 2008). Although orthographic depth is not the only
source of variability across orthographies, it has probably
received the most attention in the past decades. Therefore,
understanding what it is and how it affects reading process-
es is of theoretical importance.
It is clear that orthographic depth is an important concept,
and understanding how it relates to reading is pivotal, as it is a
strong source of linguistic variability between alphabetic or-
thographies. Here, we argue that it is currently unclear what
precise mechanisms drive these cross-orthographic differ-
ences, both on a linguistic and behavioral level. We propose
that a more precise definition of orthographic depth is needed
for future research. In particular, answering the question,
"what is orthographic depth," involves determining, on a linguistic
level, what different aspects underlie this concept, and
how these can be quantified. Once a clear definition of ortho-
graphic depth is formulated, current theories and models of
reading can be used to make specific predictions about how
each aspect of orthographic depth might affect skilled reading
and reading acquisition. In the current paper, we discuss the
concept of orthographic depth in three sections. First, we pro-
vide an overview of the previous theoretical work on this
concept (Definitions to date). Then, we propose quantification
methods of the cross-linguistic variability which can be linked
to theoretically important concepts (Quantifications of ortho-
graphic depth). Finally, we outline testable predictions that
can be drawn from our proposed framework (Predictions of
the new orthographic depth framework for theories of
Definitions to date
Existing definitions of orthographic depth
As orthographic depth has been explored for decades, a num-
ber of definitions have been proposed. Originally, the con-
cept was formulated in terms of a compromise between mor-
phological and phonological transparency (Chomsky & Hal-
le, 1968). In orthographies such as English or Dutch, such
compromises are necessary, because the languages are mor-
phologically deep, in that the same morphemes can have
different pronunciations in different contexts. Therefore,
the orthography needs to convey either the morphology or
the phonology of the word: it cannot convey both. For example,
in English, the words "heal" and "health" have the same
spelling pattern because they are semantically related, even
though they have different pronunciations. Thus, English
often sacrifices phonological transparency for morphological
transparency. In Dutch, conversely, the words "lezen" (to
read) and "[ik] lees" ([I] read) have different spellings, despite
being forms of the same verb. This is because the "z" in
"lezen" is pronounced as /z/, whereas consonants in the final
position of Dutch words are devoiced; therefore, the pronunciation
of the final phoneme of "lees" is /s/, which is represented
by the grapheme "s". Here, the Dutch orthography
sacrifices morphological for phonological transparency
(Landerl & Reitsma, 2005).
Originally, the term "depth" had two levels, relating either
to morphological or phonological transparency. In the context
of the reading literature, the concept of phonological transpar-
ency has received the most attention (Feldman & Turvey,
1983; Frost, 1994; Frost et al., 1987). Katz and Frost (1992),
in a review of the Orthographic Depth Hypothesis (ODH),
provide an overview of the origins of the term, and its rela-
tionship to both morphological and phonological transparen-
cy. Their predictions about how depth would affect reading
processes, however, focused exclusively on the relationship
between orthography and phonology as we will discuss in
detail in a later section.
The relationship between orthography and phonology is
considered to vary as a continuum (e.g., Frost et al., 1987;
Goswami, Gombert, & de Barrera, 1998; Seymour et al.,
2003; Sprenger-Charolles, Siegel, Jiménez, & Ziegler,
2011). This implies that a given orthography can be classified
along a single scale. However, this is only possible if this
concept has an explicit and agreed-on definition, which would
allow for the development of a linguistic quantification
scheme. Arguably, such a definition is currently lacking in the
literature.
There is agreement that orthographic depth refers to the
reliability of the print-to-speech correspondences, but what
exactly differs across orthographies and how this should be
quantified is less clear. Katz and Frost (1992) list three differ-
ent aspects of letter-sound correspondences that could help to
flesh out this definition of orthographic depth: "Because shallow
orthographies have relatively simple, consistent, and
complete connections between letter and phoneme, it is easier
for readers to recover more of a printed word's phonology
prelexically by assembling it from letter-phoneme
correspondences." (pp. 71-72). Similarly, in a more recent
paper, Richlan (2014) concurs by describing orthographic
depth as "the complexity, consistency, or transparency of
grapheme-phoneme correspondences in written alphabetic
language" (p. 1). What is now needed are studies concerning
how these different concepts work, whether they can be dis-
tinguished from each other, and how each might be quantified.
We argue that a more specific definition is needed to create
an explicit theoretical framework that accounts for the way in
which orthographic depth influences reading. In order to con-
duct meaningful behavioral cross-linguistic studies, the degree
of orthographic depth of the orthographies which are being
studied needs to be defined a priori, preferably using an objective
linguistic quantification method. This is particularly
important because orthographies differ from each other in
many aspects apart from orthographic depth, such as syllabic
complexity, morphological complexity, orthographic density,
or the proportion of mono- versus polysyllabic words in the
language. Unless the concept of orthographic depth is formal-
ly defined, it is easy to fall into circular reasoning, where any
behavioral differences across orthographies are attributed to
orthographic depth post-hoc, even when there is a possibility
that they are caused by other, uncontrolled language-level
Devising a meaningful quantification method bears further
challenges, because the quantification scheme needs to retain
a link to theoretically and practically meaningful constructs; if
it does not, it becomes unclear what the quantification method
is actually measuring. Therefore, we need to first understand
what constructs underlie orthographic depth and whether the-
se are theoretically important. For a particular linguistic con-
struct, there also needs to be enough variability across orthog-
raphies to make across-language studies meaningful. Then, on
a behavioral level, we need to be able to show a noticeable
effect that is directly associated with the concept of study. In
the following section, we provide an overview of how previ-
ous theoretical work on orthographic depth has used this
Orthographic depth in theories and models of reading
Two theories of reading across languages that are primarily
concerned with orthographic depth are the Orthographic
Depth Hypothesis (ODH; Katz & Frost, 1992) and the Psy-
cholinguistic Grain Size Theory (PGST; Ziegler & Goswami,
2005). Both postulate how orthographic depth would affect
reading processes, the ODH with a focus on skilled reading,
and the PGST with a focus on reading acquisition, and provide
some definition of what is meant by orthographic depth. As
mentioned in the previous section, Katz and Frost (1992) distinguish
between three concepts underlying orthographic
depth: in a deep language, the print-to-speech correspondences
are complex, inconsistent, and incomplete. It is unclear,
however, precisely how each of these three aspects relates to
each other, and whether each of them influences reading in
different ways. Katz and Frost (1992) say the following about
the specific mechanism that affects reading processes:
We would like to make two points, each independent of
the other. The first states that, because shallow orthog-
raphies are optimised for assembling phonology from a
word's component letters, phonology is more easily
available to the reader prelexically than is the case for
a deep orthography. The second states that the easier it is
to obtain prelexical phonology, the more likely it will be
used for both pronunciation and lexical access. Both
statements together suggest that the use of assembled
phonology should be more prevalent when reading a
shallow than when reading a deep orthography (p. 71,
Katz & Frost, 1992).
It is easy to understand this quote in terms of the complex-
ity of print-to-speech correspondences. Behavioral evidence
has shown that the presence of complex multiletter rules (e.g.,
"th", "sh") increases reading-aloud latencies for words and
non-words (Rastle & Coltheart, 1998; Rey, Jacobs, Schmidt-Weigand,
& Ziegler, 1998). According to Katz and Frost, this
is what gives more time for the lexical route to access the
lexical information, before the sublexical computation of the
pronunciation is complete.
This brings us to the question of how this mechanism
would function in a case where the sublexical information is
either inconsistent or incomplete. In English, an example of an
inconsistent sublexical unit is the letter string "ough," which
can be pronounced in six different ways for monosyllabic
words alone. For an inconsistent word, sublexical information
is not sufficient to determine the pronunciation; instead, the
orthographic lexicon must be consulted in order to determine
how to pronounce a word containing the inconsistent correspondence
(e.g., "though" and "through," which contain nearly
identical sublexical information, but have different body
The third concept introduced by Katz and Frost (1992) is
incompleteness of the sublexical correspondences. In English,
examples of words with incomplete sublexical information are
heterophonic homographs. A heterophonic homograph, such
as the word "wind", has two different pronunciations, each of
which is linked to a different meaning. The sublexical information
is incomplete, as sentence context is needed to activate
both the correct phonology of the word and the correct semantic
representation. In an orthography such as Hebrew, this
presents a routine computational problem: here, vowels are
mostly not represented in written texts. Many words have
identical consonant constellations, and as a result vowel information
is needed to tell them apart: for example, the consonant
string "DVR" can be pronounced, among other alternatives,
as "davar," meaning "thing," or "dever," meaning
"pestilence" (Frost & Bentin, 1992). Here, not only the sublexical
but also the orthographic-lexical procedure is insufficient to
retrieve a single pronunciation. Instead, lexical-semantic in-
formation needs to be consulted.
Complexity may lead to a quantitative change in the read-
ing processes by slowing down the sublexical route relative to
the lexical route. In contrast, inconsistency and incomplete-
ness may force reliance on lexical strategies, as access to a
single phonemic representation cannot occur without lexical
information. Thus, the distinction drawn by the ODH between
different aspects underlying orthographic depth requires fur-
ther consideration and empirical work. In particular, it is of
interest (and so far, to our knowledge, unexplored) whether
these three components would affect reading processes in different
ways. If they do, this would question the utility of the concept of
orthographic depth as a unified construct, and instead support
the view that it consists of different sub-components.
The PGST emphasizes the role of complex correspon-
dences in driving cross-linguistic differences in the diffi-
culty of acquiring a given orthography. According to the
PGST, children learning to read in a deep orthography at-
tempt to minimize the unreliability of their sublexical cor-
respondences by relying on larger units because these tend
to be more predictive of a word's correct pronunciation (at
least in English; Peereman & Content, 1998; Treiman,
Mullennix, Bijeljac-Babic, & Richmond-Welty, 1995). As
a result, children learning to read in a deep orthography
need to learn a greater number of correspondences: chil-
dren in a hypothetical perfectly shallow orthography can
simply learn the letters and their corresponding sounds,
and decode all words with perfect accuracy using only
those small units. According to this view, the necessity to
learn many print-to-speech correspondences in deep or-
thographies slows down the process of reading acquisition,
leading to the well-established behavioral pattern where
children learning to read in a deep orthography lag behind
children learning to read in a shallow orthography on
word and non-word reading tasks (Frith et al., 1998;
Landerl, 2000; Seymour et al., 2003; Wimmer & Goswami, 1994).
According to the PGST, it then seems that orthographic
depth can be described as the existence of complex correspon-
dence rules that are needed in order to decode new words in a
given orthography. However, we argue that this is not the
whole picture: as Katz and Frost (1992) point out, other properties
of print-to-speech correspondences associated with orthographic
depth relate to their inconsistency and incompleteness.
In the following section, we focus on understanding how
inconsistency can be defined, as different theories of skilled
reading have different conceptualisations of what it is and
how it affects reading.
Dissociating inconsistency and complexity
Generally, consistency relates to the presence of more than
one pronunciation for a given letter string. It can be defined
either on the level of a grapheme [1] (e.g., "ea" is an inconsistent
grapheme, because it can be pronounced as in "bread" or
"leak"), or of a body (e.g., "-eak" is an inconsistent body,
because it can be pronounced as in "break" or "leak"). [2] The
consistency terminology is generally associated with connectionist
models (Harm & Seidenberg, 1999; Plaut, McClelland,
Seidenberg, & Patterson, 1996; Seidenberg & McClelland,
1989); dual-route models (Coltheart, Rastle, Perry, Langdon,
& Ziegler, 2001) are more concerned with the concept of
regularity, which is defined as compliance with a set of
predetermined grapheme-phoneme correspondence rules.
[1] By "grapheme" we mean a letter or group of letters that is the written
representation of a phoneme.
[2] Although consistency can be defined at these two different levels, virtually
all the research on effects of consistency on reading (e.g., Jared,
2002) has focused just on body-level consistency.
As we explain below, this distinction is important because
the two classes of models make different assumptions about
how speech is computed from print. Broadly speaking, both
classes of models agree on two points: (1) that there is a non-lexical
procedure which uses knowledge of print-to-speech
regularities to assemble a word's pronunciation, a procedure
that is particularly important for reading non-words (in an
experimental setting) and unfamiliar words (in a real-life setting);
and (2) that there are some words for which lexical
knowledge needs to be recruited, because the sublexical routine
will not provide a correct pronunciation (in the dual-route
framework, this includes all words with irregular print-to-speech
correspondences, whereas for connectionist models, it is limited
to low-frequency words with exceptional correspondences,
e.g., "meringue", "colonel"). In order to adopt a theoretically
neutral framework, we use the term unpredictability
to refer to the degree to which this non-lexical reading route,
essential for reading non-words aloud, correctly translates the
words of the orthography from orthography to phonology.
The dissociation between complexity and unpredictability
on a linguistic level is not straightforward in English. This can
be illustrated with the example of the minimal word pair
"gist" and "gift." Arguably, the pronunciation of the word
"gist" is transparent, because it can be determined using the
context-sensitive correspondence that a "g" followed by an
"i" is pronounced as /dʒ/. Alternatively, the pronunciation of
the word "gift" could also be argued to be transparent, if we
instead apply the simpler rule that the letter "g" is pronounced
as /g/. Therefore, the pronunciation of the word "gist" can be
resolved by the use of a complex (context-sensitive) correspondence,
but in the English orthography it is also, to some
degree, unpredictable whether this complex rule will apply or not.
The "gift"/"gist" example shows that in English, the
complexity and unpredictability of sublexical correspondences
are related and confounded, and indeed it is difficult
to dissociate the two. This is not always the case in other
orthographies, however. Both Italian and French contain
the "g[i]" context-sensitive correspondence. In both these
orthographies there is no unpredictability regarding this
rule, as it always applies, meaning there are no words with
the pattern "gi" where the "g" would be pronounced as /g/.
As we will show later, this is important: an orthography that
contains many complex rules which are entirely predictable
is different from an orthography that contains many com-
plex rules but also a great deal of unpredictability. We pro-
pose that complexity and unpredictability are two related
but linguistically and theoretically dissociable concepts.
Thus, we argue that orthographic depth, in the context of
European orthographies, is a conglomerate of two separate
concepts, namely the complexity of sublexical correspondences
and the unpredictability of words' pronunciations
given these correspondences.
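To make the dissociation concrete, the following minimal Python sketch estimates how often the context-sensitive "g[i]" correspondence actually applies in a small, hand-picked word sample. The word lists, the function name rule_applicability, and the reduction of each word to the phoneme of its "g" are assumptions made purely for illustration; they are not corpus estimates and are not taken from any of the studies cited here.

```python
# Toy samples of words containing the letter sequence "gi", mapped to the
# phoneme that the "g" receives. These lists are illustrative assumptions,
# not a representative corpus of either orthography.
TOY_SAMPLES = {
    "English": {
        "gist": "dʒ", "gin": "dʒ", "giant": "dʒ",
        "gift": "g", "give": "g", "girl": "g",
    },
    "Italian": {
        "giro": "dʒ", "gita": "dʒ", "pagina": "dʒ", "regina": "dʒ",
    },
}


def rule_applicability(sample, expected="dʒ"):
    """Proportion of sampled words in which the context-sensitive rule applies."""
    hits = sum(1 for phoneme in sample.values() if phoneme == expected)
    return hits / len(sample)


for language, sample in TOY_SAMPLES.items():
    print(language, rule_applicability(sample))
```

In this toy sample the rule is equally complex in both orthographies, but it applies to every Italian item and to only half of the English items, which is the sense in which complexity and unpredictability can come apart.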
Defining print-to-speech correspondences
As discussed in the previous section, all computational
models of reading include some kind of mechanism that uses
knowledge of the statistical regularities between print and
speech in a given orthography to assemble a word's pronunciation.
The implementation thereof, however, varies considerably,
and is a source of debate between computational
modellers (Coltheart, Curtis, Atkins, & Haller, 1993;
Perry, Ziegler, & Zorzi, 2010; Plaut et al., 1996; Seidenberg &
McClelland, 1989). The Dual Route Cascaded
(DRC) model of reading (Coltheart et al., 2001) contains
sublexical rules which are defined as the phoneme which
corresponds most frequently to a given grapheme. The rules
are position-specific: each rule is either valid for all positions
("t" → /t/, at least for monosyllabic words), or for the beginning,
middle, or end positions (e.g., "y" → /j/ in the beginning
of a word, and "y" → /ai/ in the end of a word). All
sublexical correspondences are grapheme-phoneme correspondence
(GPC) rules, meaning that they describe the pronunciation
of a single phoneme. GPC rules can also be context-sensitive:
for example, "g" is pronounced as /dʒ/ when
followed by an "i" ("g[i]" → /dʒ/). Even those rules are GPCs,
because they relate to a single phoneme (in this case, /dʒ/).
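The rule types described above (single-letter, multi-letter, position-specific, and context-sensitive) can be sketched as a small rule-application routine. This is a simplified illustration under our own assumptions, not the DRC's actual rule set or algorithm: the GPCRule class, the TOY_RULES list, the priority ordering, and the assemble function are all hypothetical, and the toy set deliberately includes the "g[i]" rule so that its consequences can be seen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GPCRule:
    grapheme: str                      # one or more letters
    phoneme: str                       # the single phoneme it maps onto
    position: str = "any"              # "any", "begin", "middle", or "end"
    next_letter: Optional[str] = None  # context: letter that must follow


# Hypothetical toy rule set for illustration; NOT the rules of the DRC.
TOY_RULES = [
    GPCRule("th", "θ"),
    GPCRule("sh", "ʃ"),
    GPCRule("g", "dʒ", next_letter="i"),  # context-sensitive "g[i]"
    GPCRule("g", "g"),
    GPCRule("y", "j", position="begin"),
    GPCRule("y", "ai", position="end"),
    GPCRule("i", "ɪ"),
    GPCRule("e", "ɛ"),
    GPCRule("s", "s"),
    GPCRule("t", "t"),
    GPCRule("f", "f"),
    GPCRule("m", "m"),
]


def position_of(start: int, end: int, length: int) -> str:
    """Classify a grapheme's position within a word of the given length."""
    if start == 0:
        return "begin"
    if end == length:
        return "end"
    return "middle"


def rule_matches(rule: GPCRule, word: str, i: int) -> bool:
    end = i + len(rule.grapheme)
    if word[i:end] != rule.grapheme:
        return False
    if rule.position != "any" and rule.position != position_of(i, end, len(word)):
        return False
    if rule.next_letter is not None and word[end:end + 1] != rule.next_letter:
        return False
    return True


def assemble(word: str, rules: List[GPCRule] = TOY_RULES) -> List[str]:
    """Left-to-right assembly; longer and more specific rules are tried first."""
    ordered = sorted(
        rules,
        key=lambda r: (-len(r.grapheme), r.next_letter is None, r.position == "any"),
    )
    phonemes, i = [], 0
    while i < len(word):
        rule = next((r for r in ordered if rule_matches(r, word, i)), None)
        if rule is None:
            raise ValueError(f"no toy rule covers {word[i:]!r}")
        phonemes.append(rule.phoneme)
        i += len(rule.grapheme)
    return phonemes


print(assemble("gist"))  # the context-sensitive rule gives the correct /dʒ/
print(assemble("gift"))  # the same rule regularizes "gift" to begin with /dʒ/
```

With this particular rule set, "gist" is assembled correctly while "gift" is regularized, so "gift" would count as irregular; dropping the "g[i]" rule would reverse the two outcomes.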
In contrast to the DRC and its GPC rules, triangle models
develop sensitivity to units that are larger than graphemes,
thereby also showing sensitivity to marker effects that are associated
with "large" units, such as bodies (Harm &
Seidenberg, 1999; Plaut et al., 1996; Seidenberg &
McClelland, 1989). Connectionist Dual Process (CDP)+-type
models (Perry et al., 2007; Perry, Ziegler, & Zorzi, 2010) represent
a compromise between the DRC and triangle models:
although the sublexical route is based on graphemes, it also
develops sensitivity to the letters surrounding a given corre-
spondence due to learning in a two-layer associative network.
The reliance on letter clusters that are larger than graph-
emes in the triangle models blurs our proposed distinction
between complexity and unpredictability: Given sufficient
training, the orthography-to-phonology mapping process will
establish orthography-phonology connections between differ-
ent types of units (e.g., Plaut et al., 1996). This will make even
a word like "ache" (cf. "cache") predictable (via the whole-word
correspondence that "ache" maps onto /eɪk/, and
"cache" onto /kæʃ/). In practice, the amount of training which
the triangle models undergo is not sufficient to establish
whole-word representations of low-frequency words, such
that the orthography-to-phonology pathway will struggle with
low-frequency irregular words. Due to the shared-labour na-
ture of the two routes, semantic processing will be activated
during any reading task, but in the case of low-frequency
irregular words the semantic route is required for a correct
Psychon Bull Rev
In summary, the DRC model has a sublexical route
which operates on a set of pre-determined print-to-speech
correspondences. As long as a word complies with
the print-to-speech correspondences, the sublexical route
will be able to provide a correct output. If the word is
irregular (i.e., it does not comply with the rules), lexical
knowledge needs to be consulted. The triangle models,
conversely, develop reliance on units that are larger than
graphemes, meaning that even high-frequency irregular
words can be read aloud by the orthography-to-
phonology route. While a DRC-like model could be mod-
ified to contain larger-than-grapheme units, these would
remain completely independent of any lexical process
(e.g., see footnote 1 of Coltheart et al., 1993). This dis-
tinction between larger sublexical units and whole-word
orthography-phonology correspondences is not explicit in
the triangle framework. However, there is a qualitative
difference between sublexical clusters and whole-word
units, namely that whole-word units map onto semantic
information while sublexical units, by definition, do not.
Therefore, we propose that within any framework, resolv-
ing print-to-speech ambiguities by relying on larger-than-
grapheme sublexical (i.e., complex) units, and resolving
ambiguities by relying on whole-word units (for unpre-
dictable words) are qualitatively different strategies, in
that the latter inevitably involves excitatory activation of
lexical-semantic processes.
For the sake of simplicity, we provide definitions of
complexity and unpredictability which are in line with
the DRC terminology. As explained above, this does not
mean that our framework is incompatible with other
models of reading, as all models can accommodate both
the presence of complex correspondences and words with
unpredictable pronunciations. We refer to a correspon-
dence between orthography and phonology as complex if
either the orthographic element involved consists of more
than one letter (e.g., "th" → /θ/), or if the correspondence
is context-sensitive ("g[i]" → /dʒ/), or if both are true.
Unpredictable words can be defined as words where the
sublexical route provides an incorrect pronunciation. In the
DRC framework, such words are termed irregular words
(e.g., Andrews, 1982; Content, 1991; Rastle & Coltheart,
1999; Schmalz, Marinus, & Castles, 2013). Within the con-
nectionist models the concept of consistency is stressed.
Although consistency differs from regularity, in the context
of the current review it reflects the predictability of a word:
a consistent word is defined as one where Bits pronuncia-
tion agrees with those of similarly spelt words^(Plaut et al.,
1996, p. 59). This reflects the mechanisms by which the
pronunciation of a word is assembled in a connectionist
model: as the sublexical route operates based on statistical
regularities which are derived from the print-to-speech
correspondences of real words, unpredictable words in this
framework are those which have a different pronunciation
to similarly spelled words.
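The consistency-based notion of predictability can be illustrated with a short Python sketch. It assumes a toy lexicon in which each word has been segmented by hand into a body and a body pronunciation, and it uses a simple type-count ratio (words sharing the body and its pronunciation, divided by all words sharing the body) as one possible consistency score. The lexicon, the function names, and the choice of ratio are illustrative assumptions rather than the measure used in any particular study.

```python
from collections import Counter, defaultdict

# Hand-segmented toy lexicon: (word, orthographic body, body pronunciation).
# Illustrative assumption only; a real analysis would use a full lexical database.
TOY_LEXICON = [
    ("leak", "eak", "iːk"),
    ("beak", "eak", "iːk"),
    ("break", "eak", "eɪk"),
    ("steak", "eak", "eɪk"),
    ("gift", "ift", "ɪft"),
    ("lift", "ift", "ɪft"),
    ("sift", "ift", "ɪft"),
]


def body_pronunciation_counts(lexicon):
    """For each body, count how many words use each of its pronunciations."""
    counts = defaultdict(Counter)
    for _word, body, pronunciation in lexicon:
        counts[body][pronunciation] += 1
    return counts


def is_consistent_body(body, lexicon):
    """A body is consistent if all words containing it agree on its pronunciation."""
    return len(body_pronunciation_counts(lexicon)[body]) == 1


def word_consistency(word, lexicon):
    """Share of same-body words (including the word itself) that agree with it."""
    counts = body_pronunciation_counts(lexicon)
    for entry, body, pronunciation in lexicon:
        if entry == word:
            return counts[body][pronunciation] / sum(counts[body].values())
    raise KeyError(word)


print(is_consistent_body("eak", TOY_LEXICON))
print(word_consistency("leak", TOY_LEXICON))
print(word_consistency("gift", TOY_LEXICON))
```

A token-frequency-weighted version would replace the type counts with summed word frequencies; the type-count version is used here only because it needs no frequency data.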
In summary, as a working definition, we refer to complex
correspondences as those that are multi-letter ("th" → /θ/)
and/or context-sensitive ("g[i]" → /dʒ/), and to unpredictable
words as irregular words given the set of GPCs that are im-
plemented in the DRC. Given that the definitions are arguably
biased towards the DRC framework, we seek convergence
in alternative approaches for all our findings in the following
Quantifications of orthographic depth
Existing measures of orthographic depth, and their
relation to complexity and unpredictability
Having distinguished between complex versus unpredict-
able correspondences, we can attempt to devise a quan-
tification method for each of these on a linguistic level.
If these represent two separate concepts underlying or-
thographic depth, the first step is to demonstrate that
they vary independently across orthographies. This will
firstly show whether there is enough independent varia-
tion of the two concepts to warrant practically meaning-
ful investigation on a behavioral level, and secondly will
provide insights as to how orthographic depth may be
quantified. Although large-scale linguistic corpus analy-
ses are outside the scope of the current review, we pro-
vide some suggestions that can be expanded on by future
work. We discuss and expand on previous quantification
methods, and consider their advantages and disadvan-
tages. In terms of demonstrating that there is a dissocia-
tion between complexity and unpredictability, we refer to
a computational-model-driven approach (Ziegler, Perry,
& Coltheart, 2000), and a linguistic-corpus-analysis ap-
proach (van den Bosch, Content, Daelemans, & de
Gelder, 1994). We also discuss two commonly taken ap-
proaches to determine the relative depth of a given or-
thography, namely subjective consensus among experts
(e.g., Frost et al., 1987; Seymour et al., 2003), and the
onset entropy measure (Borgwaldt, Hellwig, & de Groot, 2004, 2005).
Given our DRC-based definitions of complexity and un-
predictability, it is intuitive to start by using the existing ver-
sions of the DRC across orthographies as an attempt to illus-
trate cross-linguistic differences given the GPC rules. Specif-
ically, we can simply take the numbers and proportions of
complex rules, and the proportion of irregular words in the
DRCs of the orthographies in which it has been implemented.
The number of complex rules (those which are multi-letter
and/or context-sensitive) is a measure of complexity as per our
working definition.
This approach of comparing the number and types of GPC
rules across orthographies, and the degree to which they are
sufficient to read aloud words in a given orthography, was
also taken by Ziegler et al. (2000) when they implemented the
DRC in German. They found that the number of rules, and
especially the number of complex rules, as well as the percentage
of irregular words, were higher in English than in German.
This is in line with the general consensus that German is a
shallow orthography, and English is deep. The DRC has also
been implemented in French (Ziegler, Perry, & Coltheart, 2003),
Dutch, and Italian (C. Mulatti, personal communication, 25
May 2014), which allows us to list the numbers, proportions
and types of rules in these five DRCs. The results of this
analysis are presented in Table 1.
Table 1 shows, as expected, that English is a “deep” orthography,
in that it has many rules, and a particularly high
percentage of irregular words, while Dutch and German are
“shallow”, in that they have few rules and a small proportion
of irregular words. Interestingly, the DRC approach places the
French orthography at one end of the continuum for the
number/percentage of complex rules (complexity), according
to which French appears to be even more complex than the
English orthography, and at the other end of the continuum
for the percentage of irregular words (unpredictability),
where French appears to be even more predictable than German
and Dutch. This shows that the distinction between the
two concepts is meaningful, as they are not perfectly correlated
between orthographies. The Italian DRC shows an even
smaller number of rules, and a larger proportion of single-letter
rules, consistent with the notion that it is an extremely
shallow orthography (e.g., Tabossi & Laghi, 1992).
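As a rough illustration of how the DRC-based figures separate the two dimensions, the following Python sketch recomputes the share of complex rules (multi-letter plus context-sensitive) from the rule counts in Table 1 and sets it beside the percentage of irregular words. The counts are copied from Table 1; the data layout and the names `DRC_RULES`, `IRREGULAR_PCT` and `complex_share` are our own illustrative assumptions and are not part of any DRC implementation.

```python
# Rule counts per DRC, copied from Table 1:
# (single-letter, multi-letter, context-sensitive).
DRC_RULES = {
    "Dutch":   (51, 42, 11),
    "English": (38, 161, 27),
    "French":  (46, 218, 76),
    "German":  (44, 55, 31),
    "Italian": (19, 8, 32),
}

# Percentage of irregular words from Table 1 (not available for Italian).
IRREGULAR_PCT = {"Dutch": 6.3, "English": 16.9, "French": 5.6, "German": 10.5}


def complex_share(counts):
    """Percentage of rules that are multi-letter or context-sensitive."""
    single, multi, context = counts
    return 100.0 * (multi + context) / (single + multi + context)


complexity = {lang: complex_share(c) for lang, c in DRC_RULES.items()}

for lang in sorted(IRREGULAR_PCT):
    print(f"{lang:8s} complex rules: {complexity[lang]:5.1f} %   "
          f"irregular words: {IRREGULAR_PCT[lang]:4.1f} %")

# Among the four orthographies with both measures, the most complex one
# need not be the least predictable one.
most_complex = max(IRREGULAR_PCT, key=lambda lang: complexity[lang])
least_irregular = min(IRREGULAR_PCT, key=IRREGULAR_PCT.get)
print(most_complex, least_irregular)
```

On these figures, French has both the largest share of complex rules and the smallest percentage of irregular words, which is the dissociation described above.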
Although the DRC approach offers insights into the rela-
tive positions of the four orthographies on the two continu-
ums, there are three reasons why this approach is limited.
Firstly, current versions of the DRC are based on monosyllab-
ic words only. In some languages, the proportion of monosyl-
labic words is relatively high; for others, polysyllabic words
form the majority of all words. This is problematic for across-
language comparisons. Furthermore, even in languages where
monosyllabic words are frequent, structural properties vary
between monosyllabic and polysyllabic words. Therefore,
monosyllabic words are not a perfectly representative sample
of any orthography (for a review, see Protopapas & Vlahou,
2009). Although the DRC approach may still be useful to
determine the relative position of each orthography in terms
of orthographic complexity and unpredictability, it would be
valuable to replicate these findings with an approach which is
not limited to monosyllabic words.
Secondly, the cross-linguistic versions of the DRC were
implemented independently of each other, without the aim
of comparing them directly to each other. For example, the
number of words in the DRCs' lexicons varies extensively,
with 4583 words for Dutch, 8027 words for English, 2245
words for French, and 1448 words for German. The varying
number of words in the DRCs may also reflect the relative
percentage of monosyllabic words in each language.
Thirdly, it is not established that the GPC rules that are
implemented in the DRC have full psychological reality. Indeed,
there is evidence that other sublexical units are used during
reading in English (Glushko, 1979; Treiman, Kessler, & Bick,
2003), German (Perry, Ziegler, Braun, & Zorzi, 2010; Schmalz
et al., 2014) and French (Perry, Ziegler, & Zorzi, 2014).
Although this does not mean that the DRCs cannot be used as a
tool to capture linguistic variability in the complexity and pre-
dictability of print-to-speech correspondences by using GPCs
and irregular words as a proxy, it is, again, desirable to find
converging evidence from a different approach.
Such converging evidence can be found in a computational
study of linguistic corpora of English, Dutch, and
French (van den Bosch et al., 1994). The corpora used in this
study included polysyllabic words as well as monosyllabic
words. In addition, this paper predates the DRCs, and was
not conducted within the framework of any particular
theory or model. The approach of this paper was data-driven,
and the authors made no a priori predictions about the results.
Van den Bosch et al. (1994) conclude that orthographic
depth can be dissociated into two separate measures: the
difficulty of parsing letter strings into graphemes on the one
hand, and the degree of redundancy in the print-to-speech
correspondences on the other hand. The former, which we
equate with our concept of complexity, was measured by
applying a computationally obtained parsing mechanism to a
set of test words. In an orthography with simple correspondences,
parsing is easier, because in many cases parsing a
word into letters would enable the correct mapping of graphemes
to phonemes: an English word with simple correspondences,
like “cat”, would be parsed into “c”, “a” and “t”,
which can be mapped correctly onto the phonemes /k/, /æ/,
and /t/; a word with complex correspondences, such as
“chair”, would instead need to be parsed as “ch” and “air”,
because the constituent letters (“c”, “h”, etc.) do not map onto
the correct phonemes. Since for each of the three orthographies
the same amount of training was used, differences in
parsing accuracy of untrained test words reflect the difficulty
of parsing. Parsing accuracy, overall, was low, indicating that
all three orthographies are characterised by high complexity.
French showed the lowest level of accuracy, while Dutch and
English were at approximately the same level (see Table 1).

Footnote: The working definition ignores two additional sources of GPC
complexity. The first stems from the position-specificity of rules. As
position-specific rules are implemented as separate GPC rules, as opposed
to rules which apply to all positions, the presence of position-specific
GPC rules inflates the number of rules overall, and therefore this source
of complexity is reflected in the total number of rules (see Table 1). A
second source of GPC complexity is the presence of phonotactic rules, which
are context-dependent rules where a phoneme which precedes or follows a
letter influences its pronunciation. Such rules are particularly common in
some orthographies, such as Russian, but have not been researched to a
great extent in other orthographies. For example, none of the current
implementations of the German, Dutch, or Italian DRC contains any
phonotactic rules, even though these might be more suitable to describe
numerous aspects of the print-to-speech conversion. We therefore excluded
phonotactic GPC rules in Table 1.

Footnote: Table 1 lists the number as well as the percentages of each type
of GPC rule. From a theoretical standpoint, saying that a language has a
large number of rules is different from saying that the rules of a given
language are complex. In the set of orthographies that we present here, the
number of rules and percentage of complex rules are highly correlated, so
in practice these two notions cannot be dissociated.
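To make the parsing problem concrete, the sketch below segments a letter string into graphemes by greedy longest-match against a small, hand-picked grapheme inventory. This is only an illustration under our own assumptions: the inventory `GRAPHEMES` and the function `parse_graphemes` are hypothetical, and the procedure is not the computationally obtained parsing mechanism of van den Bosch et al. (1994).

```python
# Hypothetical toy inventory of English graphemes (an assumption for illustration).
GRAPHEMES = {"a", "c", "h", "i", "r", "t", "ch", "ai", "air"}
MAX_LEN = max(len(g) for g in GRAPHEMES)


def parse_graphemes(word):
    """Split `word` into graphemes, always taking the longest match first."""
    parsed, pos = [], 0
    while pos < len(word):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(MAX_LEN, len(word) - pos), 0, -1):
            chunk = word[pos:pos + size]
            if chunk in GRAPHEMES:
                parsed.append(chunk)
                pos += size
                break
        else:
            raise ValueError(f"no grapheme matches at position {pos} of {word!r}")
    return parsed


print(parse_graphemes("cat"))    # ['c', 'a', 't']
print(parse_graphemes("chair"))  # ['ch', 'air']
```

A letter-by-letter parse suffices for “cat”, whereas “chair” is only parsed correctly because the multi-letter graphemes are in the inventory; an orthography that needs many such entries is complex in the sense used here.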
For quantifying the degree of redundancy, van den Bosch
et al. (1994) report the generalization performance, or the number
of test words pronounced correctly by a computationally
obtained set of print-to-speech correspondences in the three
orthographies. In order to obtain these correspondences, they
first derived all possible print-to-speech correspondences of all
sizes (ranging from single letters to whole words). Then they
compressed the set of correspondences for each orthography to
reduce the redundancy among these rules (e.g., knowing the
correspondences “a” → /æ/ and “t” → /t/, as well as “-at” → /æt/,
is redundant; knowing the correspondences “a” → /æ/, “l” → /l/,
and “m” → /m/, as well as “-alm” → /ɑːm/, is not).
The results showed that both Dutch and French
outperformed English, meaning that there are many English
words that do not comply with these rules. The generalization
measure is reflective of unpredictability: given the set of
correspondences that were defined through the compression
algorithm, a large number of words in the English orthography
were still unpredictable. The predictability, according to this
measure, was higher in Dutch and French than in English. A
summary of both variables is presented in Table 1.
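The notion of redundancy can be illustrated with a deliberately simplified check: a larger correspondence is treated as redundant if its pronunciation is just the concatenation of the pronunciations of its single letters. This is a sketch under our own assumptions and does not reproduce the compression algorithm of van den Bosch et al. (1994); the table `SINGLE`, the function `is_redundant`, and the pronunciation /ɑːm/ for “-alm” (as in “calm”) are illustrative choices.

```python
# Toy single-letter correspondences, following the example in the text.
SINGLE = {"a": "æ", "t": "t", "l": "l", "m": "m"}


def is_redundant(letters, pronunciation):
    """True if the larger unit adds nothing beyond its single-letter parts."""
    assembled = "".join(SINGLE[letter] for letter in letters)
    return assembled == pronunciation


print(is_redundant("at", "æt"))    # True: derivable from "a" and "t"
print(is_redundant("alm", "ɑːm"))  # False: must be stored as a unit
```

Correspondences of the second kind survive compression, and words whose pronunciation still cannot be derived from the compressed set count as unpredictable.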
To our knowledge, the quantification scheme of van den
Bosch et al. (1994) has not been used to study behavioral
differences in the effects of orthographic depth, nor has it been
applied to other orthographies. This is an important direction
for future research. For our current purposes, it is particularly
interesting that the two concepts which van den Bosch et al.
(1994) suggest as underlying orthographic depth based on
their linguistic-computational analysis are consistent with the
results of the DRC, and our distinction between complexity
and unpredictability. It is worth noting that the results of the
DRC approach and the analysis of van den Bosch et al. (1994)
converge despite the DRCs' limitation to monosyllabic words
only. This suggests that for the orthographies studied, general
findings about complexity and unpredictability show broadly
the same patterns for monosyllabic words compared to a wider
sample of words.
The case of French is particularly interesting: Both the
DRC approach and the analysis of van den Bosch et al.
(1994) classified French as a relatively complex orthography
(many complex GPC rules, low parsing accuracy)
even compared to English. Conversely, both approaches
classified French as a predictable orthography even compared to
Dutch and German. In previous work on orthographic depth,
French has often been described as an intermediate orthogra-
phy (Goswami et al., 1998; Paulesu et al., 2001; Seymour
et al., 2003; Sprenger-Charolles et al., 2011). The French or-
thography, therefore, shows the importance of distinguishing
between the two concepts, as a failure to do so provides a
different picture.
This intuitive classification of French as an orthography of
intermediate depth has been supported by some of the previ-
ous quantification schemes, which did not make the distinc-
tion between complexity and unpredictability. For example,
Seymour et al. (2003) classified 13 European orthographies
based on their degree of depth. They consulted researchers in
participating countries and ranked the orthographies in terms
of their depth based on a more intuitive approach. This landed
French in an “intermediate” position. It seems, therefore, that
this intuitive approach “averages out” potentially theoretically
relevant distinctions between separate concepts underlying
orthographic depth.

Footnote: Seymour et al. (2003) draw a distinction between orthographic
depth and complexity, but what is meant by complexity is not what we mean
by the same term: they refer to the presence of consonant clusters (i.e.,
syllabic complexity), that do not necessarily map onto the same phoneme
(e.g., “str”). This does not relate to orthographic depth, but constitutes
a different dimension.

Table 1 Measures of complexity and unpredictability for Dutch, English, French, German, and Italian based on the DRCs, and from van den Bosch et al. (1994) for comparison

Measure                         Dutch         English        French         German        Italian
Total number of rules (DRC)     104           226            340            130           59
Single-letter rules (DRC)       51 (49.0 %)   38 (16.9 %)    46 (13.5 %)    44 (33.8 %)   19 (32.2 %)
Multi-letter rules (DRC)        42 (40.4 %)   161 (71.2 %)   218 (64.1 %)   55 (42.3 %)   8 (13.6 %)
Context-sensitive rules (DRC)   11 (10.6 %)   27 (11.9 %)    76 (22.4 %)    31 (23.8 %)   32 (54.2 %)
Irregular words (%)             6.3           16.9           5.6            10.5          NA
Parsing accuracy (%)            21.3          24.5           12.9           NA            NA
Generalization accuracy (%)     81.4          54.3           89.1           NA            NA

Note: For the DRC, the numbers represent the number of rules of each type, and the percentage out of the total number of rules in brackets. Results for
parsing accuracy and generalisation accuracy (defined in the text) are taken from van den Bosch et al. (1994)
A more objective approach, which has been picked up by
cross-linguistic researchers, is a measure
called onset entropy (Borgwaldt et al., 2004, 2005). This
quantification scheme reflects the number of different ways
in which the initial letter of a word, on average, can be pro-
nounced in a given orthography. Initial letters which consis-
tently map onto the same phoneme involve no ambiguity, so
they are assigned a value of 0. The greater the number of
possible pronunciations of the letter, the higher the entropy
value. Borgwaldt et al. (2005) calculated the entropy values
for initial letters across orthographies. The average onset en-
tropy for each orthography was then considered to reflect its
relative degree of orthographic depth.
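A minimal sketch of an entropy computation of this kind is given below. It assumes Shannon entropy over the relative frequencies of the phonemes that each initial letter maps onto, followed by an unweighted mean over letters; how Borgwaldt et al. weight letters or words is not described here, so these choices, the toy lexicon and the function names are our own assumptions.

```python
import math
from collections import Counter, defaultdict


def letter_entropy(phoneme_counts):
    """Shannon entropy (in bits) of the pronunciations of one initial letter."""
    total = sum(phoneme_counts.values())
    # p * log2(1 / p), summed over all pronunciations of the letter.
    return sum((n / total) * math.log2(total / n) for n in phoneme_counts.values())


def onset_entropy(lexicon):
    """Unweighted mean entropy over initial letters.

    `lexicon` is an iterable of (spelling, first_phoneme) pairs.
    """
    by_letter = defaultdict(Counter)
    for spelling, first_phoneme in lexicon:
        by_letter[spelling[0]][first_phoneme] += 1
    entropies = {letter: letter_entropy(c) for letter, c in by_letter.items()}
    return sum(entropies.values()) / len(entropies), entropies


# Toy English-like lexicon (made up for illustration only).
toy_lexicon = [
    ("bat", "b"), ("bed", "b"),                                # "b" is always /b/
    ("cat", "k"), ("cot", "k"), ("city", "s"), ("cell", "s"),  # "c" is /k/ or /s/
]
mean_entropy, per_letter = onset_entropy(toy_lexicon)
print(per_letter)    # "b" has entropy 0, "c" has entropy 1 bit
print(mean_entropy)  # 0.5
```

In this toy case the letter with a single pronunciation contributes 0 and the letter with two equally frequent pronunciations contributes 1 bit, so the mean is 0.5.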
This measure has intuitive appeal, and has been used in
large-scale behavioral studies of cross-linguistic differences
(Landerl et al., 2013; Moll et al., 2014; Vaessen et al., 2010;
Ziegler et al., 2010). One of its advantages is the focus on the
first letter only. Firstly, this eliminates the bias towards
monosyllabic words that is present in the DRC and some other
approaches. Secondly, it also increases the comparability
across orthographies, because words in all orthographies have
initial letters (Borgwaldt et al., 2005; Ziegler et al., 2010).
Still, neglecting the remaining information in a word creates
other problems. In English, for example, it is often the vowel
pronunciation that is unpredictable, and vowels occur more
frequently in the middle of a word (Treiman et al., 1995). In
French, print-to-speech irregularities occur mostly in the final
consonants, which are often silent (Lete, Peereman, & Fayol,
2008; Perry et al., 2014; Ziegler, Jacobs, & Stone, 1996).
We provide two examples that show that although the onset
entropy measure is a useful first step in quantifying orthographic
depth, it confounds orthographic complexity with unpredictability,
meaning that it does not provide the whole picture.
According to the onset entropy measure, French (with a
value of 0.46) is about half-way between English (0.83) and
“shallow” orthographies such as Finnish (0.0) and Hungarian
(0.17) (Ziegler et al., 2010). Like the subjective rankings
described above, it seems therefore that this approach to quantifying
orthographic depth averages out two different sources
that underlie this construct.
Another example is the German orthography. Table 1
shows that according to the DRC measure, German has a high
degree of predictability: although some context-sensitive rules
are required, these allow the sublexical route to read aloud
90 % of all monosyllabic words correctly. This contrasts with
the results from the onset entropy measure, which classifies it
as relatively deep: the onset entropy value for German is
higher (reflecting higher degree of depth) than that of Dutch,
Hungarian, Italian, and even Portuguese, and only slightly
lower than French (Borgwaldt et al., 2005). This goes against
both the results of the DRC, and the intuitive notion that German
is close to the shallow end of the orthographic depth
continuum (Frith et al., 1998; Goswami, Ziegler, Dalton, &
Schneider, 2003; Landerl, 2000; Landerl et al., 1997; Seymour
et al., 2003; Wimmer & Goswami, 1994; Wimmer
et al., 2000; Ziegler et al., 2001; Ziegler, Perry, Ma-Wyatt,
et al., 2003).
Similar to the example of French, this counter-intuitive
finding can be explained by the distinction between complexity
and unpredictability: the relative complexity of the German
orthographic system inflates the onset entropy value, despite
German's relatively high degree of predictability. For example,
German words starting with the letter “s” can have the
first phoneme /z/, /ʃ/, or /s/. The pronunciation is, however,
predictable: in the onset position, when “s” is succeeded by a
vowel, it is pronounced as /z/, when it is succeeded by “p” or
“t” or is part of the grapheme “sch”, it is pronounced as /ʃ/,
and in all other cases it is pronounced as /s/. The two examples
of French and German show that onset entropy has no
way of distinguishing between correspondence complexity
and unpredictability, and instead “averages out” the two
dimensions, thus making French and German appear to be
“intermediate” orthographies despite their relatively high
predictability.
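The rule for German onset “s” can be written as a short deterministic function, which illustrates that several possible pronunciations (and hence non-zero onset entropy) are compatible with full predictability. The sketch follows the rule as stated in the text; the function name `onset_s_phoneme`, the vowel set and the example words are our own illustrative assumptions.

```python
VOWELS = set("aeiouäöü")  # assumed set of vowel letters for this sketch


def onset_s_phoneme(word):
    """Pronunciation of word-initial "s" according to the rule stated in the text."""
    word = word.lower()
    if not word.startswith("s"):
        raise ValueError("word must start with 's'")
    # "sch", "sp" and "st" in the onset are pronounced with /ʃ/.
    if word.startswith("sch") or word[1:2] in ("p", "t"):
        return "ʃ"
    # "s" followed by a vowel is voiced.
    if word[1:2] in VOWELS:
        return "z"
    # All remaining cases.
    return "s"


for example in ["Sonne", "Spiel", "Stein", "Schule", "Skat"]:
    print(example, onset_s_phoneme(example))
```

Every input receives exactly one output, so the letter is complex (its pronunciation depends on context) but not unpredictable.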
In summary, we have described two separate approaches
that both suggest that orthographic depth is not a single con-
cept, but can be dissociated into complexity and unpredictabil-
ity of the print-to-speech correspondences. One was intro-
duced two decades ago (van den Bosch et al., 1994), but to
our knowledge it has not been extended to other orthographies
or formed the basis of behavioral research. For the purpose of
the current paper, the study is valuable because the data-driven
computational-linguistic study by van den Bosch et al. (1994)
led them to the same conclusions as our theory-based DRC
approach. This strengthens the position that on a linguistic
level, orthographic depth can be dissociated into two separate
concepts.
Limitations and open questions for further research
Our definition of orthographic depth was conceptualized with
the aim of being specific, as this is essential for an objective
classification measure and for precise predictions about be-
havior, based on theories and models of reading. The speci-
ficity of our definitions comes with the trade-off that it does
not capture all sources of cross-linguistic variability. For example,
unpredictability, when defined at the level of the print-to-speech
correspondences, ignores two further sources of unpredictability
that exist in alphabetic orthographies, namely
incompleteness and irregularities associated with lexical stress
assignment.
Psychon Bull Rev
As discussed earlier, a previous definition of orthographic
depth also included the concept of incompleteness (Katz &
Frost, 1992). Incomplete sublexical information, within our
framework, makes the pronunciation of a word unpredictable
for the sublexical route, but also requires the use of contextual
semantic information to access a single phonological and se-
mantic entry. Given the need to rely on semantic context to
resolve this type of unpredictability, it is possible that the
incompleteness of the sublexical correspondences presents a
qualitatively different problem compared to complexity and
consistency. If this is the case, placing an orthography which is
characterised by incomplete correspondences, such as He-
brew, on the same continuum as the European orthographies,
might not be particularly meaningful.
Another source of unpredictability that varies across or-
thographies, but is not captured by any of the previous quan-
tification schemes, is lexical stress assignment. Some orthog-
raphies, such as French, have entirely predictable stress as-
signment, but others, such as English (Rastle & Coltheart,
2000; Seva, Monaghan, & Arciuli, 2009), Greek (Protopapas,
Gerakaki, & Alexandri, 2006), Russian (Jouravlev & Lupker,
2014, 2015), and Italian (Burani & Arduino, 2004; Colombo,
1992), have some ambiguity when it comes to determining the
position of the stressed syllable, and lexical-semantic knowledge
needs to be recruited to resolve these conflicts. In English,
for example, the word “entrance” has a different meaning
depending on whether the first or the second syllable is
stressed. It is still, to some extent, unclear via what mecha-
nisms stress irregularity affects reading (Sulpizio, Arduino,
Paizi, & Burani, 2013; Sulpizio, Burani & Colombo, 2015),
and how it relates to GPC irregularity. Therefore, this leaves
open questions for future research: for example, to what extent
can stress assignment be predicted, how this differs across
orthographies, and what cognitive mechanisms are used to
resolve ambiguities underlying stress assignment.
Defining language-level differences across orthographies
becomes even more complicated when we consider non-
alphabetic orthographies: both the languages and the ortho-
graphic systems of Chinese or Japanese, for example, are so
different to the alphabetic orthographies that we consider here,
that classifying and comparing them along the same continuum
is not possible. In addition to differences in the nature of the
process by which speech is computed from print, non-
alphabetic orthographies may differ in terms of the visual complexity,
morphological principles, and even definitions of word
boundaries (Chang, Maries, & Perfetti, 2014; Cui et al., 2012;
Huang & Hanley, 1995; McBride-Chang et al., 2012).
Therefore, we believe that the most valuable studies in
terms of getting to the bottom of orthographic depth would
involve the following: (1) Cross-linguistic comparisons of
two orthographies which are similar on as many aspects
as possible, but differ on the particular issue of interest.
Given the difficulty in doing this, a single comparison involving
only two orthographies should not be taken at face value,
and needs to be replicated with other orthographies. (2)
Within-language studies, which can be conducted to isolate the
particular aspect of orthographic depth that is proposed to drive
cross-linguistic differences. For example, Frost (1994) compared
marker effects of lexical-semantic processing in pointed
Hebrew, where the sublexical information is complete, to
unpointed Hebrew, where the sublexical information for the
same words is incomplete. In line with the Orthographic
Depth Hypothesis, the unpointed script showed stronger
lexical-semantic marker effects than pointed Hebrew, in a de-
sign that controlled for any cross-linguistic differences that
may exist in an across-language design.
Predictions of the new orthographic depth
framework for theories of reading
Some key studies within the new framework
Previous research on orthographic depth has been conducted
without bearing in mind the distinction between complexity and
unpredictability. Therefore, this work is often subject to more than one
interpretation, depending on whether behavioral differences are
proposed to arise as a function of complexity, or as a function of
unpredictability. We review two previous key studies on ortho-
graphic depth and illustrate how different conclusions may be
drawn depending on how orthographic depth is defined.
A key finding supporting the Orthographic Depth Hypoth-
esis (ODH) comes from a study of the frequency and lexicality
effects, and a semantic priming manipulation (Frost, Katz, &
Bentin, 1987). The orthographies explored in this study were
Hebrew (deep), English (medium-deep), and Serbo-Croatian
(shallow). Indeed, there was an increase in the size of the
lexical-semantic effects associated with increasing ortho-
graphic depth, suggesting stronger involvement of the
lexical-semantic route.
Upon closer inspection, it is not clear whether this can be
attributed to orthographic complexity or unpredictability. The
Hebrew orthography is characterized by incompleteness,
therefore these results show that incompleteness of print-to-
speech correspondences leads to increased reliance on the
lexical-semantic route, but remain silent about complexity
and unpredictability. English and Serbo-Croatian differ to
each other on both complexity and unpredictability, therefore
the difference between these two orthographies can be attrib-
uted to either. If unpredictability drives the increased involve-
ment of the lexical-semantic route, this means that lexical
knowledge is recruited because sublexical information is not
sufficient to assemble a fully-specified phonological represen-
tation. If complexity drives these behavioral differences, this
means that the presence of complex information slows down
the processing of the sublexical route, which allows for more
involvement of the lexical route. Thus, although the outcomes
of the two scenarios are identical, the mechanisms that lead to
this end state are different. As a result, we do not know wheth-
er indeed complexity has the effect of increased reliance on
lexical-semantic processing, or whether this is specific to in-
completeness and unpredictability. In future research, this
question could be addressed by comparing complex but pre-
dictable orthographies, such as French, to simple and predict-
able orthographies, such as German or Dutch. If increased
lexical-semantic processing is associated with unpredictabili-
ty, but not complexity, we would expect to find similar lexical-
semantic marker effects when we hold predictability constant.
Key evidence for the Psycholinguistic Grain Size Theory
(Ziegler & Goswami, 2005) stems from a study comparing the
size of the length and body-N effects in English and German
(Ziegler et al., 2001). The length effect was stronger in Ger-
man compared to English, and the body-N effect was weaker
in German compared to English, suggesting differences in the
nature of the sublexical processing underlying reading in the
two orthographies. German and English differ to each other on
both complexity and unpredictability, so it is unclear which
aspects of these writing systems drive the behavioral differ-
ences. An increased body-N effect in English compared to
German may reflect a difference in the nature of the sublexical
processing (as suggested by Ziegler et al., 2001). If the dom-
inant functional sublexical units of English are bodies, this
would mean that the sublexical units are more complex. Ac-
cording to this interpretation, the results of the body-N effect
reflect a difference in the complexity of sublexical correspon-
dences. An alternative explanation is that the unpredictability
of English encourages a qualitatively different reading strate-
gy compared to German, namely an increased reliance on
lexical analogy. In this case, a German reader might tend towards
reading words and non-words via the sublexical correspondences,
whereas an English reader relies to a greater
extent on similar lexical entries. Thus, English readers might
show a stronger body-N effect compared to German readers,
because they are facilitated by the presence of orthographically
similar words. Again, studies with orthographies which are
matched on complexity but differ in terms of predictability
(e.g., English and French) or vice versa (e.g., German and French)
might be used by future research to dissociate between effects
that are associated with each of these two constructs.
In summary, the next step for future research will be to
conduct behavioral studies to establish the extent to which
the two dimensions affect reading processes. This opens up
a plethora of new research questions about the mechanisms
via which the variables underlying orthographic depth inde-
pendently affect reading, or learning to read. We can show that
the distinction between complexity and unpredictability of
sublexical correspondences is theoretically meaningful if,
based on existing models of reading, there are different pre-
dictions about how the two constructs affect cognitive pro-
cessing. To this end, we provide an overview of specific pre-
dictions within existing models of reading about how both
complexity and unpredictability, as defined above, might af-
fect both skilled reading processes and reading acquisition.
Predictions for complexity and unpredictability in adults
In skilled adult readers, the ODH proposes that complexity
slows down the sublexical assembly process, which gives
more time for the lexical route to access the relevant word
information. Based on computational models of reading, we
would indeed expect that the complexity of print-to-speech
correspondences should affect the speed of sublexical assem-
bly. Simulations with the DRC, as well as behavioral data,
have shown that non-words which contain multi-letter GPCs
(“boace”) are processed more slowly than non-words of equal
length, but containing only simple correspondences (Marinus
& de Jong, 2011; Rastle & Coltheart, 1998; Rey et al., 1998;
Rey, Ziegler, & Jacobs, 2000). This is postulated to occur
because the sublexical route in the DRC operates in a serial
fashion. When reading an item containing a multi-letter rule, it
activates the first letter of the digraph and its equivalent
pronunciation. This pronunciation needs to be inhibited once the
second letter starts being processed, because the two letters are
then parsed into a two-letter grapheme which has a different
pronunciation.
To our knowledge, it has not yet been explicitly shown that
the slowing-down of sublexical assembly leads to an in-
creased reliance on the lexical procedure, as stated by Katz
and Frost (1992), but this is a question that can be easily
addressed by future empirical research. One set of predictions
that would follow is that, in words containing complex corre-
spondences, lexical and semantic markers such as frequency
or imageability effects should be stronger compared to words
containing simple correspondences only.
The concept of unpredictability, and its effect on single-
word reading, has already been addressed in detail by compu-
tational modellers in the form of a debate between regularity
and consistency. As explained in an earlier section, this debate
reflects different mechanisms that are used by the sublexical
routes of computational models to derive the pronunciation of
a letter string. Importantly, most studies have shown an inhib-
itory effect of unpredictability, based on their measure of
choice (Andrews, 1982; Hino & Lupker, 2000; Jared, 1997,
2002; Jared, McRae, & Seidenberg, 1990; Metsala,
Stanovich, & Brown, 1998; Parkin, McMullen, & Graystone,
1986; Rastle & Coltheart, 1999; Waters & Seidenberg, 1985;
Waters, Seidenberg, & Bruck, 1984).
Open questions remain about the relationship between con-
sistency and completeness. While scenarios of inconsistency
can be resolved by relying on a non-semantic lexical route,
access of phonological information for incomplete words re-
lies on semantics. Thus, it is of interest whether semantic
effects may be stronger for words with incomplete versus
inconsistent correspondences. This has theoretical implica-
tions for models of reading as well as defining orthographic
depth: a difference between triangle models and the dual-route
models (both the DRC and CDP+-type models) is that the
latter contain a purely orthographic lexical route that bypasses
the activation of semantics. In triangle models, such a route is
non-existent: all lexical activation passes through a semantic
route. Therefore, dual-route models may predict a difference
in the strength of semantic marker effects between unpredictable
and incomplete words. Conversely, triangle models require
semantics for both types of items, and predict equally
strong semantic effects for both inconsistent and incomplete
words.
In this section, we have listed several testable predictions,
based on models of skilled reading, which can be explored by
future research using either within- or across-language de-
signs. This will contribute to the understanding of the precise
cognitive mechanisms which drive the cross-orthographic dif-
ferences that have been previously attributed to the broad con-
cept of orthographic depth.
Theories of reading acquisition and orthographic depth
There are fewer specified models of reading acquisition than
models of adult reading. Computational models could be par-
ticularly useful for exploring the effect of orthographic depth
on reading acquisition and for making specific predictions.
Connectionist-type models, for example, use a learning
algorithm that extracts the regularities in the correspondences
between print and speech. Thus, they face a problem similar
to that of a child learning to read (Hutzler, Ziegler, Perry,
Wimmer, & Zorzi, 2004). If, for the sake of simplicity, we
focus purely on the acquisition of sublexical skills, we can
make clear predictions about complexity and unpredictability.
As stated by the Psycholinguistic Grain Size Theory (PGST),
complexity of the sublexical correspondences should make it
more difficult to acquire these for children learning to read
(Ziegler & Goswami, 2005). This means that becoming pro-
ficient at using sublexical decoding should take longer in an
orthography with complex correspondences than in an orthog-
raphy with simple correspondences.
In terms of learning the sublexical correspondences, we can
also make clear predictions about unpredictability that should
be testable both with a connectionist-type model and in chil-
dren learning to read. If we were to pick two orthographies
that are comparable in terms of complexity, but different in
terms of predictability (e.g., French and English), we would
expect that learning these correspondences would take the
same amount of time. However, after the correspondences
are learnt, we would expect that the accuracy in applying these
correspondences to new words would be higher in the more
predictable orthography.
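The two predictions above can be made concrete with a deliberately simplified sketch. It is not a connectionist model and is not taken from any of the cited simulations: the class ToyOrthography, the linear learning rate, and all numerical values are assumptions introduced purely for illustration. Complexity is represented as the number of correspondences to be learned, and unpredictability as the probability that a learned correspondence fails to yield the correct pronunciation of a new word.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyOrthography:
    """Hypothetical orthography; all values are invented for illustration."""
    name: str
    n_correspondences: int  # complexity: how many correspondences must be learned
    predictability: float   # chance that a learned correspondence gives the correct pronunciation


def cycles_to_learn(orth: ToyOrthography, learned_per_cycle: int = 5) -> int:
    """Training cycles needed until every correspondence has been acquired."""
    return math.ceil(orth.n_correspondences / learned_per_cycle)


def decoding_accuracy(orth: ToyOrthography, cycle: int, learned_per_cycle: int = 5) -> float:
    """Expected accuracy when applying the correspondences learned so far to new words."""
    proportion_learned = min(1.0, cycle * learned_per_cycle / orth.n_correspondences)
    return proportion_learned * orth.predictability


# Three hypothetical orthographies (not estimates for any real language).
simple_predictable = ToyOrthography("simple, predictable", 50, 0.95)
complex_predictable = ToyOrthography("complex, predictable", 200, 0.95)
complex_unpredictable = ToyOrthography("complex, unpredictable", 200, 0.75)

for orth in (simple_predictable, complex_predictable, complex_unpredictable):
    plateau = cycles_to_learn(orth)
    print(f"{orth.name}: plateau after {plateau} cycles, "
          f"accuracy at plateau {decoding_accuracy(orth, plateau):.2f}")
```

Under these assumptions, the two complex orthographies reach their plateau after the same number of cycles but at different accuracy levels, whereas the simple orthography reaches the same plateau as its equally predictable counterpart sooner. A real test of the predictions would of course require a learning model trained on actual corpora.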
Both behavioral and computational data from English and
German provide some support for this set of predictions. For
example, behavioral data has shown that non-word reading
accuracy is higher for German than English children (Frith
et al., 1998; Landerl, 2000; Ziegler, Perry, Ma-Wyatt, et al.,
2003). This holds true even when a lenient marking criterion is
used for English, whereby any plausible pronunciation of the
non-word is scored as correct. Similar data has also been ob-
tained from comparisons of English with other shallow or-
thographies, most notably from a large-scale study which in-
cluded children from 13 European countries (Seymour et al.,
2003). A computational study has compared the performance
of a sublexical learning algorithm in German and English
(Hutzler et al., 2004). In these simulations, the models' non-
word reading accuracy for German exceeded that for English,
even after a large number of training cycles when the models
had reached a plateau.
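As an illustration only, the difference between a strict and a lenient marking criterion for non-word reading can be sketched as follows. The non-words, the informal pronunciation spellings, and the function names are all hypothetical; they are not items or scoring procedures from the studies cited above.

```python
from typing import Dict, Set

# Hypothetical non-words with invented, informal pronunciation spellings.
CANONICAL: Dict[str, str] = {"vead": "veed", "gome": "gohm", "slint": "slint"}
PLAUSIBLE: Dict[str, Set[str]] = {
    "vead": {"veed", "ved"},   # cf. "bead" vs. "head"
    "gome": {"gohm", "gum"},   # cf. "home" vs. "come"
    "slint": {"slint"},        # only one plausible pronunciation
}


def strict_accuracy(responses: Dict[str, str]) -> float:
    """Proportion of responses that match the single canonical pronunciation."""
    correct = sum(responses[item] == CANONICAL[item] for item in responses)
    return correct / len(responses)


def lenient_accuracy(responses: Dict[str, str]) -> float:
    """Proportion of responses that match any plausible pronunciation."""
    correct = sum(responses[item] in PLAUSIBLE[item] for item in responses)
    return correct / len(responses)


# One hypothetical reader: a plausible alternative, a canonical response, and an error.
responses = {"vead": "ved", "gome": "gohm", "slint": "slit"}
print(f"strict: {strict_accuracy(responses):.2f}, lenient: {lenient_accuracy(responses):.2f}")
```

In this toy case the lenient criterion credits the response to "vead" that the strict criterion rejects, so lenient accuracy can only equal or exceed strict accuracy. The more alternative pronunciations an orthography allows, the more the two criteria can diverge.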
A limitation of both the computational and behavioral com-
parisons is that the orthographies differ from English both in
terms of complexity and unpredictability. Therefore, although
the existing data is suggestive, we cannot unequivocally attri-
bute the differences in non-word reading accuracy to
unpredictability.
It is also important to bear in mind that the sub-skills un-
derlying reading do not develop in isolation. In particular,
there is a bidirectional relationship between the acquisition
of the lexical and the sublexical route (Share, 1995; Ziegler,
Perry, & Zorzi, 2014): lexical entries are predominantly
established by a self-teaching mechanism which uses knowl-
edge of the sublexical correspondences to decode unfamiliar
words, but lexical entries are also used to refine the knowledge
of sublexical correspondences. Due to this bidirectional rela-
tionship, we expect that high complexity will not only delay
the acquisition of sublexical skills, but also the build-up of
orthographic entries. As a result, complexity should lead to a
quantitative difference where, for example, French children
should lag behind German children, but eventually reach the
same level of decoding accuracy for both words and non-
words.
A simulation study like that by Hutzler et al. (2004) could also be used
to test the claims we make about the effect of complexity on learning to
read. However, as Hutzler et al. used a lenient marking criterion to eval-
uate the models' non-word reading performance, it is hard to make direct
comparisons based on their reported data about the speed of acquisition of
the sublexical correspondences.
In the case of unpredictability, there could be some
qualitative differences in the mechanisms that are used for
self-teaching, as lexical involvement is necessary to resolve
ambiguous pronunciations. Recent within-language studies of
orthographic learning provide some support for this notion,
and in particular for the use of semantics (Taylor, Plunkett,
& Nation, 2011; Wang, Castles, & Nickels, 2012). In an or-
thographic learning study, participants are asked to learn new
words. These can be assigned either a predictable or an unpre-
dictable pronunciation. Both studies found that when the pro-
nunciation was unpredictable, semantic context facilitated
learning. This was not the case for predictable words, where
phonological decoding appeared sufficient for orthographic
learning. These findings raise questions about cross-
linguistic differences in learning to read as a function of un-
predictability of sublexical correspondences. It is possible that
children learning to read in a relatively unpredictable orthog-
raphy routinely rely to a greater extent on contextual cues
compared to children learning to read in a relatively predict-
able orthography. This would result in a qualitative shift in the
types of cognitive strategies that are used across orthographies
differing in predictability to establish orthographic represen-
tations.
As our framework proposes that orthographic depth does
not represent a single continuum, it contrasts with the view
that the ease of learning to read depends on the relative posi-
tion of one's orthography on this continuum (Seymour et al.,
2003). Our suggestion is consistent with some previous stud-
ies, which have found a non-linear relationship between read-
ing achievement and orthographic depth: specifically, once
English is removed, the correlation between reading outcomes
and orthographic depth disappears (Aro & Wimmer, 2003;
Whetton & Twist, 2003). According to our framework, these
results indicate that reading acquisition in English is impeded
by the high degree of both complexity and unpredictability.
The cross-linguistic comparisons of French and English con-
sistently show that French children fare better than English
children on word and non-word reading tasks (e.g., Goswami
et al., 1998; Seymour et al., 2003), which suggests that com-
plexity is not sufficient to account for the well-established lag
of English-speaking children. Future research is needed to
establish whether English children learning to read are pre-
dominantly hampered by the unpredictability of their orthog-
raphy, or whether complexity and unpredictability interact to
create a particularly high hurdle for children learning to read in
English.
In summary, given the theories of reading acquisition, we
can assume that the two concepts of complexity and unpre-
dictability should affect cognitive processes during learning to
read in different ways. Looking purely at the development of
sublexical decoding skills, we expect that complexity would
slow down the speed of reading acquisition, whereas unpre-
dictability would reduce decoding accuracy, even after all the
correspondences have been learned. Furthermore, if children
are routinely faced with unpredictable words, it is possible that
they need to develop compensatory strategies to achieve high
reading accuracy and comprehension.
Behavioral studies of orthographic depth have been conducted
for decades, and have shown that it affects the cognitive pro-
cesses underlying skilled adult reading (Frost et al., 1987;
Schmalz et al., 2014; Ziegler et al., 2001), the rate of reading
acquisition (Frith et al., 1998; Landerl, 2000; Seymour et al.,
2003; Wimmer & Goswami, 1994), the prevalence and symp-
toms of developmental dyslexia (Landerl et al., 1997; Paulesu
et al., 2001; Wimmer, 1993, 1996), brain activation (Paulesu
et al., 2000; Richlan, 2014), and the strength of cognitive
predictors of reading ability (Caravolas et al., 2012;
Caravolas, Lervag, Defior, Malkova, & Hulme, 2013; Landerl
et al., 2013; Moll et al., 2014; Vaessen et al., 2010; Ziegler
et al., 2010). Clearly, orthographic depth is an important and
relevant factor, both for practical and theoretical rea-
sons. Based on the existing evidence, we can be confi-
dent in concluding that orthographic depth affects read-
ing, but in order to learn more about why and how this
happens, a more precise definition of orthographic depth
is required. Such a definition is needed to (1) devise a
quantification method of linguistic characteristics of the
orthographies that is theoretically meaningful, and (2) to
use this quantification method for future cross-linguistic
research to isolate specific cognitive mechanisms that
are affected by the linguistic constructs.
We propose that orthographic depth is a conglomerate
of two separate concepts, namely the degree of complex-
ity and unpredictability of print-to-speech correspon-
dences in a given orthography. We have shown that, on
a linguistic level, the two concepts can be dissociated.
Furthermore, given the currently available models and
theories of reading, we also expect that each of the two
concepts would influence skilled reading and reading ac-
quisition in different ways. Thus, we argue that there are
many unanswered questions in the area of cross-linguistic
research relating to orthographic depth. These can be pur-
sued more effectively in the context of a systematic
framework for orthographic depth.
Acknowledgments We thank Claudio Mulatti for providing informa-
tion about the Italian DRC. Furthermore, we are grateful to Naama
Friedmann and Anastasia Ulicheva for valuable discussions about ortho-
graphic depth and reading across languages, to Becky Treiman for
organising a talk at WUSTL, where parts of this paper were presented,
and to the anonymous reviewers for providing insightful comments on
earlier versions of the manuscript.
Wang et al. were working within a DRC framework and manipulated
GPC regularity. Taylor et al. were working within a more triangle-like
framework, and manipulated body consistency. Their results and conclu-
sions were strikingly similar.
