A Language Resource of German Errors Written by Children with Dyslexia
, Luz Rello
, Silke F
Web Research Group, Universitat Pompeu Fabra
Human-Computer Interaction Institute, Carnegie Mellon University
University of Applied Science Emden/Leer
Maria.Rauschenberger@upf.edu, email@example.com, firstname.lastname@example.org, email@example.com
In this paper we present a language resource for German, composed of a list of 1,021 unique errors extracted from a collection of
texts written by people with dyslexia. The errors were annotated with a set of linguistic characteristics as well as visual and phonetic
features. We present the compilation and the annotation criteria for the different types of dyslexic errors. This language resource has
many potential uses since errors written by people with dyslexia reflect their difficulties. For instance, it has already been used to design
language exercises to treat dyslexia in German. To the best of our knowledge, this is the first resource of this kind in German.
Keywords: Written Errors, Errors, Dyslexia, Visual, Phonetics, Resource, German
1. Introduction
Dyslexia is a speciﬁc learning disability with neurologi-
cal origin. It is characterized by difﬁculties with accurate
and/or ﬂuent word recognition and by poor spelling and de-
coding abilities. These difﬁculties typically result from a
deﬁcit in the perception of visual and auditory components
General misspellings have already proven to be a useful
source of knowledge for various applications (Gelman and
Barletta, 2008; Piskorski et al., 2008; Baeza-Yates and Rello,
2012). A list of annotated errors of children with dyslexia in
German is a useful resource because the errors that people
with dyslexia make reflect the types of difficulties that they
have (Sterling et al., 1998). In fact, these types of written
errors have been used for various purposes, such as studying
dyslexia (Aragón and Silva, 2000; Connelly et al., 2006),
diagnosing dyslexia (Schulte-Körne et al., 1996; Toro and
Cervera, 1984), and building applications to treat or support
dyslexia, such as dyslexia screeners (Rello et al., 2016b),
spellcheckers (Korhonen, 2008; Pedler, 2007; Rello et al.,
2015), text prediction software, or spelling exercises
(Rauschenberger et al., 2015; Rello et al., 2014b). There are
similar error resources for English (Pedler, 2007) and Spanish
(Rello et al., 2014a; Rello et al., 2016a) but, to the best of
our knowledge, this is the first resource of this kind in German.
In this paper, we present the creation of a new resource
composed of German errors written by people with dyslexia.
This involved the collection of the errors and their annotation
with different kinds of information, such as phonological and
visual information, and the creation of new categories
specifically for the German language. The annotation criteria
had to be adapted for German because it is a language with a
different orthography and syllabic structure (Seymour et al.,
2003). The resource of dyslexic errors is available online.
Dyseggxia is available at https://itunes.apple.
Penfriend XL is available at http://www.penfriend.
2. Collecting Errors
We collected 47 texts (homework exercises, dictations, and
school essays) written by students from 8 to 17 years old.
Figure 1 shows an example of a handwritten text by a
10-year-old boy with dyslexia. We kept collecting texts until
we reached 1,000 written errors by people with dyslexia.
Previous research has shown that around a thousand errors
are enough to draw useful conclusions (Pedler, 2007;
Rauschenberger et al., 2015; Rello et al., 2014b).
A total of 32 texts came from children who have been di-
agnosed with dyslexia. The remaining 15 texts came from
students with a high spelling error rate who were chosen by
their teachers. The students attended either primary school,
comprehensive school (Gesamtschule), high school (Gym-
nasium) or a school for children with learning difﬁculties
3. Error Classiﬁcation
We analyzed the errors and defined two additional error
categories specific to German: capital letter and non-capital
letter errors. The rest of the errors were consistent with
Pedler’s classification of dyslexic errors (Pedler, 2007). The
categories are the following:
– Substitution. Changing one letter for another, for ex-
– Insertion. An insertion of one letter, such as *muttig
(mutig, ‘brave’).
– Omission. An omission of one letter, as in *zusamen
(zusammen, ‘together’).
The resource is available at http://goo.gl/LRaUDA .
Examples with errors are preceded by an asterisk ‘*’. We use
the standard linguistic conventions: ‘<>’ for graphemes, ‘/ /’ for
phonemes and ‘[ ]’ for phones.
Hallo ich bin Till *Tieger.
Hallo ich bin Till Tiger.
‘Hi, I am Till Tiger.’
Ich bin klein und dünn. Ich *kannn nicht gut brüllen.
Ich bin klein und dünn. Ich kann nicht gut brüllen.
‘I am small and thin. I cannot roar well.’
Mama *Tieger ist groß und *Stag und kannn tolle *gechichten ärzälen.
Mama Tiger ist groß und stark und kann tolle Geschichten erzählen.
‘Tiger Mum is tall and strong and can tell fantastic stories.’
Papa *Tieger ist *muttig und *nimmand kann so laut *brulen wie er.
Papa Tiger ist mutig und niemand kann so laut brüllen wie er.
‘Tiger Dad is brave and nobody can roar as loud as he.’
*manchmal bin ich glücklich.
Manchmal bin ich glücklich.
‘Sometimes I am happy.’
*Sonntag morgens *Früchtugen wir alle *zusamen *in Bett.
Sonntag morgens Frühstücken wir alle zusammen im Bett.
‘Sundays, we all have breakfast in the bed.’
Dann bin *ick *Glücklich. Wir Grümeln und Knudeln und machen eine
Dann bin ich glücklich. Wir krümeln und knuddeln und machen eine
‘Then I am happy. We crumble and cuddle and make a pillow fight.’
Ich bin auch *Glücklich wenn ich eine *Schöne *Blumme sehe und an ihr
Ich bin auch glücklich wenn ich eine schöne Blume sehe und an ihr rieche.
‘I am happy too, if I see a beautiful flower and I smell her.’
Wenn ich mit meiner *Freunden *Miear ein *lager im Wald *bauee habe ich
Wenn ich mit meiner Freundin Mia ein Lager im Wald baue, habe ich Spaß.
‘I have fun, when I am making a camp in the woods with Mia.’
Figure 1: Example of a handwritten text of a 10 year-old boy with dyslexia (left) and its transcription in German and
– Transposition. Reversing the order of two letters, for
example *Porblem (Problem, ‘problem’).
– Multi-errors. They differ in more than one letter from
the target word such as *Stag (stark, ‘strong’).
– Word boundary errors. They are run-ons and split
words. A run-on is the result of omitting a space, such
as *nichtärgern (nicht ärgern, ‘don’t tease’). A split
word occurs when a space is inserted in the middle of
a word, such as *Vogel futter (Vogelfutter, ‘bird food’).
– Capital Letter. In German nouns are written with
capital letters, while other kinds of words like verbs,
adjectives or articles are not (Stang, 2010). For exam-
ple *geschichten (Geschichten, ’stories’).
– Wrong Capital Letter. As explained above, it is
confusing for children to decide which words have to be
written with or without capital letters. For instance,
verbs, adjectives or articles are not written with capital
letters, as in *Glücklich (glücklich, ‘happy’).
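The categories above can be approximated with a small rule-based function. The following is only a minimal sketch and not the annotation procedure used for the resource: the function names `classify_error` and `_one_deletion_away`, the label strings, and the order in which the rules are checked are all assumptions made for illustration.

```python
def _one_deletion_away(longer: str, shorter: str) -> bool:
    """True if removing exactly one character from `longer` yields `shorter`."""
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def classify_error(error: str, target: str) -> str:
    """Assign one of the Section 3 categories to a word-error pair (hypothetical rules)."""
    if error == target:
        return "no error"
    # The words differ only in capitalization.
    if error.lower() == target.lower():
        if target[:1].isupper() and error[:1].islower():
            return "capital letter"        # a required capital is missing
        return "wrong capital letter"      # a capital was used where none belongs
    # The words differ only in spaces: run-ons and split words.
    if error.replace(" ", "") == target.replace(" ", ""):
        return "word boundary"
    if len(error) == len(target):
        diffs = [i for i, (e, t) in enumerate(zip(error, target)) if e != t]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and error[diffs[0]] == target[diffs[1]]
                and error[diffs[1]] == target[diffs[0]]):
            return "transposition"
    if len(error) == len(target) + 1 and _one_deletion_away(error, target):
        return "insertion"
    if len(error) + 1 == len(target) and _one_deletion_away(target, error):
        return "omission"
    return "multi-error"


# Examples taken from the text and from Figure 1.
for err, tgt in [("muttig", "mutig"), ("zusamen", "zusammen"),
                 ("Porblem", "Problem"), ("Stag", "stark"),
                 ("Vogel futter", "Vogelfutter"), ("geschichten", "Geschichten")]:
    print(f"*{err} ({tgt}): {classify_error(err, tgt)}")
```

Checking capitalization before the single-letter rules is a design choice of this sketch; without it, *geschichten would be labeled as a plain substitution.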
4. Error Annotation
We annotated each of the word-error pairs with linguistic
features and created new categories for German. Each of
the word-error pairs was enriched with metadata and
annotated with the following attributes:
– Unique numbering: uniﬁed number to distinctly
identify the data.
– Target word: word the person aimed to write.
– Misspelled word: the wrongly written word.
– Damerau-Levenshtein distance: the minimum number
of edits (insertion, deletion, substitution, transposition)
required to change the misspelled word into the correct
(target) word (Damerau, 1964; Levenshtein, 1965).
– Target and misspelled word frequencies: the frequencies
of the target and the misspelled word, defined as the
number of hits in a major search engine. The search engine
does not distinguish between non-capital and capital
letters. Therefore, words that differ only in a capital
letter have the same frequency, e.g. *hubschrauber
(Hubschrauber, ’helicopter’).
The Levenshtein distance (Levenshtein, 1965) is the minimum
number of substitutions, insertions and deletions to transform
one string into another. The Damerau version (Damerau,
1964) counts a transposition as a single error instead of two
errors.
Here we refer to all web pages written in German and not
only web pages from Germany. For determining whether a web
page was written in German, we used Google Advanced Search
settings (http://www.google.com/advanced_search).
– Target and misspelled length: number of characters
the target word and the error word have.
– Error position: the position in the target word where
the error occurs.
– Syllable error: the position of the syllable in the tar-
get word where the error occurs.
– Target word syllables: number of syllables.
– Target syllable: the structure of the syllable where
the error occurs, such as C(onsonant)V(owel), CVC,
or CCV, among others.
– Type of error: The errors were tagged according to
the classiﬁcation presented in Section 3.
– Real word: this Boolean attribute records if the er-
ror produced another real word. For example Schal
(’scarf’) and Schall (’sound’).
– First letter error: this Boolean attribute records if the
error is produced in the ﬁrst letter of the word, for in-
– Last letter error: this Boolean attribute records if the
error is produced in the last letter of the word, such as in
*dan (dann, ’then’).
– Correct Letter and Error Letter: the correct letter
is the letter of the target word that was mistaken, and
the error letter is the letter written in its place.
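Several of the attributes above can be computed directly from a word-error pair. The sketch below is illustrative only: `osa_distance` implements the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, and `annotate`, its dictionary keys, and the rule that the error position is the 1-based index of the first mismatch (capped at the target length) are assumptions, not the definitions used for the resource.

```python
def osa_distance(source: str, target: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            # An adjacent transposition counts as a single edit.
            if (i > 1 and j > 1 and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def annotate(error: str, target: str) -> dict:
    """Build a partial annotation record for a word-error pair (assumed layout)."""
    prefix = 0
    for e, t in zip(error, target):
        if e != t:
            break
        prefix += 1
    # Assumption: 1-based position of the first mismatch, capped at the target length.
    position = min(prefix + 1, len(target))
    return {
        "target": target,
        "misspelled": error,
        "distance": osa_distance(error, target),
        "target_length": len(target),
        "error_length": len(error),
        "error_position": position,
        "first_letter_error": position == 1,
        "last_letter_error": position == len(target),
    }


print(annotate("dan", "dann"))
print(osa_distance("Porblem", "Problem"))
```

Under these assumptions *dan (dann) is flagged as a last letter error, in line with the example in the list, and the transposition in *Porblem counts as one edit rather than two.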
4.1. Visual Features
For each target and error grapheme we annotate the letters
involved in the error with the following visual information,
considering handwritten text (Table 1).
Four handwriting alphabets are commonly used in Ger-
man schools (Topsch, 2005; Bartnitzky, 2010). These are
the Lateinische Ausgangsschrift, Vereinfachte Ausgangss-
chrift, Schulausgangsschrift and Grundschrift. In some
states there is one mandatory alphabet to be used by the
school, while in other states schools can decide. For our
method we chose the Lateinische Ausgangsschrift (Topsch,
2005), shown in Figure 2, because it is commonly used
in the schools where the texts were collected.
• Mirror letter: Boolean attribute that indicates if the
mirror image of a letter produces another letter, such as
<d> and <b>, or <m> and <w>.
• Rotation: Boolean attribute that indicates if the rota-
tion of a letter produces another letter, such as <d>
• Fuzzy letters: Boolean attribute that indicates if the
letter has visually similar letters (not due to rotation or
mirroring), such as <s> and <z>.
The syllables were checked with http://www.duden.
Mirror     Yes = <b, p, d, q, m, w, u, n, v, H>
           No  = rest of letters
Rotation   Yes = <b, g, h, y, p, d, H>
           No  = rest of letters
Fuzzy      Yes = <a, o, ö, b, d, g, h, m, n, p, q, u, v, w, y>
Table 1: Visual features of the annotated target and error
Figure 2: Lateinische Ausgangsschrift, a handwriting al-
phabet commonly used in German schools.
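The visual annotation amounts to a lookup of each letter in the sets of Table 1. A minimal sketch follows; the names `MIRROR`, `ROTATION`, `FUZZY` and `visual_features` are hypothetical, and reading the second <o> in the extracted Fuzzy row as <ö> is an assumption.

```python
# Letter sets transcribed from Table 1 (Lateinische Ausgangsschrift).
MIRROR = set("bpdqmwunvH")
ROTATION = set("bghypdH")
# Assumption: the Fuzzy row is read as containing <ö> after <o>.
FUZZY = set("aoöbdghmnpquvwy")


def visual_features(letter: str) -> dict:
    """Return the three Boolean visual attributes for a single letter."""
    return {
        "mirror": letter in MIRROR,
        "rotation": letter in ROTATION,
        "fuzzy": letter in FUZZY,
    }


# Annotate both letters of a substitution, e.g. <d> written as <b>.
for letter in ("d", "b", "H", "k"):
    print(letter, visual_features(letter))
```

The lookup is case-sensitive on purpose, since Table 1 lists the capital <H> separately from the lowercase <h>.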
4.2. Phonetic Features
Each of the error words was tagged using a scale inspired
by the error analysis of the DRT (Grund et al., 2004). This
scale is based on traditional articulatory phonetic features
(International Phonetic Association, 1999) and is divided
into the following categories.
• Sound distinction. This category has two values:
similar sound errors, e.g. *eingebackt [‘aIngebakt]
(eingepackt [‘aIngepakt], ’wrapped’) and different
sound errors, e.g. *Tüsch [tyS] (Tisch [tIS], ’table’).
• Sound sequence. This category has three values: error
words with missing phonemes, e.g. *M[…] (Märchen,
’fairy tale’); added phonemes, e.g. *Spieln
(Spiel, ’game’); or transposition of letters, e.g.
*Porblem (Problem, ’problem’).
• Combination of consonants. Some consonants are
pronounced in a different way when they are com-
bined with each other. For example the consonant
<s> [s] and <p> [p] are pronounced like [Sp] when
they are written together.
• Words with <v>. Words written with a <v> since its
sound correspondence is not transparent in German,
• Umlaut. There are three umlauts in the German language:
<ä, ü, ö>. The dots are often missing in texts.
• Double consonant / false double consonant. After
a short, stressed vowel there are usually two or
more consonants following. If there is only one
consonant following, this one should be doubled most of
the time, e.g. *vergesen (vergessen, ’forget’). Double
consonants also appear at syllable boundaries. This
category includes false double consonants and double
consonants in the wrong place, such as *Unffall (Unfall,
’accident’).
• Lengthening. There are different types of lengthening
for a vowel in German. This process results in
a long stressed vowel. The long vowel <i> [i:] is
frequently lengthened with an <e>, which is not audible.
A typical error is *wider (wieder [Vi:d5], ’again’).
About 20% of the long, stressed vowels are lengthened
with an <h>, e.g. *erzälen (erzählen,
’tell’) (Grund et al., 2004). A few long stressed vowels
are lengthened by double vowels, as in *Hare (Haare
[‘ha:r@], ’hair’). This category includes false
lengthening errors produced by adding <e> or <h> after a
long stressed vowel in the wrong place, e.g. *währe
(wäre [‘vE:r@], ’would be’).
• Derivation. Related words that are often written the
same way or similarly but pronounced differently. To
write these words in the right way, one possibility is to
have a look at the plural form so that the right writing
can be derived, e.g. Walt (Wald [valt]; W
• Words with <s/ß>. A word with a voiced [s] is
always written with an <s>. Words with a voiceless [s]
have specific rules which determine if they have to be
written with <s>, <ss> or <ß>, e.g. *Reisverschlus
In this paper we have presented the compilation and the an-
notation criteria of a list of 1,021 unique errors written by
people with dyslexia in German. The adaptation of a
Spanish-based method to the German language raised a number
of challenges. For instance, the handwriting systems taught
in schools in Germany are different from the Spanish ones,
so the visual features needed to be redeﬁned. We annotated
each of the word-error pairs with linguistic features and two
new error categories were specially created for the German
language. We are planning to use the resource for the detec-
tion of dyslexia in German (Rauschenberger, 2016) using
We deeply thank Janka Melgert-Retelsdorf and all children
for making the collection of texts possible. We also thank
Hendrik Witzel for helping with the phonetic error annotation.
Aragón, L. E. and Silva, A. (2000). Análisis cualitativo
de un instrumento para detectar errores de tipo disléxico
(IDETID-LEA) (Qualitative analysis of an instrument
to detect dyslexic errors, IDETID-LEA). Psicothema,
Baeza-Yates, R. and Rello, L. (2012). On measur-
ing the lexical quality of the web. In The 2nd Joint
WICOW/AIRWeb Workshop on Web Quality, pages 1–6,
Bartnitzky, H. (2010). Grundschrift - damit Kinder besser
schreiben lernen. In Grundschulverband, editor, Grund-
schulverband aktuell, volume 110, pages 4–8. Grund-
Connelly, V., Campbell, S., MacLean, M., and Barnes, J.
(2006). Contribution of lower order skills to the writ-
ten composition of college students with and without
dyslexia. Developmental Neuropsychology, 29(1):175–
Damerau, F. J. (1964). A technique for computer detection
and correction of spelling errors. Communications of the
Gelman, I. A. and Barletta, A. L. (2008). A “quick
and dirty” website data quality indicator. In The 2nd
ACM Workshop on Information Credibility on the Web
(WICOW ’08), pages 43–46, Napa Valley, USA.
Grund, M., Naumann, C. L., and Haug, G. (2004).
Diagnostischer Rechtschreibtest für 5. Klassen: DRT 5;
Manual. Deutsche Schultests. Beltz Test, Göttingen,
aktual. aufl. in neuer rechtschreibung edition.
International Phonetic Association. (1999). Handbook of
the International Phonetic Association: A guide to the
use of the International Phonetic Alphabet. Cambridge
University Press, Cambridge.
Korhonen, T. (2008). Adaptive spell checker for dyslexic
writers. In Proceedings of the 11th international con-
ference on Computers Helping People with Special
Needs (ICCHP ’08), pages 733–741, Berlin, Heidelberg.
Levenshtein, V. (1965). Binary codes capable of correcting
spurious insertions and deletions of ones. Problems of
Information Transmission, 1:8–17.
Pedler, J. (2007). Computer Correction of Real-word
Spelling Errors in Dyslexic Text. Ph.D. thesis, Birkbeck
College, London University.
Piskorski, J., Sydow, M., and Weiss, D. (2008). Exploring
linguistic features for web spam detection: a preliminary
study. In Proceedings of the 4th International Workshop
on Adversarial Information Retrieval on the Web (AIR-
Web ’08), pages 25–28, New York, NY. ACM Press.
Rauschenberger, M., Füchsel, S., Rello, L., Bayarri, C.,
and Thomaschewski, J. (2015). Exercises for German-
Speaking Children with Dyslexia. In Human-Computer
Interaction–INTERACT 2015, pages 445–452. Springer
Rauschenberger, M. (2016). Dysmusic: Detecting
dyslexia by web-based games with music elements. In
Proc. Web4All’16, Montreal, Canada. ACM Press.
Rello, L., Baeza-Yates, R., and Llisterri, J. (2014a). Dys-
List: An annotated resource of dyslexic errors. In Pro-
ceedings of the 9th International Conference on Lan-
guage Resources and Evaluation (LREC 2014), pages
1289–1296, Reykjavik, Iceland, May.
Rello, L., Bayarri, C., Otal, Y., and Pielot, P. (2014b). A
computer-based method to improve the spelling of chil-
dren with dyslexia using errors. In Proc. The 16th In-
ternational ACM SIGACCESS Conference of Computers
and Accessibility (ASSETS 2014), Rochester, USA, Oc-
Rello, L., Ballesteros, M., and Bigham, J. (2015). A
spellchecker for dyslexia. In Proc. ASSETS’15, Lisbon,
Portugal. ACM Press.
Rello, L., Baeza-Yates, R., and Llisterri, J. (2016a). A
resource of errors written in Spanish by people with
dyslexia and its linguistic, phonetic and visual analysis.
Language Resources and Evaluation.
Rello, L., Ballesteros, M., Ali, A., Serra, M., Alarc
and Bigham, J. P. (2016b). Dytective: Diagnosing risk
of dyslexia with a game. In Proc. Pervasive Health’16,
Schulte-Körne, G., Deimel, W., Müller, K., Gutenbrunner,
C., and Remschmidt, H. (1996). Familial aggregation
of spelling disability. Journal of Child Psychology and
Seymour, P. H. K., Aro, M., and Erskine, J. M. (2003).
Foundation literacy acquisition in European orthogra-
phies. British Journal of Psychology, 94(2):143–174.
Stang, C. (2010). Duden, Deutsche Rechtschreibung.
Praxis kompakt. Dudenverl, Mannheim and Leipzig and
Wien and Zürich.
Sterling, C., Farmer, M., Riddick, B., Morgan, S., and
Matthews, C. (1998). Adult dyslexic writing. Dyslexia,
Topsch, W. (2005). Grundkompetenz Schriftspracherwerb:
Methoden und handlungsorientierte Praxisanregungen,
volume Bd. 5 of Beltz Pädagogik. Beltz, Weinheim
and Basel, 2., überarb. und erw. aufl edition.
Toro, J. and Cervera, M. (1984). TALE: Test de Análisis
de Lectoescritura (TALE: Literacy Analysis Test). Visor,