Screening Risk of Dyslexia through a Web-Game using Language-Independent Content and Machine Learning
Maria Rauschenberger (WSSC, DTIC, Universitat Pompeu Fabra & Max Planck Institute for Software Systems)
Ricardo Baeza-Yates (Khoury College of Computer Sciences, Northeastern University at SV)
Luz Rello (Dept. of Information Systems and Technology, IE Business School)
Children with dyslexia are often diagnosed only after they fail at school, even though dyslexia is not related to general intelligence. In this work, we present an approach for universal screening of dyslexia using machine learning models with data gathered from a web-based language-independent game. We designed the game content taking into consideration the analysis of mistakes of people with dyslexia in different languages and other parameters related to dyslexia, such as auditory and visual perception. We conducted a user study with 313 children (116 with dyslexia) and trained predictive machine learning models with the collected data. Our method yields an accuracy of 0.74 for German and 0.69 for Spanish, as well as an F1-score of 0.75 for both German and Spanish, using Random Forests and Extra Trees, respectively. To the best of our knowledge, this is the first time that risk of dyslexia is screened using a web-based game with language-independent content and machine learning. Universal screening with language-independent content can be used to screen pre-readers who cannot read yet, facilitating a potential early intervention.
•Human-centered computing → Field studies; User studies; Empirical studies in accessibility; •Social and professional topics → People with disabilities; •Software and its engineering → Interactive games;
Dyslexia; Detection; Pre-Readers; Serious Games; Web-based Assessment; Universal Screening; Language-Independent; Visual; Auditory; Gamification.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from firstname.lastname@example.org.
W4A ’20, April 20–21, 2020, Taipei, Taiwan
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-7056-1/20/04. .. $15.00
1. INTRODUCTION
Dyslexia is a specific learning disorder that affects 5% to 15% of the world population [ ]. Children with dyslexia are often diagnosed only after they show spelling and reading errors or fail at school, even though dyslexia is not related to general intelligence.
Generally, manifestations of dyslexia can be observed only when children reach a certain age and level of literacy. Current approaches to screening (pre-)readers require expensive personnel, such as a professional therapist, or special hardware, such as fMRI scans [ ]. Previous research has studied signs of dyslexia that are not related to reading and writing, such as visual perception, short-term memory, executive functions or auditory perception [ ]. These signs could be used to screen for potential dyslexia in pre-readers, and our work shows a possible approach for doing this by applying machine learning to data collected with language-independent content integrated in a web-based game. Our game has the potential to be easily accessible, making parents aware of the potential risk of dyslexia so that they can seek further help, e.g., from a medical doctor or therapist.
The game and the user study are designed with the human-centered design framework [ ] to collect the data set. This is relevant because collecting personal data is challenging due to privacy and trust issues [2, 7]. As a result, the final data sets are small, and small data makes prediction with machine learning models more difficult: there is a risk of over-fitting, or of having a data set too small to be divided into meaningful training, validation and test sets. We use standard machine learning classifiers from the Scikit-learn library, namely Random Forest with and without class weights, Extra Trees and Gradient Boosting, to predict dyslexia with small data sets.
Our main contributions are the user study results and the first web-based game for screening risk of dyslexia based on language-independent content and machine learning. To gather the data for this study, we recruited participants already diagnosed with dyslexia instead of pre-readers (younger children), since the latter would have required a long-term study. Our results show that the approach is feasible and that a higher prediction accuracy is obtained for German than for Spanish.
The rest of the paper is organized as follows: Section 2
covers the related work while Section 3 explains the rationale
behind the game design. In Section 4 we cover the study
methodology and in Sections 5 and 6 the predictive models
and their results. We discuss the results in Section 7 ﬁnishing
with conclusions and future work in Section 8.
Figure 1: Participants playing the visual part (left) and the musical part (right) of MusVis. Photos included with the adults’ permission.
2. RELATED WORK
Various applications and games to support, detect and treat dyslexia have been developed [ ]. Gamification has been used in the design of various use cases and applications [ ]. Gamification applies game elements to game play in order to engage and motivate players [ ]. Games have been developed to screen readers [ ] using linguistic content, and to screen pre-readers [ ] focusing on the gameful experience. Only Lexa [ ] has published an accuracy (89.2%), using features related to phonological processing. However, Lexa does not include game elements, and its features are collected with extensive tests that require considerable resources such as time and cost. In addition, the classification is carried out on a small sample (n = 56) without any validation, and with no precautions against or discussion of over-fitting.
Here we advance previous approaches by taking precautions against over-fitting, by not focusing on linguistic knowledge, and by using the same game content for every language. This reduces the effort and time needed to design different content for different languages and, more importantly, the content could be used for pre-readers.
3. GAME DESIGN
The aim of our web-game called MusVis (see Figure 1) is to measure the reactions of children with and without dyslexia while playing, in order to find differences in their behavior. A video of MusVis is available at http://bit.ly/MusVisContent. We designed our game under the assumptions that non-linguistic content like rhythm or frequency [ ] represents the difficulties that a child with dyslexia has with writing and reading [ ], and that dyslexia can be measured through a person’s interaction [ ], such as the total number of clicks or the play duration.
An early version of the game design and content was previously tested in a five-user study [ ] to eliminate major usability problems that could have influenced the prediction, and we validated our game measurements [ ]. In our pilot study (n = 178), we found four significant game measurements for Spanish, German and English, as well as eight significant game measurements for Spanish [ ].
The game is implemented as a web application using
Figure 2: Example of the auditory part from the
game MusVis for the ﬁrst two clicks on two sound
cards (left) and then when a pair of equal sounds is
found (right). The participant is asked to ﬁnd two
equal auditory cues by clicking on sound cards.
Figure 3: Example of the visual part of the game MusVis with the priming of the target cue symbol (left) and the nine-squared design including the distractors for each symbol (right).
and a PHP server plus a MySQL database for the back-end.
One reason for this choice is the simplicity of running remote online studies. Another reason is that the application can be adapted to different devices in future research studies.
We designed the language-independent game content based on previous literature, selecting the content that is most challenging for people with dyslexia. To this end, we designed language-independent content with auditory and visual cues.
We designed an auditory part (see Figure 2) and a visual part (see Figure 3) using features extracted from the literature. The game play differs between the two parts because auditory and visual cues are perceived differently, but in both cases each part also targets general skills, e.g., short-term memory.
As is well known, children have more difficulty paying attention over longer periods of time. Therefore, each of the two parts has four stages, which are counter-balanced with Latin Squares [ ]. Each stage has two rounds, which sums up to 16 rounds in total for the whole game. Each stage first has a round with four cards and then a round with six cards, so that the whole game takes less than 10 minutes to play. We aim to address participants’ motivation in both game parts by designing the following game mechanics, which are frequently used in learning environments [ ]: rewards (points), feedback (instant feedback) and challenges (time limit), plus game components (a story). The content design, user interface, interaction and implementation of the auditory and visual parts of the game are described in the following sections.

Table 1: Description of the auditory attributes which show promising relations to the prediction of dyslexia.
Key Name Description
CS Complex vs. simple: Children with dyslexia (DG) recall significantly fewer items correctly in a lab study for long memory spans. The rhythmic complexity did not have an effect on the difference between DG and children without dyslexia (CG).
Pitch perception is essential for prosodic performance, is correlated with language development, and can be used as a predictor for language.
Acoustic parameter differences in short tones (< 350 ms) are difficult to distinguish for a person with language difficulties.
Both groups showed significant differences when comparing rise time. Rise time and prosodic development are strongly connected and were shown to be most sensitive to dyslexia.
DG show deficits in recalling the patterns of auditory cues [ ]. However, rhythm modulations show no effect on the children’s performance.
DG show weaknesses in short-term memory tasks when more items are presented. Also, deficits can frequently be observed for the short-term auditory memory span.
CAPS similarity effect: DG have difficulties with similar sounds and the phonological neighborhood when long memory spans are used.
Since the phonological grammar of music is similar to the prosodic structure of language, music (i.e., a combination of acoustical parameters) can be used to imitate these features [ ]. DG are “reliably impaired in prosodic tasks”.

SD RT Rh STM General
Goswami et al. X X X X X
Huss et al. X X X X
Johnson  X
Overy  X X X
Yuskaitis et al.  X X
Frequency X X X X X X
Length X X X X
Rise time X X X X X X
Rhythm X X X X X X
Table 2: Mapping of the evidence from literature to distinguish a person with dyslexia, the attributes and general assumptions, and the stages of the auditory part of the game MusVis.
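Both parts counter-balance their four stages with Latin Squares, but the paper does not say which square was used or how participants were assigned to its rows. The following is a minimal sketch assuming a cyclic Latin square and a hypothetical rule that assigns participant i to row i mod 4; the stage names come from the text, everything else is illustrative.

```python
from typing import List, Sequence


def cyclic_latin_square(items: Sequence[str]) -> List[List[str]]:
    """Row r is `items` rotated left by r positions, so each item appears
    exactly once in every row and exactly once in every column."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]


def stage_order(participant_index: int, stages: Sequence[str]) -> List[str]:
    """Hypothetical assignment rule: participants cycle through the rows."""
    square = cyclic_latin_square(stages)
    return square[participant_index % len(stages)]


AUDITORY_STAGES = ["frequency", "length", "rise time", "rhythm"]
VISUAL_STAGES = ["z", "symbol", "rectangle", "face"]

if __name__ == "__main__":
    for p in range(4):
        print(p, stage_order(p, AUDITORY_STAGES), stage_order(p, VISUAL_STAGES))
```

A cyclic square balances the position of each stage, but not which stage precedes which; a Williams design would additionally balance first-order carry-over effects.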
3.1 Auditory Game Design
The auditory part is inspired by the traditional game Memory, in which pairs of identical cards (face down) must be identified by flipping them over [ ]. We chose this game play because it is a well-known children’s game and could easily be transformed to use auditory cues. To create the auditory cues, we used acoustic parameters, for example, to imitate the prosodic structure of language, which is similar to the phonological grammar of music.
Musicians with dyslexia score better on auditory perception tests than the general population, but not on auditory working memory tests [ ]. Auditory working memory helps a person to keep a sound in mind. We combined, for example, the deficits of children with dyslexia in auditory working memory with the results on the short duration of sounds [ ], while taking the precaution of not measuring hearing ability [ ]. Each stage is assigned to one acoustic parameter, such as frequency or rhythm, which is designed based on the analysis of previous literature.
Therefore, we used the acoustic parameters frequency, length, rise time and rhythm as auditory cues. Each auditory cue was assigned to a game stage (see Table 2), which we mapped to the attributes and literature references (see Table 1) that provide evidence for distinguishing a person with dyslexia. For example, our rhythm stage uses the following characteristics: complex vs. simple [ ], sound duration [ ], short-term memory [ ], phonological similarity effect [ ], and acoustic parameters correlated with speech.
Each acoustic stage has three auditory cues (we use MP3 for the sound files). The auditory cues are generated as simple sine tones using the free software Audacity. The exact parameters of each auditory cue are already published [ ], and the auditory cues are available on GitHub. Each stage has two rounds, first with two and then with three different auditory cues (i.e., four and then six cards), which must be matched by finding the cards with the same sound (see Figure 2). The arrangement of sounds (which auditory cue matches which card) is random for each round.
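The published cue parameters and the Audacity files are not reproduced here. As a rough illustration of what the acoustic parameters mean, the sketch below synthesises a sine tone with a given frequency, length and linear rise time as raw samples, and deals the cards for one round by putting each cue on two cards and shuffling them. The 44.1 kHz sample rate, the linear ramp, all parameter values and the function names are assumptions for illustration, not the MusVis implementation.

```python
import math
import random
from typing import List

SAMPLE_RATE = 44_100  # samples per second (assumed; not stated in the paper)


def sine_tone(freq_hz: float, length_ms: float, rise_ms: float,
              sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Samples in [-1, 1] of a sine tone whose amplitude ramps up
    linearly from 0 to 1 during the first `rise_ms` milliseconds."""
    n_samples = int(sample_rate * length_ms / 1000)
    rise_samples = max(1, int(sample_rate * rise_ms / 1000))
    tone = []
    for i in range(n_samples):
        envelope = min(1.0, i / rise_samples)  # linear rise time
        tone.append(envelope * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return tone


def deal_sound_cards(cue_ids: List[str], rng: random.Random) -> List[str]:
    """Put each auditory cue on two cards and shuffle them, so the
    cue-to-card mapping differs in every round."""
    cards = [cue for cue in cue_ids for _ in range(2)]
    rng.shuffle(cards)
    return cards


if __name__ == "__main__":
    # Hypothetical "length" stage: same pitch, different durations.
    cues = {"short": sine_tone(440, 150, 10), "long": sine_tone(440, 400, 10)}
    print({name: len(samples) for name, samples in cues.items()})
    print(deal_sound_cards(list(cues), random.Random(0)))
```

Two cues give the four-card round and three cues the six-card round.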
3.2 Visual Game Design
The visual game play uses a Whac-A-Mole interaction similar to the first round of Dytective [ ]. But instead of using letter recognition, as Dytective does, we used language-independent visual cues. An example of letter recognition would be finding the graphical representation of a given letter. We adapted the interaction design and content for this purpose (see Figure 3). For the visual game, we designed cues from which further cues with similar features can be derived, and which represent horizontal and vertical symmetries that are known to be difficult for a person with dyslexia in different languages [35, 38, 51].
To create the visual cues, we designed different visual representations similar to visual features of annotated error words from people with dyslexia [ ] and designed the game as a simple search task.
Audacity is available at http://audacity.es/, Last access:
We used the standard linguistic conventions: ‘<>’ for graphemes, ‘/ /’ for phonemes and ‘[ ]’ for phones.
Figure 4: Overview of the designed visual cues. The figure shows the target cue (top) and distractor cues (below) for the four different stages (z, symbol, rectangle, face) of the visual part of the game MusVis.
At the beginning, participants are shown the target visual cue (see Figure 3, left) for three seconds and are asked to remember it. After that, the participants are presented with a setting where the target visual cue and distractors are displayed (see Figure 3, right). The participants try to click on the target visual cue as often as possible within a span of 15 seconds. The arrangement of the target and distractor cues changes randomly after every click.
The visual part has four stages, which are counter-balanced with Latin Squares. Each stage is assigned to one visual type (symbol, z, rectangle, face), and four visual cues are presented for each stage. One visual cue is the target, which the participants need to find and click (see Figure 4, top). The other three visual cues are distractors. Each stage has two rounds, first with a 4-squared and then with a 9-squared design (see Figure 3, right). The target and all three distractors are displayed in the 4-squared design. In the 9-squared design, the target is displayed twice, as are distractors two and three. Only distractor one is displayed three times.
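The cell counts follow from the layout: in the 9-squared design the target and distractors two and three occupy six cells, which leaves three cells for distractor one. The sketch below builds both layouts and re-arranges the grid after every click, as in a visual round. Cue names, the click representation and the function names are hypothetical; this is not the game's source code.

```python
import random
from typing import Dict, List

# Copies of each cue per layout. 4-squared: every cue once.
# 9-squared: target, distractor 2 and distractor 3 twice each,
# leaving 9 - 6 = 3 cells for distractor 1.
LAYOUT_COUNTS: Dict[int, Dict[str, int]] = {
    4: {"target": 1, "distractor1": 1, "distractor2": 1, "distractor3": 1},
    9: {"target": 2, "distractor1": 3, "distractor2": 2, "distractor3": 2},
}


def new_grid(size: int, rng: random.Random) -> List[str]:
    """Return a shuffled list of cue names, one per grid cell."""
    cells = [cue for cue, k in LAYOUT_COUNTS[size].items() for _ in range(k)]
    rng.shuffle(cells)
    return cells


def play_round(clicked_cells: List[int], size: int, seed: int = 0) -> Dict[str, float]:
    """Score a sequence of clicked cell indices for one round; the grid
    is re-arranged after every click."""
    rng = random.Random(seed)
    grid = new_grid(size, rng)
    hits = 0
    for cell in clicked_cells:
        if grid[cell] == "target":
            hits += 1
        grid = new_grid(size, rng)  # new arrangement after each click
    clicks = len(clicked_cells)
    return {
        "hits": hits,
        "misses": clicks - hits,
        "clicks": clicks,
        "accuracy": hits / clicks if clicks else 0.0,  # hits / total clicks
    }
```

Hits, misses, accuracy and total clicks correspond to the visual dependent variables described in Section 4.4.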
4. USER STUDY METHODOLOGY
We use the human-centered design framework to design our study and to collect the data for the prediction of dyslexia. We conducted a within-subject study (n = 313), which means that all participants played all game rounds [ ] with the same language-independent content. Only the game instructions were translated into each native language.
Spanish participants diagnosed with dyslexia were mainly recruited through public social media calls by non-profit organizations. We recruited German participants diagnosed with dyslexia mainly through support groups on social media. Also, some English speakers contacted us through these calls, as our location is international. The control groups for Spanish and German were recruited mostly in collaboration with four schools, two in each country.
4.1 Online Data Collection
Collecting data is costly in terms of time and privacy issues, especially if the data is related to education and health. Therefore, we must make the best of the limited data available [ ]. In our case, we need a certain age range to make sure that a person with dyslexia has already been diagnosed but has not yet been fully treated. Since our collected data is considered small data [ ], we need to analyze it accordingly, i.e., avoid over-fitting by using cross-validation instead of separate training, test and validation sets, and by using classifiers configured to avoid over-fitting.
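Section 5.1 evaluates the models with 10-fold cross-validation. The paper does not say whether the folds were stratified; scikit-learn's cross-validation helpers use stratified folds by default when a classifier is evaluated with an integer `cv`. To show the mechanics without any dependencies, here is a small stdlib-only stratified k-fold splitter: every participant is tested exactly once, and every fold keeps roughly the class ratio of the data set. It is an illustration, not the code used in the study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 10, seed: int = 0
                      ) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs.

    Each class is shuffled and dealt round-robin over the folds, so every
    fold keeps roughly the original class ratio and every sample appears
    in exactly one test fold. The running offset keeps fold sizes even.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)
    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(fold)), sorted(fold)) for fold in folds]
```

For the ALL data set (116 with dyslexia, 197 controls), each of the ten test folds holds 31 or 32 participants, 11 or 12 of them with dyslexia.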
4.2 Procedure and Ethics Statement
First, the parents were informed about the purpose of the voluntary study. Only after the parents gave their consent were children allowed to participate in this user study from home or from school, with the first author of this work present or always available through digital communication.
The data collection for this user study has been approved by the German Ministry of Education, Science and Culture in Schleswig-Holstein (Ministerium für Bildung, Wissenschaft und Kultur) and the Lower Saxony State Education Authority (Niedersächsische Landesschulbehörde). In Spain, no additional governmental approval was needed.
If the study was conducted in a school or learning center, the consent of the parents or legal guardian was obtained in advance and the user study was supervised by a teacher or therapist. After the online consent form was approved, the participant’s supervisor (e.g., parent or teacher) completed a demographic questionnaire, including the age of the participant, the dyslexia diagnosis (yes/no/maybe) and the native language. We asked the participant’s supervisor to answer YES only if the child had an official diagnosis, for example, from an authorized specialist.
After that, participants played both parts of the game. At the end, two feedback questions were asked, and the participant’s supervisor could leave contact details to be informed about the results of the study. Personal information of the participant’s supervisor, such as name or email, if given, is not published and is stored separately from the participants’ data. The name of the child is not collected, and all data is stored on a password-secured web server.
The data includes only participants who completed all 16 rounds of the web game using a computer or a tablet. Dropouts happened mostly because participants used a different browser (e.g., Internet Explorer instead of Google Chrome) or a different device (tablet instead of a computer).
For the predictive models, we took 313 participants into account, including the 178 participants from the pilot study [ ]. To have precise data, we excluded participants who reported in the background questionnaire that they suspected having dyslexia but did not have a diagnosis (n = 48). The remaining participants were classified as diagnosed with dyslexia (DG) or not showing any signs of dyslexia (control group, CG), as reported in the background questionnaire.
We separated our data into three data sets: one for the Spanish participants (ES, n = 153), a second for the German participants (DE, n = 149), and one for all languages (ALL, n = 313), in which we also included participants who spoke English (n = 11). Participants ranged in age from 7 to 12 years. The participants in the data sets are described in Table 3. Participants played the game in English, German or Spanish, depending on their native language. We had some bilingual participants (n = 48) in the Spanish data set (Spanish and Catalan), since the media call was issued by the non-profit organization ChangeDyslexia.
For these cases, we used the language they reported to be more comfortable with, which was also used for the game instructions. We use the language the game was played in, rather than the native language, as the criterion to split the data sets, for three reasons. First, a native language or mother tongue can easily be defined when a participant speaks only one language. This is not the case for bilingual participants, who might not be able to choose, so we cannot distinguish the mother tongue or native language [ ]. Second, this question is self-reported, and every participant’s supervisor might define it differently for each child. Finally, some bilingual participants spoke closely related Romance languages (Spanish and Catalan). We consider these participants in the ES data set, as the game instructions were in Spanish.

                    Dyslexia (DG)                   Control (CG)
Data set  N    n    age    female  male    n    age   female  male
DE        149  59   10.22  21      38      90   9.58  42      48
ES        153  49   9.47   26      23      104  9.99  58      46
ALL       313  116  9.77   50      66      197  9.76  103     94
Table 3: Overview of the participants per data set.
4.4 Dependent Variables and Features
The participant features are detailed in Table 4, while the dependent variables collected through the game are listed in Table 5. These variables were used for the statistical comparison in the pilot study and for the selection of the features for the predictive models. Feature 3 was set to the language selected for the instructions. Features 1, 2 and 4 to 8 were answered by the participants’ supervisor in the online questionnaire. Feature 9 was collected from the browser during the study.
We used the following dependent variables for the statistical analysis:
Auditory game part
• Duration round (milliseconds) is measured from the start of the round.
• Duration interaction (milliseconds) starts after the player clicks on a card for the first time in each round.
• Average click time (milliseconds) is the duration of a round divided by the total number of clicks.
• Time interval (milliseconds) is the time needed for the second, third, fourth, fifth and sixth clicks.
• Logic is defined as True when the first three cards clicked in a round are different; otherwise, it is False.
• Instructions is the number of times the player listened to the game instructions.
Visual game part
• Number of hits is the number of correct answers.
• Number of misses is the number of incorrect answers.
• Efficiency is the number of hits multiplied by the total number of clicks.
• Accuracy is the number of hits divided by the total number of clicks.
• Time to the first click (milliseconds) is the duration between the round start and the first user click.
• Total number of clicks is the number of clicks during a round.

1  Age  It ranges from 7 to 12 years old.
2  Gender  It is a binary feature, either with a female or male value.
3  Language  It is either Spanish, German or English.
4  Native Language  It indicates if the language used for the instructions is the first language of the participants, being Yes, No or Maybe.
5  It indicates if a participant plays a musical instrument, being No, Yes, less than 6 months or Yes, over 6 months.
6  It indicates how well the participant knows the visual Memory game, being Participant gave no answer, Participant does not know the game, Played once, Played a few times or Played a lot.
7  It indicates the self-reported answer, on a 6-level Likert scale, to the statement: ’the auditory part was easy for the participants.’ The values are Answer unknown, Strongly disagree, Disagree, Undecided, Agree or Strongly Agree.
8  Visual Part  It indicates the self-reported answer to the statement: ’the visual part was easy for the participants.’ (same Likert scale as feature 7).
9  Device  It is the device the participants used and is a binary feature with the value Computer or Tablet.
Table 4: Description of participant features.
We would like to further elaborate on the game measurement Logic, which is based on direct experience from the user study. Some children may not have really listened to the sounds and instead played logically. As each round is designed such that the first two clicks never match, a participant who chooses a card different from the first two for the third click increases the chances of finding a match, independent of the total number of cards.
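The auditory measures above can be derived from a per-round click log. The sketch below computes Logic, the click intervals and the average click time, assuming a hypothetical log format of (timestamp in milliseconds since the round started, card id) pairs, and interpreting the time interval of the n-th click as the time elapsed since the previous click. The paper specifies neither detail, so both are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (ms since round start, card id).
Click = Tuple[int, int]


def logic(clicks: List[Click]) -> bool:
    """True when the first three clicked cards are all different."""
    first_three = [card for _, card in clicks[:3]]
    return len(first_three) == 3 and len(set(first_three)) == 3


def click_intervals(clicks: List[Click]) -> Dict[int, Optional[int]]:
    """Time since the previous click for the 2nd to 6th clicks
    (None if that click did not happen in the round)."""
    times = [t for t, _ in clicks]
    return {n: (times[n - 1] - times[n - 2]) if len(times) >= n else None
            for n in range(2, 7)}


def average_click_time(round_duration_ms: int, clicks: List[Click]) -> float:
    """Duration of the round divided by the total number of clicks."""
    return round_duration_ms / len(clicks) if clicks else 0.0
```

For example, a third click on the first card again (a re-listen) makes Logic False, while a third click on a new card makes it True.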
Auditory features               Visual features
10–17   Time to click.          106–113  Time to click.
18–25   Total clicks.           114–121  Total clicks.
26–33   Duration per round.     122–129  Correct answers.
                                130–137  Wrong answers.
42–49   Average click time.     138–145  Accuracy.
50–57   Logic.                  146–153  Efficiency.
58–65   2nd click interval.     154–161  2nd click interval.
66–73   3rd click interval.     162–169  3rd click interval.
74–81   4th click interval.     170–177  4th click interval.
82–89   5th click interval.     178–185  5th click interval.
90–97   6th click interval.     186–193  6th click interval.
98–105  Instructions.           194–201  Time last click.
Table 5: On the left are features 10 to 105 for the auditory part and on the right are features 106 to 201 for the visual part of the game MusVis.
The features for the data sets ALL, ES, and DE are the
same. Each data set has 201 features per participant, where
features 10 to 105 are the variables from the auditory part
and features 106 to 201 are the variables from the visual part
(see Table 5).
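The numbering in Table 5 follows from the game structure: each part has 4 stages × 2 rounds = 8 rounds, so every dependent variable yields 8 features, and 96 features per part (10–105 auditory, 106–201 visual) means 12 variables per part, on top of the 9 participant features of Table 4. The helper below makes this arithmetic explicit; the order of the rounds inside each block of eight is an assumption.

```python
N_PARTICIPANT_FEATURES = 9                 # features 1-9, Table 4
ROUNDS_PER_PART = 4 * 2                    # 4 stages x 2 rounds
VARIABLES_PER_PART = 12                    # 8-feature blocks per part, Table 5
PART_BLOCK = VARIABLES_PER_PART * ROUNDS_PER_PART        # 96 features
TOTAL_FEATURES = N_PARTICIPANT_FEATURES + 2 * PART_BLOCK  # 201


def feature_number(part: str, variable_slot: int, round_no: int) -> int:
    """1-based feature number for a (part, variable, round) triple.

    part: "auditory" or "visual"; variable_slot: 0-11 in the row order
    of Table 5; round_no: 0-7 (the order of rounds is an assumption).
    """
    if part not in ("auditory", "visual"):
        raise ValueError(f"unknown part: {part}")
    if not 0 <= variable_slot < VARIABLES_PER_PART:
        raise ValueError("variable_slot out of range")
    if not 0 <= round_no < ROUNDS_PER_PART:
        raise ValueError("round_no out of range")
    first = N_PARTICIPANT_FEATURES + 1 + (0 if part == "auditory" else PART_BLOCK)
    return first + variable_slot * ROUNDS_PER_PART + round_no
```

For instance, the auditory Logic block (slot 5) starts at feature 10 + 5 × 8 = 50, matching Table 5.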
5. PREDICTIVE MODELS SETUP
In this section we present the machine learning techniques
used for the data sets ALL (n = 313), ES (n = 153), and DE
(n = 149). First, we explain the choice of predictive models
and then the feature selection.
5.1 Model Selection
We used Random Forest (RF), Random Forest with class weights (RFW), Extra Trees (ETC), Gradient Boosting (GB), and the Dummy Classifier (Baseline), as described in Scikit-learn version 0.21.2 [ ]. We address the risk of over-fitting on our small data sets with 10-fold cross-validation and with the default parameters suggested by the Scikit-learn library, to avoid training a model whose parameters are optimized specifically for our data [ ]. As long as we have only small data, we do not optimize the input parameters of the classifiers, because that would require holding out a separate test data set to evaluate the changes, as proposed by the scikit-learn 0.21.2 documentation [ ], and to avoid biases [ ]. To explore the best prediction conditions, we used the feature selection described in the next section.
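A minimal sketch of this setup with scikit-learn, on synthetic data with the shape of the ALL data set (313 participants, 201 features, about 63% controls), since the MusVis data is not included here. It keeps default hyperparameters and scores balanced accuracy and weighted F1 with 10 stratified folds. Assumptions: RFW is shown with `class_weight="balanced"` because the paper does not specify the weights; weighted F1 is our reading of Table 7; and defaults have changed since version 0.21.2 (e.g., the forests' default `n_estimators` went from 10 to 100 in 0.22), so this is not a reproduction of the reported numbers.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in: class 0 (control) ~63%, class 1 (dyslexia) ~37%.
X, y = make_classification(n_samples=313, n_features=201, n_informative=10,
                           weights=[0.63], random_state=0)

MODELS = {
    "RF": RandomForestClassifier(random_state=0),
    # Assumption: the paper does not say which class weights RFW uses.
    "RFW": RandomForestClassifier(class_weight="balanced", random_state=0),
    "ETC": ExtraTreesClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "Baseline": DummyClassifier(strategy="most_frequent"),
}


def evaluate(X, y, n_splits=10):
    """Mean balanced accuracy and weighted F1 over stratified folds,
    using default hyperparameters (no tuning)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, model in MODELS.items():
        scores = cross_validate(model, X, y, cv=cv,
                                scoring=["balanced_accuracy", "f1_weighted"])
        results[name] = (scores["test_balanced_accuracy"].mean(),
                         scores["test_f1_weighted"].mean())
    return results


if __name__ == "__main__":
    for name, (bacc, f1) in evaluate(X, y).items():
        print(f"{name:8s} balanced acc={bacc:.2f}  weighted F1={f1:.2f}")
```

Because the data is synthetic, the printed scores say nothing about MusVis; only the evaluation protocol mirrors the description.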
5.2 Informative Features
We address the difficulty of selecting the correct features [ ] by taking into account previous literature about the differences between children with and without dyslexia. For example, since there are two theories on the cause of dyslexia (visual vs. auditory [ ]), we use subsets of visual and auditory features to explore their influence on the classifiers.
We rank the most informative features with Extra Trees. The results show a flat distribution for all three data sets, with a step at an information score of 0.008: ALL (n = 33 features), ES (n = 41 features), and DE (n = 38 features). Comparing the most informative features reveals that the data sets have only a few features in common, e.g., four features for Spanish and German (Logic, 6th click interval, total clicks, duration interaction), and only 16 features shared between ALL and the Spanish and German data sets. Visual and auditory features are about equally represented among the most informative features; for example, ALL has 16 auditory features and 14 visual features.
The biggest step in the informative ranking for all three data sets is between the fifth and sixth most informative features; e.g., for ALL, the step lies between the visual part (cue Z, 4 cards) Efficiency, with an informative score of 0.0128, and the auditory part (cue Rhythm, 6 cards) Time 5th click, with a score of 0.0104. The only dependent variables with the same tendency are Number of misses and Total clicks from the visual game part, but their features from the different rounds are mostly not among the 33 most informative features of the different data sets (ALL 2/16, ES 3/16 and DE 6/16).
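A minimal sketch of such a ranking, assuming impurity-based importances (`feature_importances_`) from an `ExtraTreesClassifier` with default parameters; the paper does not state which importance measure or settings were used. Only the 0.008 cut-off is taken from the text, and the data below is synthetic. Column indices are 0-based, so feature number = index + 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def rank_features(X, y, threshold=0.008, random_state=0):
    """Rank feature columns by Extra Trees impurity-based importance.

    Returns (order, importances, selected): `order` lists column indices
    from most to least informative; `selected` keeps those whose
    importance exceeds `threshold`.
    """
    model = ExtraTreesClassifier(random_state=random_state).fit(X, y)
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]  # descending importance
    selected = [int(i) for i in order if importances[i] > threshold]
    return order, importances, selected


if __name__ == "__main__":
    # Synthetic stand-in with the shape of the ALL data set.
    X, y = make_classification(n_samples=313, n_features=201, weights=[0.63],
                               random_state=0)
    order, importances, selected = rank_features(X, y)
    print(len(selected), "features above 0.008")
    print("top 5 columns:", [int(i) for i in order[:5]])
```

Looking at the sorted importances for a visible drop, as with the step between the fifth and sixth features above, is a manual judgement; the fixed threshold only automates the cut.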
We followed the same steps as in the pilot study to compare the statistical findings before presenting the machine learning results.
6.1 Statistical Validation
The pilot study collected data from 178 participants (who were later included in our current data set, n = 313) to find significant differences in the game measurements [ ]. Therefore, we first apply the Shapiro-Wilk test and then, since the game measures are not normally distributed, the Wilcoxon test. We use the Bonferroni correction (p < 0.002) to avoid type I errors. We present the results of the statistical analysis for the validation data (n = 313), separated by language and for all languages (see Table 6). Additionally, we compare the statistical analysis results from the pilot study (n = 178) with the new data set (n = 313).
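A minimal sketch of this per-variable procedure with SciPy: a Shapiro-Wilk check on both groups, then a two-sided Mann-Whitney U test (the rank-sum form of the Wilcoxon test, which is how Table 6 labels its columns), compared against a Bonferroni threshold. The paper reports only the corrected threshold (p < 0.002), not the number of comparisons, so `n_tests` is left as a parameter; the function name and the toy data in the usage check are hypothetical.

```python
from typing import Dict, Sequence

from scipy.stats import mannwhitneyu, shapiro


def compare_groups(control: Sequence[float], dyslexia: Sequence[float],
                   alpha: float = 0.05, n_tests: int = 1) -> Dict[str, object]:
    """Shapiro-Wilk normality check on both groups, then a two-sided
    Mann-Whitney U test whose p-value is compared with the Bonferroni
    threshold alpha / n_tests."""
    both_normal = shapiro(control)[1] > alpha and shapiro(dyslexia)[1] > alpha
    u_stat, p_value = mannwhitneyu(control, dyslexia, alternative="two-sided")
    threshold = alpha / n_tests
    return {
        "both_normal": bool(both_normal),
        "U": float(u_stat),
        "p": float(p_value),
        "threshold": threshold,
        "significant": bool(p_value < threshold),
    }
```

The normality result is reported rather than used to switch tests, because the text states that the rank-based test was applied to all game measures.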
The ES data set (n = 153) has seven dependent variables with significant differences between groups: 4th click interval, duration round, average click time, total number of clicks, time to the first click, number of hits, and efficiency. The ES data set (n = 153) confirmed the results of the pilot study (n = 178). All other game measurements decreased slightly in significance, i.e., their p-values increased slightly (e.g., visual efficiency from 4e-5 to 1e-4). The data set ES thus has seven significant variables that distinguish a person with or without dyslexia.
For the data set ALL (n = 313), we consider only dependent variables with the same tendency as in the pilot study (n = 178). We categorize the tendency (e.g., playing faster or clicking more) by the group mean (dyslexia compared to control group) of each dependent variable within the same language. ALL (n = 313) has two visual game measurements (number of misses and total clicks) with the same tendency, while the pilot study had five for the visual game (total clicks, time to the first click, hits, accuracy, and efficiency).
The DE data set (n = 149) confirmed the results of the German part of the pilot study (n = 57), with no significant dependent variables. The means of the dependent measurements for DE are all very close (e.g., the time to the first click is 2.58 s for the control group and 2.50 s for the dyslexia group).
We can conﬁrm that misses did not reveal signiﬁcant dif-
ferences for German or Spanish, even though the tendency
is now the same for both languages. On the other hand, the
total number of clicks is still signiﬁcant.
To sum up, we conﬁrmed one signiﬁcant dependent variable
in ALL (n = 313), seven signiﬁcant dependent variables for
ES (n = 153), and no signiﬁcant dependent variables for DE
(n = 149).
Part  Data set  Variable  Control (mean, sd)  Dyslexia (mean, sd)  Mann-Whitney U (W, p-value, effect)
Table 6: Overview of dependent variables for visual (top) and auditory (below) features of MusVis. Significant results are in bold.
Model     Data  Feat.  Recall  Precis.  F1    Acc.
RF        DE    5      0.77    0.78     0.75  0.74
RFW       DE    5      0.75    0.75     0.74  0.73
Baseline  DE    –      0.60    0.37     0.46  0.50
ETC       ES    20     0.76    0.76     0.75  0.69
RF        ES    5      0.74    0.73     0.72  0.65
Baseline  ES    –      0.68    0.46     0.55  0.50
GB        ALL   20     0.66    0.65     0.65  0.61
GB        ALL   5      0.64    0.64     0.63  0.59
Baseline  ALL   –      0.63    0.40     0.49  0.50
Table 7: Best results of the different classifiers, features and data sets. Results are ordered by the best F1-score and accuracy.
6.2 Predictive Results
We processed our data sets with diﬀerent classiﬁers and
diﬀerent subsets of features, following the description from
the previous section.
We computed the balanced accuracy for our binary classification problem to deal with imbalanced data sets; for example, the ALL data set has 37% participants with dyslexia vs. 63% controls. As a baseline, a Dummy Classifier that always predicts the most frequent label is computed for our imbalanced data and reported with its balanced accuracy [ ]. We do not apply over- or under-sampling to address the class imbalance because the variance among people with dyslexia is broad, for example, in difficulty level or in the individual causes of perception differences.
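The balanced-accuracy baseline can be illustrated with a minimal plain-Python sketch. It assumes labels coded as 1 (dyslexia) and 0 (control) and uses hypothetical toy labels with the 37%/63% split of ALL. scikit-learn provides the same computation through `DummyClassifier(strategy="most_frequent")` and `balanced_accuracy_score`; the text does not say which implementation was used.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (0.5 is chance level for two classes)."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def most_frequent_baseline(y_train, n_predictions):
    """Predict the majority label of the training labels for every sample."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_predictions


# Hypothetical labels with the class ratio of ALL: 37% dyslexia (1), 63% control (0).
y = [1] * 37 + [0] * 63
baseline = most_frequent_baseline(y, len(y))

plain_accuracy = sum(p == t for p, t in zip(baseline, y)) / len(y)
print(plain_accuracy)                   # 0.63
print(balanced_accuracy(y, baseline))   # 0.5
```

A majority-label predictor reaches a plain accuracy of 0.63 on such data but a balanced accuracy of exactly 0.5, which is why the baseline rows in Table 7 report 0.50.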
As described in the previous section, the ranking of the informative features differs across the three data sets. Hence, we explore the influence of different subsets of features, namely: (1) all represented features (201 features); (2) the 5 most informative features; (3) the 33 most informative features, as this was the next natural informative subset; (4) 20 random features selected from (3); and (5) 27 features that have the same tendency and which have been answered by the participants' supervisors, because they mostly do not appear among the most informative feature subsets (although total clicks is significant in the statistical comparison).
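The five subsets above could be assembled as in the following sketch. It assumes that a ranking (most informative first) and the list of same-tendency features are already available; the helper `feature_subsets`, the feature names, and the random seed are hypothetical.

```python
import random


def feature_subsets(ranking, same_tendency, seed=0):
    """Return the five feature subsets compared in the text.

    ranking: feature names ordered from most to least informative.
    same_tendency: features whose tendency agrees across languages.
    """
    top33 = ranking[:33]
    rng = random.Random(seed)  # fixed seed makes the random subset reproducible
    return {
        "all": list(ranking),
        "top5": ranking[:5],
        "top33": top33,
        "random20_of_top33": rng.sample(top33, 20),
        "same_tendency": list(same_tendency),
    }


# Hypothetical names; the real data set has 201 features.
ranking = [f"feature_{i}" for i in range(201)]
same_tendency = ranking[::7][:27]
subsets = feature_subsets(ranking, same_tendency)
for name, features in subsets.items():
    print(name, len(features))
```

Fixing the seed matters for subset (4): without it, every run would evaluate a different random selection and the reported scores would not be reproducible.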
We report the two best F1-scores and balanced accuracy scores for each data set, as well as the baseline, in Table 7. We outperform our baseline for all data sets. The best F1-score, 0.75, is achieved for both languages: for DE with RF and 5 features, and for ES with ETC and 20 features. The second best F1-score, 0.74, is achieved for DE with RFW and 5 features. The best accuracy, 0.74, is achieved with RF and the second best, 0.73, with RFW, both on the DE data set using just 5 features.
For ES, the second best F1-score is 0.72, with RF and a selection of 5 features. The best accuracy for ES is 0.69 with ETC and 20 features, and the second best is 0.65 with RF and 5 features. Combining the two data sets (DE and ES) reduces the F1-score by 0.1: the best F1-score for ALL is 0.65, using GB and 20 features, and the second best is 0.63, with GB and 5 features. The accuracy is likewise reduced by nearly 0.1, since the best accuracy for ALL is 0.61 using GB and 20 features, and the second best is 0.59 with GB and 5 features. This shows that there are differences across languages.
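The comparison behind Table 7 can be approximated with scikit-learn. This is a sketch under assumptions not stated in this section: synthetic data from `make_classification` stands in for the game features, RFW is modeled as a random forest with `class_weight="balanced"`, F1 uses weighted averaging, and 10-fold stratified cross-validation is used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for one language data set (about 150 children, ~37% positive).
X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           weights=[0.63, 0.37], random_state=0)

models = {
    "ETC": ExtraTreesClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    # Assumption: "RF with class weights" modeled via class_weight="balanced".
    "RFW": RandomForestClassifier(class_weight="balanced", random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=["f1_weighted", "balanced_accuracy"])
    results[name] = (scores["test_f1_weighted"].mean(),
                     scores["test_balanced_accuracy"].mean())

# Order by F1-score first, then balanced accuracy, as in Table 7.
for name, (f1, bacc) in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1={f1:.2f} balanced accuracy={bacc:.2f}")
```

Scores on synthetic data say nothing about MusVis; the point is that every classifier is evaluated on identical folds, so their scores are directly comparable.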
The normalized confusion matrices (see Figure 5) do not show over-fitting for the best results for DE, ES, and ALL. The fact that the best results are obtained with few features suggests that the remaining features are highly correlated or noisy.
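Figure 5 uses row-normalized confusion matrices, in which each count is divided by the number of children in that true class. A minimal plain-Python sketch with hypothetical labels follows; scikit-learn's `confusion_matrix(y_true, y_pred, normalize="true")` computes the same quantity.

```python
def normalized_confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Confusion matrix whose rows (true classes) each sum to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


# Hypothetical labels: 0 = control, 1 = dyslexia.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]
cm = normalized_confusion_matrix(y_true, y_pred)
print(cm)  # [[0.75, 0.25], [0.25, 0.75]]
```

Row normalization makes the diagonal equal to the per-class recall, so the matrix stays readable even when the classes are imbalanced.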
The reduction of features improves the accuracy for DE but not consistently for ES and ALL, as can be seen for the different classifiers and data sets in Figure 6. For example, reducing the features for DE improves the accuracy for ETC, RF, and RFW, but not for GB. For ES, the accuracy improves only for RF and stagnates for RFW when reducing the number of features, while for ETC and GB the trend is reversed. For the data set ALL, RFW and RF improve but ETC and GB decrease.
Figure 5: Normalized confusion matrix for the three best results (F1-score and accuracy): a) DE, 5 features with RF; b) ES, 20 features with ETC; and c) ALL, 20 features with GB.
Most children with dyslexia show deficits of varying severity in more than one area [ ], which makes dyslexia more of a spectrum than a binary disorder. Additionally, we rely on current diagnostic tools (e.g., DRT [ ]) to select our participant groups, and these tools do not yet represent the diversity of people with dyslexia. We accept that our participant groups have a high variance because of how our current diagnostic tools measure dyslexia and because of the spectrum that dyslexia has.
7.1 Group Comparison
The measurement data taken from the game MusVis show that Spanish participants with dyslexia behave differently than their control group. For the auditory game part, differences can be reported for the 4th click interval, duration, and average click time. For the visual part, the following measurements can be reported as indicators: total clicks, time to the first click, hits, and efficiency.
Our results over all languages show that the effect for each measurement is confirmed, even though our sample size does not allow strong conclusions about the comparison of German- vs. Spanish-speaking participants. Spanish had eight significant indicators in the pilot study, and we expected to reproduce the same number of significant indicators with more German participants.
In general, all participants found the game easy to understand, and only children aged 12 complained about a lack of challenge. The amount of positive feedback and the engagement of all age groups lead us to conclude that the applied game mechanics and components also help children perceive MusVis as a game rather than as a test.
Dyslexia is known to be present across different languages and cultures [ ]. The assumption that the tendencies of the indicators are similar across all languages cannot (yet) be proven for all indicators in our study (e.g., in the auditory part, German participants with dyslexia start to click faster than Spanish participants with dyslexia, each compared to their language control group). We can exclude external factors such as different applications or study setups as possible influences on this opposite tendency. According to the results, we may have to assume that not all indicators for dyslexia are language-independent and that some have cultural dependencies, or that our results suffer from omitted-variable bias. To confirm this assumption, we will need to obtain larger numbers of participants for both language groups (Spanish and German) or investigate further measurements (indicators).
Figure 6: The plot shows the relation between accuracy and the number of features for all classifiers in the data set ALL (left), ES (middle), and DE (right).
The variables time to first click and total number of clicks (both visual and auditory) appear to depend on the game content and game design. Otherwise, we could not explain the difference in trend between the auditory and visual parts for total number of clicks (i.e., total clicks for the visual part differ significantly from those for the auditory part). Additionally, the analysis of the auditory game part has one limitation: participants could select a correct pair by chance, e.g., by clicking through the game board without listening to the sounds.
Children with dyslexia are detected by their slower reading [ ]. Therefore, we designed our game with content that is known to be difficult for children with dyslexia, in order to measure errors and duration. Nevertheless, from previous literature we knew that children with dyslexia do not make more mistakes in games than the control group [ ]. We can confirm that misses did not reveal significant differences for German or Spanish either. It might be that errors in reading and writing cannot be compared with errors in this type of game. Even so, we cannot (yet) explain why the Spanish control group made more mistakes than the Spanish group with dyslexia. It might also be that participants with dyslexia show a generally different behavior that is independent of the content but depends on the game play.
Spanish children without dyslexia take significantly more time to find all pairs and finish the auditory game part. Children without dyslexia take more time before they click for the first time (visual) in all languages. This might be due to the time they need to process the given auditory information [ ] or to recall the auditory and visual information from short-term memory [ ]. However, participants with dyslexia from the German group are nearly as fast as the control group in finding all pairs (auditory), which might be due to cultural differences (e.g., more musical training).
The auditory and visual cues are deliberately designed to be more difficult to process for people with dyslexia than for people without. Therefore, children with dyslexia are expected to need more time (duration), which might be due to a distinctive encoding of prosody [ ] and is in line with the indicator of slower reading. Consistent with the finding that children with dyslexia need more time to process information, we also observe this behavior in our indicators. For example, participants with dyslexia from the Spanish group take more time on the 4th click interval and on the average click time than the control group. Both results are significant and have medium effect sizes of 0.29, which gives an estimate of what the effects would be in the whole population.
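The group comparisons rely on the Mann-Whitney U test with an effect size. The plain-Python sketch below assumes the common convention r = |Z| / sqrt(N), with Z taken from the tie-corrected normal approximation; the section does not state which effect-size formula produced the 0.29 values, and the click-time values are hypothetical. `scipy.stats.mannwhitneyu` can be used for the test itself.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def mann_whitney_effect(group_a, group_b):
    """Return (U, z, two-sided p, r) using the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    r = abs(z) / math.sqrt(n)
    return u, z, p, r


# Hypothetical average click times (seconds) for two small groups.
control = [2.6, 3.1, 2.9, 3.4, 2.8, 3.0]
dyslexia = [3.5, 3.3, 3.9, 3.2, 3.6, 3.7]
u, z, p, r = mann_whitney_effect(control, dyslexia)
print(f"U={u}, z={z:.2f}, p={p:.3f}, r={r:.2f}")
```

With samples as small as these, an exact test is preferable to the normal approximation; the sketch only shows how r is derived from Z.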
A person with dyslexia has difficulties with reading and writing independent of the mother tongue, and these difficulties also appear when learning a second language [ ]. The analysis of errors from children with dyslexia shows similar error categories for Spanish, English [ ], and German [ ], which suggests similarities of perception between the languages.
Our results from the pilot study [ ] suggest that we can measure a significant difference on four indicators for the visual game with the same tendency between Spanish, German, and English. With all our data (n = 313), we can confirm just one significant dependent variable with the same tendency for Spanish and German.
Still, this means that people with dyslexia might perceive our visual game content similarly, independent of their mother tongue. Further research needs to be done to confirm the results, but this validation study provides strong evidence that it will be possible to screen dyslexia with our content, approach, and game design using the same language-independent content for different languages.
7.2 Screening Differences
Our approach aims to screen dyslexia with indicators that do not require linguistic knowledge. These indicators are probably not as strong or visible as the reading and spelling mistakes of children with dyslexia. Therefore, we consider our results for German with Random Forest (highest accuracy of 0.74 and highest F1-score of 0.75) a promising way to predict dyslexia using language-independent auditory and visual content for pre-readers.
Having an early indication of dyslexia before spelling or reading errors appear can have a positive impact on the child's development, as we can intervene earlier in her/his education. Therefore, we aim to optimize the recall and F1-score by finding as many participants with dyslexia as possible. We have set ourselves this goal because the benefit of early detection for a person with dyslexia outweighs the cost of misjudging a person without dyslexia. However, to avoid over-fitting, we did not modify the default value of the decision threshold (typically 0.5). We plan to study this in the near future, as we need to increase recall for the dyslexia class while keeping the number of false positives reasonable.
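The planned threshold study could start from a sweep like the following sketch, which lowers the decision threshold on hypothetical predicted probabilities and reports the recall for the dyslexia class together with the number of false positives. The probabilities and the helper `recall_and_false_positives` are illustrative assumptions; to respect the over-fitting concern above, a threshold would have to be chosen on validation folds, not on the data used for the final evaluation.

```python
def recall_and_false_positives(y_true, probabilities, threshold):
    """Flag a child as 'at risk' when P(dyslexia) >= threshold."""
    flagged = [p >= threshold for p in probabilities]
    tp = sum(1 for t, f in zip(y_true, flagged) if t == 1 and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t == 1 and not f)
    fp = sum(1 for t, f in zip(y_true, flagged) if t == 0 and f)
    return tp / (tp + fn), fp


# Hypothetical probabilities for the dyslexia class, e.g. from predict_proba(X)[:, 1].
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.81, 0.55, 0.42, 0.35, 0.60, 0.38, 0.30, 0.22, 0.15, 0.10]

sweep = {t: recall_and_false_positives(y_true, probabilities, t) for t in (0.5, 0.4, 0.3)}
for t, (recall, fp) in sweep.items():
    print(f"threshold {t}: recall {recall:.2f}, false positives {fp}")
# threshold 0.5: recall 0.50, false positives 1
# threshold 0.4: recall 0.75, false positives 1
# threshold 0.3: recall 1.00, false positives 3
```

In this toy example, lowering the threshold from 0.5 to 0.4 raises recall at no extra cost, while 0.3 finds every child with dyslexia but triples the false positives.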
If a person with dyslexia is not discovered (early), they are prone to face additional issues such as anxiety, sadness, and decreased attention [ ]. Also, a person with dyslexia needs around two years to compensate for their reading and spelling difficulties. Early treatment among children at risk of dyslexia as well as children without dyslexia can serve both as a preventive measure and as early stimulation of
Our results support the hypothesis that dyslexia cannot be reduced to one cause but is rather a combination of causes [ ]. The equal distribution of auditory and visual features in the informative-feature ranking supports the hypothesis that dyslexia is related to auditory and visual perception in different people. We might be able to measure stronger effects if we design visual and auditory cues that have more attributes related to dyslexia, including some that favor the latter.
The ALL data set reached only an accuracy of 0.61, which might be due to the following reasons. First, the informative features differ across the data sets, which indicates different informativeness in German and Spanish. Combining the data sets into ALL probably adds noise to the prediction, which results in a lower accuracy: features may no longer be as informative because, being highly correlated, they cancel each other out. In addition, reducing the features to only those with the same tendency, as used for the statistical analysis, did not yield any improvement, which supports the hypothesis that features in ALL cancel each other out.
The results of our current game measures with 313 participants confirm differences in the behavior of Spanish vs. German participants (i.e., (1) seven significant dependent variables in Spanish vs. none in German, and (2) only two dependent variables with the same tendency over all languages).
These results might be explained by bilingualism. It is argued that a person who speaks more than one language has more knowledge of their first language than a monolingual person [ ], and it is unclear whether this also has an influence on "how people perceive differences as well". Additionally, differences in dyslexia detection are reported for transparent (like Spanish) vs. deep (like English) orthographies (quoted after [ ]). In a transparent orthography, a single grapheme (letter) mainly corresponds to a single phoneme (sound), and dyslexia is reported to be more distinct in deep orthographies.
If so, this might explain the difference in significance in the statistical analysis as well as in the tendency of values, and the need for separate models to predict dyslexia for our German vs. Spanish data sets (Spanish has bilingual
Overall, having fewer features improves the accuracy, but less so when we run experiments for ALL or ES. There, the influences of the different informative features for ES and DE seem to cancel each other out. A high correlation between features would explain why, for example, using 27 features (GB) performs no better than using 20 features (GB) for the ALL data set. The fact that the accuracy does not increase when more features are used supports the argument that the features are highly correlated.
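One way to check the claim that the features are highly correlated is to list feature pairs whose Pearson correlation exceeds a cut-off. The sketch below uses plain Python, hypothetical per-child values, and an arbitrary cut-off of 0.9; with pandas, `DataFrame.corr()` returns the full correlation matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(features, threshold=0.9):
    """features: dict name -> values per child. Return pairs with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical values: total clicks and hits move together by construction.
features = {
    "total_clicks": [20, 24, 31, 28, 35, 22],
    "hits": [10, 12, 15, 14, 17, 11],
    "time_to_first_click": [2.6, 2.4, 2.5, 2.9, 2.3, 2.7],
}
pairs = highly_correlated_pairs(features)
print(pairs)
```

Dropping one feature from each flagged pair is a simple way to test whether the remaining set predicts as well as the full one.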
As described before, small data can help to understand the data and results better. In our case, we see that ALL does not perform as well as ES or DE. This is probably due to the factors described above (e.g., bilingualism, features canceling each other out, English-speaking participants). Prediction of dyslexia is therefore possible with data taken from the same game, but it needs different models for different languages, as was proposed by [ ], something that made sense in retrospect.
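The conclusion that each language needs its own model can be checked by comparing per-language cross-validation scores with a single model trained on the pooled data. In this sketch, two synthetic data sets whose informative features sit in different columns stand in for DE and ES; this setup, the random forest, and the 5-fold split are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Two synthetic "languages". With shuffle=False the informative features come
# first; reversing the ES columns moves them to the end, so the two data sets
# rely on different features.
X_de, y_de = make_classification(n_samples=150, n_features=20, n_informative=5,
                                 shuffle=False, random_state=2)
X_es, y_es = make_classification(n_samples=150, n_features=20, n_informative=5,
                                 shuffle=False, random_state=3)
X_es = X_es[:, ::-1]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_balanced_accuracy(X, y):
    """Mean balanced accuracy of a random forest over the shared folds."""
    model = RandomForestClassifier(random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy").mean()


separate = {"DE": cv_balanced_accuracy(X_de, y_de),
            "ES": cv_balanced_accuracy(X_es, y_es)}
pooled = cv_balanced_accuracy(np.vstack([X_de, X_es]),
                              np.concatenate([y_de, y_es]))
print("per-language:", {k: round(v, 2) for k, v in separate.items()})
print("pooled:", round(pooled, 2))
```

On real data, a pooled score clearly below the per-language scores would be consistent with the argument above; on this synthetic setup the outcome depends on the seeds, so the numbers themselves carry no meaning.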
8. CONCLUSIONS AND FUTURE WORK
We processed our game data with Extra Trees, Random Forest without and with class weights, and Gradient Boosting to predict dyslexia using a data set of 313 participants. We reached the best accuracy of 74% for German using RF, while the best accuracy for Spanish was 69% using ETC.
Our approach can optimize resources for detecting and treating dyslexia; however, at the beginning it would need more personnel to screen many more children at a young age to enlarge our training data. As children with dyslexia need around two years to compensate for their difficulties, our approach could help to decrease school failure and late treatment and, most importantly, to reduce suffering for children and
The main advantage of our language-independent content approach is that it has the potential to screen pre-readers in the near future. Indeed, we aim to collect more data with younger children to improve our results.
Future work includes improving our machine learning models and doing further feature analysis. More explainable models should also be considered.
ACKNOWLEDGMENTS
This paper and content were partially funded by the fem:talent Scholarship from the Applied University of Emden/Leer as well as by the Deutschen Lesepreis 2017 from the Stiftung Lesen and the Commerzbank-Stiftung. First, we would like to thank all teachers, students, and parents from the state of Lower Saxony for their participation and time! Special thanks go to one class and one teacher who cannot be named due to anonymity regulations. For their support, we deeply thank L. Albó, Barcelona; ChangeDyslexia, Barcelona; M. Jesús Blanque and R. Noé López, school Hijas de San José, Zaragoza; A. Carrasco, E. Méndez and S. Tena, innovation team of school Leonardo da Vinci, Madrid; in Spain, and L. Niemeier, Fröbel Bildung und Erziehung gemeinnützige GmbH, Berlin; E. Prinz-Burghardt, Lerntherapeutische Praxis, Duderstadt; L. Klaus, Peter-Ustinov-Schule, Eckernförde; H. Marquardt, Gorch-Fock-Schule, Eckernförde; M. Batke and J. Thomaschewski, Hochschule Emden/Leer, Emden; N. Tegeler, Montessori Bildungshaus Hannover gGmbH, Hannover; Y. Schulz, Grundschule Heidgraben, Heidgraben; T. Westphal, Leif-Eriksson-Gemeinschaftsschule, Kiel; F. Goerke, Grundschule Luetjensee, Luetjensee; B. Wilke, Schule am Draiberg, Papenburg; P. St¨ raMentis, Rheine; A. Wendt, Grundschule Seth, Seth; K. Usemann, OGGS Meyerstraße, Wuppertal; in Germany.
We also thank all parents and children for playing MusVis. Finally, thanks to H. Witzel for his advice during the development of the visual part and to M. Blanca and M. Herrera for the translation of the Spanish version.
REFERENCES
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association, London, England, May 2013.
R. Baeza-Yates. Big, small or right data: Which is the proper focus? https://www.kdnuggets.com/2018/10/
A. Bandhyopadhyay, D. Dey, and R. K. Pal. Predic-
tion of Dyslexia using Machine Learning — A Research
Travelogue, volume 24. Springer Singapore, 2018.
D. W. Black, J. E. Grant, and American Psychiatric Association. DSM-5 guidebook: The essential companion to the Diagnostic and statistical manual of mental disorders, fifth edition. American Psychiatric Association, 5th edition, 2016.
C. Coleman, N. Gregg, L. McLain, and L. W. Bellair. A Comparison of Spelling Performance Across Young Adults With and Without Dyslexia. Assessment for Effective Intervention, 34(2):94–105, 2008.
G. De Zubicaray and N. O. Schiller. The Oxford handbook of neurolinguistics. Oxford University Press, New York, NY, 2018.
J. J. Faraway and N. H. Augustin. When small data beats big data. Statistics & Probability Letters, 136:142–145, May 2018.
H. Fastl and E. Zwicker. Psychoacoustics. Springer Berlin Heidelberg, Berlin, Heidelberg, third edition,
Field et al. How to design and report experiments. SAGE Publications, London, 2003.
O. Gaggi, C. E. Palazzi, M. Ciman, G. Galiazzo, S. Franceschini, M. Ruffino, S. Gori, A. Facoetti,
O. Gaggi, C. E. Palazzi, G. Galiazzo, S. Franceschini, S. Gori, and A. Facoetti. Serious Games for Early Identification of Developmental Dyslexia. Computers in Entertainment, 15(4):1–24, Apr 2017.
L. Geurts, V. Vanden Abeele, V. Celis, J. Husson, L. Van den Audenaeren, L. Loyez, A. Goeleven, J. Wouters, and P. Ghesquière. DIESEL-X: A Game-Based Tool for Early Risk Detection of Dyslexia in Preschoolers. In Describing and Studying Domain-Specific Serious Games, pages 93–114. Springer, Switzerland, 2015.
U. Goswami, L. Barnes, N. Mead, A. J. Power, and V. Leong. Prosodic Similarity Effects in Short-Term Memory in Developmental Dyslexia. Dyslexia, 22(4):287–
M. Grund, C. L. Naumann, and G. Haug. Diagnostischer Rechtschreibtest für 5. Klassen: DRT 5 (Diagnostic spelling test for fifth grade: DRT 5). Deutsche Schultests. Beltz Test, Göttingen, 2nd, updated edition, 2004.
J. Hamari, J. Koivisto, and H. Sarsa. Does Gamification Work? – A Literature Review of Empirical Studies on Gamification. In 2014 47th Hawaii International Conference on System Sciences, pages 3025–3034. IEEE,
T. Helland and R. Kaasa. Dyslexia in English as a second language. Dyslexia, 11(1):41–60, Feb 2005.
M. Huss, J. P. Verney, T. Fosker, N. Mead, and U. Goswami. Music, rhythm, rise time perception and developmental dyslexia: Perception of musical meter predicts reading and phonology. Cortex, 47(6):674–689,
ISO/TC 159/SC 4 Ergonomics of human-system interaction. Part 210: Human-centred design for interactive systems. In Ergonomics of human-system interaction, volume 1, page 32. International Organization for Standardization (ISO), Brussels, 2010.
A. Jain and D. Zongker. Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence,
D. J. Johnson. Persistent auditory disorders in young dyslexic adults. Bulletin of the Orton Society, 30(1):268–276, Jan 1980.
I. Kecskes and T. Papp. Foreign Language and Mother Tongue. Psychology Press, New York, 1 edition, Jun
C. Männel, G. Schaadt, F. K. Illner, E. van der Meer, and A. D. Friederici. Phonological abilities in literacy-impaired children: Brain potentials reveal deficient phoneme discrimination, but intact prosodic processing. Developmental Cognitive Neuroscience, 23:14–25,
A. Mora, D. Riera, C. Gonzalez, and J. Arnedo-Moreno. A Literature Review of Gamification Design Frameworks. In 7th International Conference on Games and Virtual Worlds for Serious Applications, 2015.
J. Nielsen. Why You Only Need to Test with 5 Users. Jakob Nielsens Alertbox, 19(September 23):1–4, 2000.
J. Nijakowska. Dyslexia in the foreign language classroom. Multilingual Matters, 2010.
K. Overy. Dyslexia, Temporal Processing and Music: The Potential of Music as an Early Learning Aid for Dyslexic Children. Psychology of Music, 28(2):218–229,
E. Paulesu, L. Danelli, and M. Berlingeri. Reading the dyslexic brain: multiple dysfunctional routes revealed by a new meta-analysis of PET and fMRI activation studies. Frontiers in human neuroscience, 8:830, 2014.
A. Poole, F. Zulkernine, and C. Aylward. Lexa: A tool for detecting dyslexia through auditory processing. 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017 - Proceedings, 2018-January:1–5, 2018.
R. F. Port. Meter and speech. Journal of Phonetics,
M. Rauschenberger, C. Lins, N. Rousselle, S. Fudickar, and A. Hain. A Tablet Puzzle to Target Dyslexia Screening in Pre-Readers. In Proceedings of the 5th EAI International Conference on Smart Objects and Technologies for Social Good - GOODTECHS, pages 155–159, Valen-
M. Rauschenberger, L. Rello, and R. Baeza-Yates. A Tablet Game to Target Dyslexia Screening in Pre-readers. In MobileHCI’18, pages 306–312, Barcelona, 2018. ACM Press.
M. Rauschenberger, L. Rello, and R. Baeza-Yates. Technologies for Dyslexia. In Y. Yesilada and S. Harper, editors, Web Accessibility Book, volume 1, pages 603–627. Springer-Verlag London, London, 2 edition, 2019.
M. Rauschenberger, L. Rello, R. Baeza-Yates, and J. P. Bigham. Towards language independent detection of dyslexia with a web-based game. In W4A ’18: The Internet of Accessible Things, pages 4–6, Lyon, France,
M. Rauschenberger, L. Rello, R. Baeza-Yates, E. Gomez, and J. P. Bigham. Supplement: DysMusicMusicalElements: Towards the Prediction of Dyslexia by a Web-based Game with Musical Elements, June 2017.
M. Rauschenberger, L. Rello, R. Baeza-Yates, E. Gomez, and J. P. Bigham. Towards the Prediction of Dyslexia by a Web-based Game with Musical Elements. In The Web for All conference Addressing information barriers – W4A’17, pages 4–7, Perth, Western Australia, 2017.
M. Rauschenberger, L. Rello, S. Füchsel, and J. Thomaschewski. A language resource of German errors written by children with dyslexia. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France, May 2016. European Language Resources Association (ELRA).
M. Rauschenberger, A. Willems, M. Ternieden, and J. Thomaschewski. Towards the use of gamification frameworks in learning environments. Journal of Interactive Learning Research, 30(2), 2019.
L. Rello, R. Baeza-Yates, A. Ali, J. P. Bigham, and M. Serra. Predicting risk of dyslexia with an online gamified test. arXiv preprint arXiv:1906.03168, V.1:1–13, Jun 2019.
L. Rello, R. Baeza-Yates, and J. Llisterri. A resource of errors written in Spanish by people with dyslexia and its linguistic, phonetic and visual analysis. Language Resources and Evaluation, 51(2):1–30, Feb 2016.
L. Rello, M. Ballesteros, A. Ali, M. Serra, D. Alarcón, and J. P. Bigham. Dytective: Diagnosing Risk of Dyslexia with a Game. In Pervasive Health 2016, pages 89–96, Cancun, Mexico, May 2016. ACM Press.
L. Rello, E. Romero, M. Rauschenberger, A. Ali, K. Williams, J. P. Bigham, and N. C. White. Screening Dyslexia for English Using HCI Measures and Machine Learning. In Proceedings of the 2018 International Conference on Digital Health - DH ’18, pages 80–84, New York, New York, USA, 2018. ACM Press.
A. D. Ritzhaupt, N. D. Poling, C. A. Frey, and M. C. Johnson. A Synthesis on Digital Games in Education: What the Research Literature Says from 2000 to 2010. Jl. of Interactive Learning Research, 25(2):263–282, 2014.
R. Rouse. Game Design: Theory and Practice, Second Edition. Wordware Publishing, Inc., 2004.
G. Schulte-Körne. Diagnostik und Therapie der Lese-Rechtschreib-Störung (The prevention, diagnosis, and treatment of dyslexia). Deutsches Ärzteblatt international, 107(41):718–727, 2010.
G. Schulte-Körne, W. Deimel, K. Müller, C. Gutenbrunner, and H. Remschmidt. Familial Aggregation of Spelling Disability. Journal of Child Psychology and Psychiatry, 37(7):817–822, Oct 1996.
Scikit-learn. 3.1. Cross-validation: evaluating estimator
cross validation.html, 2019.
Scikit-learn Developers. Scikit-learn Documentation.
K. Seaborn and D. I. Fels. Gamiﬁcation in theory and
action: A survey. International Journal of Human
Computer Studies, 74:14–31, 2015.
C. Steinbrink and T. Lachmann. Lese-Rechtschreibstörung (Dyslexia). Springer, Berlin.
P. Tallal. Improving language and literacy is a matter of
time. Nature reviews. Neuroscience, 5(9):721–728, 2004.
S. Varma and R. Simon. Bias in error estimation when
using cross-validation for model selection. BMC Bioin-
formatics, 7:91, Feb 2006.
T. R. Vidyasagar and K. Pammer. Dyslexia: a deﬁcit in
visuo-spatial attention, not in phonological processing.
Trends in Cognitive Sciences, 14(2):57–63, 2010.
 Wikipedia. Memory (Spiel) (Memory Game), 2019.
C. J. Yuskaitis, M. Parviz, P. Loui, C. Y. Wan, and P. L.
Pearl. Neural Mechanisms Underlying Musical Pitch
Perception and Clinical Applications Including Develop-
mental Dyslexia. Current neurology and neuroscience
reports, 15(8):51, 2015.