Screening Dyslexia for English Using HCI Measures and Machine Learning
Luz Rello
Carnegie Mellon University
Department of Computer Science

Enrique Romero
Universitat Politècnica de Catalunya

Maria Rauschenberger
Web Science and Social Computing
Universitat Pompeu Fabra

Abdullah Ali
University of Washington

Kristin Williams
Carnegie Mellon University

Jeffrey P. Bigham
HCI & LTI Institutes
Carnegie Mellon University

Nancy Cushen White
Department of Pediatrics
University of California San Francisco
San Francisco, USA
ABSTRACT
More than 10% of the population has dyslexia, and most are diagnosed only after they fail in school. This work seeks to change this through early detection via machine learning models that predict dyslexia by observing how people interact with a linguistic computer-based game. We designed the items of the game taking into account (i) the empirical linguistic analysis of the errors that people with dyslexia make, and (ii) specific cognitive skills related to dyslexia: Language Skills, Working Memory, Executive Functions, and Perceptual Processes. Using measures derived from the game, we conducted an experiment with 267 children and adults to train a statistical model that discriminates between readers with and without dyslexia. The model was trained and evaluated in a 10-fold cross-validation experiment, reaching 84.62% accuracy using the most informative features.
CCS CONCEPTS
• Computers and Society; • Accessibility technologies for persons with disabilities;

KEYWORDS
Dyslexia, screening, early detection, diagnosis, linguistics, serious games, machine learning
DH'18, April 23–26, 2018, Lyon, France
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6493-5/18/04…$15.00
ACM Reference Format:
Luz Rello, Enrique Romero, Maria Rauschenberger, Abdullah Ali, Kristin
Williams, Jeffrey P. Bigham, and Nancy Cushen White. 2018. Screening
Dyslexia for English Using HCI Measures and Machine Learning. In DH’18:
2018 International Digital Health Conference, April 23–26, 2018, Lyon, France.
ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3194658.3194675
1 INTRODUCTION
More than 10% of the population has dyslexia [ ]. The DSM-V [ ] defines dyslexia as a specific learning disorder with a neurological basis. According to the World Federation of Neurology, it occurs in children who, despite conventional classroom experience, fail to attain the language skills of reading, writing, and spelling commensurate with their intellectual abilities [ ]. In summary, dyslexia is frequent, universal, and related to school failure. However, it remains under-diagnosed. For instance, in the UK, a country that treats dyslexia effectively compared with other countries, only 5% of the individuals with dyslexia are diagnosed and given appropriate help [ ]. It is estimated that over 85% of adult illiterates have dyslexia [ ].
Yet early detection is crucial for addressing dyslexia with effective remediation. Often, students are under-diagnosed because current procedures for diagnosis are expensive [ ] and require professional oversight [ ]. Our goal is to let anyone find out, as early as possible and at low cost, whether they might have dyslexia.
To achieve this goal, we have created a computer game that
records a wide variety of web-page interaction measures to screen
dyslexia for English. We conducted a user study with 267 partici-
pants to collect data to train a machine learning model that is able
to correctly determine if a person has (or does not have) dyslexia
with 84.62% accuracy.
2 BACKGROUND AND RELATED WORK
The complexity of administering paper-based diagnostic tools, and the time they require, have led educators to turn toward computer-based screening methods that provide a quick assessment.
2.1 Commercial Software
The commercial software available to detect dyslexia in English includes Lexercise Screener [ ] and Nessy [ ]. We could not find studies behind these commercial applications, although they are widely used in practice. To our knowledge, they are not based on a machine learning model predictive of dyslexia, and we have found no published figures on their accuracy.
2.2 Computer Based Games
There are a number of computer games designed to screen for
dyslexia, but they do not use machine learning models.
Lyytinen et al. [ ] created the computer game Literate, later called GraphoGame [ ], to identify children at risk in Finland. The game was tested with 12 and 41 children between 6 and 7 years old, with statistically significant differences.
There are three other on-going projects for early risk detection of dyslexia that have not yet reported significant results: an approach for Italian tested with 24 pre-schoolers [ ], and a language-independent approach, MusVis, evaluated with German, English, and Spanish children [19, 24].
2.3 Machine Learning Approaches
Machine learning approaches to predict dyslexia are more recent. In 2015, the first method to screen dyslexia in Spanish was introduced; it used eye-tracking measures from 97 subjects (48 with dyslexia) [ ]. Later, in 2016, eye-tracking measures were also used to predict dyslexia for Swedish (185 subjects, 97 of them at high risk of dyslexia) [ ]. Both methods used Support Vector Machines. Another study detected dyslexia subtypes in the Hebrew language using data derived from existing medical records [ ].
The only approach we are aware of that predicts risk of dyslexia using features derived from computer-based measures is the game Dytective for Spanish. The screener, Dytective, was first evaluated with 343 people (95 with diagnosed dyslexia) and attained 83% accuracy on a held-out test set of 100 participants using Support Vector Machines [ ]. Later, the model was improved by applying a neural network model (Long Short-Term Memory networks, LSTMs [ ]) to a larger dataset of 4,335 participants (763 with a professional dyslexia diagnosis), attaining 91.97% accuracy [ ]. This model was integrated into a free online tool, Dytective, which has been used over 100,000 times. An earlier study piloted Dytective's screening measures with 60 English-speaking children and found the feature set promising, but the study did not fully incorporate machine learning methods [ ].
We advance these approaches by (i) extending these methods to the English language and (ii) including a wider set of items targeting cognitive indicators predictive of dyslexia.
3 USER STUDY
We conducted a within-subject study (267 participants) in which all participants were exposed to the same linguistic items integrated into an online game, Dytective.
3.1 Procedure and Ethics Statement
Participants completed the experiment remotely, through a computer at home, at school, or in a specialized center in the USA (mainly in the states of Pennsylvania, New York, and Texas). All participants agreed to participate through an online consent form, and children provided assent along with their parent or legal guardian, following protocols approved by our institutional review board (IRB).
Parents/legal guardians were specifically warned that they could
not help their children complete the study exercises. When schools
and specialized centers oversaw participation, parental/legal guardian
consent was obtained in advance, and the study was supervised by
the school counselor or therapist.
The first part of the study consisted of a questionnaire collecting demographic data. This questionnaire was completed by the participant's supervisor (school counselor or therapist) when the participant was under 18 years of age. Then, following oral instructions, participants were given 20 minutes to complete the game.

3.2 Participants
We recruited 267 participants from one specialized center, three
schools, and from individuals with dyslexia who knew about our
study through our public call online.
Subjects ranged in age from 7 to 60 years old. We classified these participants into three groups. Of the participants, 52 were diagnosed with dyslexia, Class D (Dyslexia) (28 female, 24 male), and 206 without a diagnosis of dyslexia served as a control group, Class N (Not-Dyslexia) (94 female, 112 male). There were 9 participants at risk of having dyslexia or suspected of having dyslexia, Class M (Maybe) (4 female, 5 male, M = 17.66, SD = 16.17).²
The first language of all participants was English, although 84 participants spoke another language (mostly Spanish, in the Texas area). A total of 224 participants reported having trouble with language classes at school.
3.3 Dependent Measures
Participants' performance was measured using the following dependent measures for each of the exercises: (i) Number of Clicks per item; (ii) Hits (i.e., the number of correct answers); (iii) Misses (i.e., the number of incorrect answers); (iv) Score (i.e., the sum of correct answers for each stage's problem type); (v) Accuracy (i.e., the number of Hits divided by the number of Clicks); and (vi) Miss Rate (i.e., the number of Misses divided by the number of Clicks).
We later used these performance measures, together with the demographic data, as features of our prediction model's dataset (see Section 4).

² All were either adults or children under observation by professionals, the step before having an official diagnosis.
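As a minimal sketch, the six dependent measures can be derived from raw interaction counts as follows. Field names and the reduction of Score to the number of Hits per item are illustrative assumptions, not taken from Dytective's implementation:

```python
def dependent_measures(clicks: int, hits: int, misses: int) -> dict:
    """Derive the six per-exercise measures from raw interaction counts.

    Score is taken here as the number of correct answers for the item
    (a simplifying assumption for illustration).
    """
    return {
        "Clicks": clicks,
        "Hits": hits,
        "Misses": misses,
        "Score": hits,
        "Accuracy": hits / clicks if clicks else 0.0,    # Hits / Clicks
        "MissRate": misses / clicks if clicks else 0.0,  # Misses / Clicks
    }

m = dependent_measures(clicks=12, hits=9, misses=3)
print(m["Accuracy"], m["MissRate"])  # 0.75 0.25
```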
Language Skills: Alphabetic Awareness, Phonological Awareness, Syllabic Awareness, Lexical Awareness, Morphological Awareness, Syntactic Awareness, Semantic Awareness, Orthographic Awareness.
Working Memory: Visual (alphabetical), Auditory (phonology), Sequential (auditory), Sequential (visual).
Executive Functions: Activation and Attention, Sustained Attention, Simultaneous Attention.
Perceptual Processes: Visual Discrimination and Categorization, Auditory Discrimination and Categorization.
Table 1: Indicators used for the design of the test items.
3.4 Materials
We integrated test items into a software game to serve as the primary material of our study.
3.4.1 Design and Implementation. Dytective is a cross-platform game backed by a MySQL database. It was designed with a high level of abstraction to make it easily portable for future native implementations. The interface design of the game implements the guidelines that, according to the latest findings in accessibility research, ensure the best on-screen text readability for this target group. Text is presented in black using a mono-spaced typeface, Courier, and a minimum font size of 14 points [ ].
3.4.2 Playing Dytective. At each phase, the player's goal is to accumulate points by solving a linguistic problem type as many times as possible in a 25-second time window. For example, the player hears the target non-word crench, and then a board is shown on screen containing the target non-word as well as distractors that are particularly difficult for people with dyslexia to differentiate (see Figure 1 (a)). After each time window, the player continues on to the next item, corresponding to a new linguistic problem type.
3.4.3 Content Design. The test items are composed of a set of attention and linguistic exercises, each addressing three or more of the following indicators belonging to different types of Language Skills, Working Memory, Executive Functions, and Perceptual Processes. These indicators are related to dyslexia [5, 6, 27].
The exercises were designed according to linguistic knowledge and the expertise of dyslexia therapists (specific to the English language). In addition, to assist item selection (exercises), we used the following criteria:
• linguistic analyses of 833 confusion sets,¹ created from the errors of people with dyslexia writing in English [ ]; and
• performance measures from the linguistic exercises of an online game called Piruletras. This game is part of previous work targeting children with dyslexia to improve spelling performance [ ]. We selected exercises that were more challenging for the players (those with higher error rates and longer solving times), since those exercises were more likely to manifest dyslexia difficulties.

¹ A confusion set is a small group of words that are likely to be confused with one another, such as weather and whether.
4 DATASET
The dataset is composed of 226 features per participant (i.e., a total of 60,342 data points). Each participant in the dataset was marked as D if the participant has dyslexia, N if not, and M (Maybe) if the participant suspects that he or she has dyslexia but is not diagnosed. From the dataset we extracted the following features:
(1) Gender of the participant. A binary feature with two values, female and male.
(2) Age of the participant, ranging from 7 to 60 years old.
(3) Second language. A binary feature with two values, no and yes, depending on whether the participant had a second language.
(4) Language subject. A binary feature with two values, no and yes, depending on whether the participant declares having trouble with language classes at school.
(5) Performance measures (222 features); they correspond to the six dependent measures (Clicks, Hits, Misses, Score, Accuracy, and Miss Rate) per level played (37 levels).
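The arithmetic behind the 226 features (4 demographic features plus 6 measures × 37 levels = 222 performance features) can be sketched as below; the function and argument names are hypothetical, not the authors' code:

```python
N_LEVELS, N_MEASURES = 37, 6  # 37 levels x 6 dependent measures = 222 features

def feature_vector(gender, age, second_language, language_subject, per_level):
    """Concatenate 4 demographic features with per-level performance measures."""
    assert len(per_level) == N_LEVELS
    assert all(len(level) == N_MEASURES for level in per_level)
    vec = [gender, age, second_language, language_subject]
    for level in per_level:
        vec.extend(level)  # Clicks, Hits, Misses, Score, Accuracy, Miss Rate
    return vec

v = feature_vector(0, 12, 1, 0, [[10, 8, 2, 8, 0.8, 0.2]] * N_LEVELS)
print(len(v))  # 226
```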
These features target some of the skills presented in Table 1. Note that all the exercises involve attention, so all of these features target the executive functions of activation and attention. In addition, some of them also target simultaneous attention, when the participant pays attention to a number of sources of incoming information at the same time.
These features are performance measures related to visual discrimination and categorization. For these tasks the participant hears the name of a letter, e.g., d, and identifies it from among the distractors (orthographically and phonetically similar letters, e.g., b, q, p) within a time frame, using a Whac-A-Mole-style game interaction.
These features relate to auditory discrimination and categorization. The participant listens to the sound (phoneme) of a letter and identifies it from among distractors. For example, the participant hears the phoneme /n/ and then a board is shown containing the target <n> as well as distractors. We use distractors that are particularly difficult for people with dyslexia to differentiate (i.e., other phonemes that share phonetic features, such as nasal consonants).
These features target syllabic awareness and auditory discrimination and categorization. The players hear the pronunciation of a syllable (e.g., /prin/) and identify its spelling from among orthographic distractors: <pren>, <prein>, <prain>, <prean>, and <pryn>.
These features correspond to a set of exercises where participants identify a word's spelling after hearing its pronunciation (e.g., /greet/), discriminating among phonetically and orthographically similar words and/or non-words (e.g., <create>, <greate>, <great>, <grete>, <greit>, <creet>, <crete>, <creat>). These features target auditory working memory and auditory discrimination and categorization.
These performance features correspond to exercises targeting visual discrimination and categorization, by requiring participants to find as many different letters as possible within a time frame in a visual search task (e.g., E/F, g/q, c/o, b/d or p/q). See Figure 1, exercise (b).

Figure 1: Screenshots of the exercises requiring the player to: click on the target non-word listed among the distractors (a); select the different letter (b); build a correct word by substituting a letter (c), selecting a letter (d), or deleting a letter (e).
These features were extracted from a set of exercises requiring players to listen to a non-word and choose its spelling (e.g., /lurled/) from among distractors (e.g., <rurled>, <larled>, <lurded>, <lurleb>, <lorled>). These features target sequential auditory working memory and auditory discrimination and categorization. See Figure 1, exercise (a).
These performance features are derived from exercises requiring participants to supply a missing letter [161-166] or delete an extra letter in a target word [167-172]. See Figure 1, exercises (d) and (e), respectively.
These performance features are collected from exercises requiring participants to find a morphological error in a sentence when there is also a semantic error. For example, in the sentence The affect of the wind was to cause the boat's sails to billow, the word affect should be effect.
These features relate to exercises in which participants find an error in a sentence involving a grammatical or function word (e.g., of instead of on in "Smoking is prohibited of the entire aircraft").
This set of features relates to exercises that require participants to find an error in a sentence and correct it by choosing a letter from a set of distractors. See Figure 1, exercise (c).
This set of features requires participants to rearrange letters to spell a real word (e.g., becuase) or to rearrange syllables to spell a real word (e.g., /na/ /na/ /ba/); these exercises target orthographic awareness.
This set of features requires players to separate words to make a meaningful sentence, e.g., changing sheranupthehill to she ran up the hill.
This set of features targets sequential visual working memory and visual discrimination and categorization, since they are gathered from exercises where players see a sequence of letters for 3 seconds and then write the sequence, discriminating targets from distractors.

Precision – Class D (Dyslexia): 63.76%
Recall – Class D (Dyslexia): 80.24%
Precision – Class N (Not-Dyslexia): 93.88%
Recall – Class N (Not-Dyslexia): 85.83%
Table 2: Classifier accuracy in the cross-validation experiment, using the optimized feature set.
This set of features requires participants to listen to and write a word (e.g., /make/), or targets phonological awareness [221-…] and requires participants to listen to and write a non-word.
5 RESULTS AND DISCUSSION
To determine whether it is feasible to detect whether a user may have dyslexia, we set up a machine learning experiment. We carried out an experiment with a binary classifier from LIBSVM [ ] in the Gaussian Support Vector Machine (SVM) setup. An SVM is a method for supervised machine learning that analyzes data and finds patterns for classification. As with other machine learning algorithms, given a set of training examples, each marked as belonging to a category, an SVM training algorithm builds a model that assigns new examples to the categories. The particular bias of SVMs is that of constructing a hyperplane (either in the original space or in a transformed one) for the classification output. This hyperplane is constructed by combining the original input examples with the aim of maximizing the functional margin. Our SVM was trained on the dataset described in Section 4.
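The Gaussian-kernel decision rule described above can be illustrated in a few lines. The support vectors, dual coefficients, bias, and gamma below are toy values for illustration, not the trained model:

```python
import math

def rbf_kernel(x, z, gamma=0.1):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_predict(x, support_vectors, dual_coefs, bias, gamma=0.1):
    """Sign of f(x) = sum_i alpha_i * k(s_i, x) + b, the kernelized hyperplane."""
    f = sum(a * rbf_kernel(s, x, gamma)
            for a, s in zip(dual_coefs, support_vectors))
    return 1 if f + bias >= 0 else -1

# Toy model with two support vectors of opposite sign.
svs, alphas, b = [(0.0, 0.0), (2.0, 2.0)], [1.0, -1.0], 0.0
print(svm_predict((0.1, 0.0), svs, alphas, b))  # 1  (closer to the positive SV)
print(svm_predict((1.9, 2.0), svs, alphas, b))  # -1 (closer to the negative SV)
```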
We performed a 10-fold cross-validation experiment by dividing the data into 10 roughly equal subsets (10% of the data in each subset). We then trained a statistical model on the rest of the data (90%) and tested on the corresponding fold, iterating 10 times; at the end, every example had been tested exactly once. We used 10-fold cross-validation because it is normally recommended for smaller datasets, where a single train-development-test split might not be informative enough.
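The folding scheme can be sketched as a plain index split (this is illustrative only; as described below, the actual experiment additionally stratified the folds by class):

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle example indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(267, k=10)
for test_fold in folds:  # each fold serves as the test set exactly once
    train = [i for f in folds if f is not test_fold for i in f]
    assert len(train) + len(test_fold) == 267
print([len(f) for f in folds])  # [27, 27, 27, 27, 27, 27, 27, 26, 26, 26]
```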
We randomized the data and used stratified sampling to ensure a similar distribution of the data categories in all folds. Participants marked as M (Maybe) were assigned to class D (Dyslexia). Outlier values in the number of Clicks and Misses were capped at a fixed maximum. Subsequently, the data were scaled to zero mean and unit variance.
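A minimal version of that preprocessing step is shown below; the cap value is arbitrary, since the paper does not report the fixed maximum it used:

```python
def clip_and_standardize(values, cap):
    """Cap outliers at a fixed maximum, then scale to zero mean, unit variance."""
    clipped = [min(v, cap) for v in values]
    mean = sum(clipped) / len(clipped)
    var = sum((v - mean) ** 2 for v in clipped) / len(clipped)
    std = var ** 0.5 or 1.0  # guard against constant features
    return [(v - mean) / std for v in clipped]

clicks = [3, 5, 4, 120]  # 120 is an outlier, capped to 10 below
z = clip_and_standardize(clicks, cap=10)
```

After the call, `z` has zero mean and unit variance, and the outlier no longer dominates the scale.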
We analyzed the data for features whose distributions differed between dyslexic and non-dyslexic participants. To that end, a Kolmogorov-Smirnov test was performed. The numbers of Hits and Misses showed different distributions for a number of exercises.
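The statistic behind that two-sample test is simply the largest gap between the two groups' empirical CDFs. A bare-bones version (without the p-value computation that a full implementation such as SciPy's ks_2samp provides):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(t) - ECDF_b(t)|."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(xs, t):
        # Fraction of samples less than or equal to t.
        return bisect.bisect_right(xs, t) / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in sorted(set(a) | set(b)))

# Identical distributions give 0; fully separated ones give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([0, 0, 1], [5, 6, 7]))  # 1.0
```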
Table 2 shows the accuracy of the SVM model (Gaussian kernel). This result suggests that the model is able to predict players with dyslexia quite accurately, with a recall of 80.24% for Class D, by using a subset of informative features. Note that the baseline (the percentage of subjects assigned to the class Dyslexia in the data set) is 22.85% (61 of 267).
The most informative features were a set of 10 features composed of Hits and Misses, with Misses being the most informative at an individual level. These features are performance measures belonging to exercises that target Visual Discrimination and Categorization and Auditory Discrimination and Categorization. More concretely, these features come from exercises where the participant was required to map (or associate) a letter name or a letter sound with a grapheme (a letter or letters). This is consistent with previous literature on dyslexia that focuses on the deficit in the phonological component of dyslexia [26, 27].
6 CONCLUSIONS AND FUTURE WORK
We presented a method to screen for risk of dyslexia among English speakers that combines machine learning and web-based interaction data collected from a linguistic game. The method was evaluated with 267 participants and attained 84.62% accuracy in its predictions. These results build on earlier findings from the first version of Dytective [ ], where only Spanish was considered.
These results should be taken as preliminary, since the model was trained on a small dataset. Further experiments with more participants, under other, less controlled conditions, are needed. Our next step will be to conduct a large-scale study. With positive results, we will integrate the model into a tool to screen risk of dyslexia online. Since estimates of dyslexia prevalence are much higher than the actually diagnosed population, we believe this method has the potential to make a significant impact.
ACKNOWLEDGMENTS
This paper was developed under a grant from the US Department of Education, NIDRR grant number H133A130057, and a grant from the National Science Foundation (#IIS-1618784).
We thank the Valley Speech Language and Learning Center in Brownsville, Texas, and the schools Winchester Thurston School (Pittsburgh, PA) and Ellis School (Pittsburgh, PA). We thank the volunteers who participated in supervising the participants (Adam Brownold, Elsa Cárdenas-Hagan, Anne Fay, Susan Freudenberg). Thanks to Lola Álvarez, Susanne Burger, Debbie Meyer, and Regina Rash for their help with recruiting participants.
REFERENCES
[1] American Psychiatric Association. 2013. Diagnostic and statistical manual of mental disorders (DSM-V). American Psychiatric Publishing, Arlington, VA.
[2] Mattias Nilsson Benfatto, Gustaf Öqvist Seimyr, Jan Ygge, Tony Pansell, Agneta Rydberg, and Christer Jacobson. 2016. Screening for Dyslexia Using Eye Tracking during Reading. PloS one 11, 12 (2016), e0165508.
[3] Elizabeth Carrow-Woolfolk. 1995. OWLS, Oral and Written Language Scales. NCS.
[4] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5] F. Cuetos and F. Valle. 1988. Modelos de lectura y dislexias (Reading models and dyslexias). Infancia y Aprendizaje (Infancy and Learning) 44 (1988), 3–19.
[6] Robert Davies, Javier Rodríguez-Ferreiro, Paz Suárez, and Fernando Cuetos. 2013. Lexical and sub-lexical effects on accuracy, reaction time and response duration: impaired and typical word and pseudoword reading in a transparent orthography. Reading and Writing 26, 5 (2013), 721–738.
[7] Dyslexia Research Institute. 2015. Dyslexia, Identification. http://www.dyslexia-add.org/issues.html. (January 2015).
[8] Angela Fawcett and Rod Nicolson. 2004. The Dyslexia Screening Test: Junior (DST-J). Harcourt Assessment.
[9] Ombretta Gaggi, Giorgia Galiazzo, Claudio Palazzi, Andrea Facoetti, and Sandro Franceschini. 2012. A serious game for predicting the risk of developmental dyslexia in pre-readers children. In Proc. ICCCN'12. IEEE, 1–5.
[10] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[11] Interagency Commission on Learning Disabilities. 1987. Learning Disabilities: A Report to the U.S. Congress. Government Printing Office, Washington, DC.
[12] Yair Lakretz and Michal Rosen-Zvi. 2015. Probabilistic Graphical Models of Dyslexia. In Proc. SIGKDD'15. 1919–1928.
[13] Lexercise. 2016. Dyslexia Test - Online from Lexercise. http://www.lexercise.com/tests/dyslexia-test. (2016). [Online; accessed 18-September-2017].
[14] Heikki Lyytinen, Jane Erskine, Janne Kujala, Emma Ojanen, and Ulla Richardson. 2009. In search of a science-based application: A learning tool for reading acquisition. Scandinavian Journal of Psychology 50, 6 (2009), 668–675.
[15] Heikki Lyytinen, Miia Ronimus, Anne Alanko, Anna-Maija Poikkeus, and Maria Taanila. 2007. Early identification of dyslexia and the use of computer game-based practice to support reading acquisition. Nordic Psychology 59, 2 (2007), 109.
[16] Dennis L. Molfese. 2000. Predicting dyslexia at 8 years of age using neonatal brain responses. Brain and Language 72, 3 (2000), 238–245.
[17] Nessy. 2011. Dyslexia Screening - Nessy UK. https://www.nessy.com/uk/product/dyslexia-screening/. (2011). [Online; accessed 18-September-2017].
[18] J. Pedler. 2007. Computer Correction of Real-word Spelling Errors in Dyslexic Text. Ph.D. Dissertation. Birkbeck College, London University.
[19] Maria Rauschenberger, Luz Rello, Ricardo Baeza-Yates, Emilia Gomez, and Jeffrey P. Bigham. 2017. Towards the Prediction of Dyslexia by a Web-based Game with Musical Elements. In Proc. W4A'17. 4–7. https://doi.org/10.1145/3058555.3058565
[20] L. Rello and R. Baeza-Yates. 2013. Good Fonts for Dyslexia. In Proc. ASSETS'13. ACM Press, Bellevue, Washington, USA.
[21] L. Rello and M. Ballesteros. 2015. Detecting Readers with Dyslexia Using Machine Learning with Eye Tracking Measures. In Proc. W4A'15. ACM, Florence, Italy.
[22] L. Rello and M. Ballesteros. 2017. Data Processing System to Detect Neurodevelopmental-Specific Learning Disorders. United States Patent and Trademark Office, filed on April 20, 2017, as application 15/493,060 (2017).
[23] L. Rello, M. Ballesteros, A. Ali, M. Serra, D. Alarcón, and J. P. Bigham. 2016. Dytective: Diagnosing Risk of Dyslexia with a Game. In Proc. Pervasive Health'16.
[24] Luz Rello, Enrique Romero, Maria Rauschenberger, Abdullah Ali, Kristin Williams, Jeffrey P. Bigham, and Nancy Cushen White. 2018. Towards language independent detection of dyslexia with a web-based game. In Proc. W4A'18. Lyon, France.
[25] Luz Rello, Kristin Williams, Abdullah Ali, Nancy Cushen White, and Jeffrey P. Bigham. 2016. Dytective: Towards Detecting Dyslexia Across Languages Using an Online Game. In Proc. W4A'16. ACM Press, Montreal, Canada.
[26] S.E. Shaywitz, M.D. Escobar, B.A. Shaywitz, J.M. Fletcher, and R. Makuch. 1992. Evidence that dyslexia may represent the lower tail of a normal distribution of reading ability. New England Journal of Medicine 326, 3 (1992), 145–150.
[27] F. R. Vellutino, J. M. Fletcher, M. J. Snowling, and D. M. Scanlon. 2004. Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry 45, 1 (2004), 2–40.
[28] World Federation of Neurology (WFN). 1968. Report of research group on dyslexia and world illiteracy. Dallas: WFN.