Screening Dyslexia for English Using HCI Measures and Machine Learning

Luz Rello
HCI Institute
Carnegie Mellon University
Pittsburgh, USA
luzrello@cs.cmu.edu
Enrique Romero
Department of Computer Science
Universitat Politècnica de Catalunya
Barcelona, Spain
eromero@cs.upc.edu
Maria Rauschenberger
Web Science and Social Computing
Research Group
Universitat Pompeu Fabra
Barcelona, Spain
maria.rauschenberger@upf.edu
Abdullah Ali
Information School
University of Washington
Washington, USA
xyleques@uw.edu
Kristin Williams
HCI Institute
Carnegie Mellon University
Pittsburgh, USA
krismawil@cs.cmu.edu
Jerey P. Bigham
HCI & LTI Institutes
Carnegie Mellon University
Pittsburgh, USA
jbigham@cs.cmu.edu
Nancy Cushen White
Department of Pediatrics
University of California San Francisco
San Francisco, USA
nancycushen.white@ucsf.edu
ABSTRACT
More than 10% of the population has dyslexia, and most are diagnosed only
after they fail in school. This work seeks to change this through early
detection via machine learning models that predict dyslexia by observing how
people interact with a linguistic computer-based game. We designed items of
the game taking into account (i) the empirical linguistic analysis of the
errors that people with dyslexia make, and (ii) specific cognitive skills
related to dyslexia: Language Skills, Working Memory, Executive Functions,
and Perceptual Processes. We conducted an experiment with 267 children and
adults in order to train a statistical model that predicts readers with and
without dyslexia using measures derived from the game. The model was trained
and evaluated in a 10-fold cross-validation experiment, reaching 84.62%
accuracy using the most informative features.
CCS CONCEPTS
Computers and Society; Social Issues; Assistive Technologies for persons
with disabilities;
KEYWORDS
Dyslexia, screening, early detection, diagnosis, linguistics, serious
games, machine learning
DH’18, April 23–26, 2018, Lyon, France
©2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6493-5/18/04...$15.00
https://doi.org/10.1145/3194658.3194675
ACM Reference Format:
Luz Rello, Enrique Romero, Maria Rauschenberger, Abdullah Ali, Kristin
Williams, Jeffrey P. Bigham, and Nancy Cushen White. 2018. Screening
Dyslexia for English Using HCI Measures and Machine Learning. In DH’18:
2018 International Digital Health Conference, April 23–26, 2018, Lyon, France.
ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3194658.3194675
1 INTRODUCTION
More than 10% of the population has dyslexia [11, 26]. The DSM-V [1] defines
dyslexia as a specific learning disorder with a neurological basis. According
to the World Federation of Neurology, it occurs in children who, despite
conventional classroom experience, fail to attain the language skills of
reading, writing, and spelling commensurate with their intellectual abilities
[28]. In summary, dyslexia is frequent, universal, and related to school
failure. However, it remains under-diagnosed. For instance, in the UK, a
country that treats dyslexia effectively compared with other countries, only
5% of the individuals with dyslexia are diagnosed and given appropriate help
[7]. It is estimated that over 85% of adult illiterates have dyslexia [7].
Yet, early detection is crucial for addressing dyslexia and effective
remediation. Often, students are under-diagnosed because current procedures
for diagnosis are expensive [16, 21] and require professional oversight
[3, 8]. Our goal is for anyone to know as early as possible if they might
have dyslexia in an inexpensive way.
To achieve this goal, we have created a computer game that records a wide
variety of web-page interaction measures to screen dyslexia for English. We
conducted a user study with 267 participants to collect data to train a
machine learning model that is able to correctly determine if a person has
(or does not have) dyslexia with 84.62% accuracy.
2 BACKGROUND AND RELATED WORK
The complexity of administering paper-based diagnostic tools, and the time
they require, have led educators to turn towards computer-based screening
methods to derive a quick assessment.
2.1 Commercial Software
Among the available commercial software to detect dyslexia in English are
Lexercise Screener [13] and Nessy [17]. We could not find studies behind
these commercial applications, although they are widely used in practice. To
our knowledge, they are not based on a machine learning model predictive of
dyslexia, and we have not found any publication of their accuracy.
2.2 Computer Based Games
There are a number of computer games designed to screen for
dyslexia, but they do not use machine learning models.
Lyytinen et al. [15] created the computer game Literate, later called
GraphoGame [14], to identify children at risk in Finland. The game was tested
with 12 and 41 children between 6 and 7 years old, with statistically
significant differences.
There are three other ongoing projects for early risk detection of dyslexia
that have not yet reported significant results: one approach for Italian
tested with 24 pre-schoolers [9], and a language-independent approach,
MusVis, evaluated with German, English, and Spanish children [19, 24].
2.3 Machine Learning Approaches
Machine learning approaches to predict dyslexia are more recent. In 2015, the
first method to screen dyslexia in Spanish was introduced; it used
eye-tracking measures from 97 subjects (48 with dyslexia) [21]. Later, in
2016, eye-tracking measures were also used to predict dyslexia for Swedish
(185 subjects, 97 of them with high risk of dyslexia) [2]. Both methods used
Support Vector Machines. Another study detected dyslexia subtypes in the
Hebrew language using data derived from existing medical records [12].
The only approach we are aware of to predict risk of dyslexia using features
derived from computer-based measures is the game Dytective for Spanish. The
screener, Dytective, was first evaluated with 343 people (95 with diagnosed
dyslexia) and attained 83% accuracy in a held-out test set with 100
participants using Support Vector Machines [23]. Later, the model was
improved by applying a neural network model (Long Short-Term Memory networks,
LSTMs [10]) to a larger dataset of 4,335 participants (763 with a
professional dyslexia diagnosis), attaining 91.97% accuracy [22]. This model
was integrated into a free online tool, Dytective, which has been used over
100,000 times.¹ An earlier study piloted Dytective’s screening measures with
60 English-speaking children and found the feature set was promising, but the
study did not fully incorporate machine learning methods [25].
We advance these approaches by (i) extending these methods to the English
language and (ii) including a wider range of items targeting cognitive
indicators predictive of dyslexia.
¹ https://dytectivetest.org/
3 USER STUDY
We conducted a within-subject study (267 participants) with all
participants exposed to the same linguistic items integrated into
an online game Dytective.
3.1 Procedure and Ethics Statement
Participants completed the experiment remotely, through a computer at home,
school, or in a specialized center in the USA (mainly from the states of
Pennsylvania, New York and Texas). All participants agreed to participate
through an online consent form, and children provided assent along with their
parent or legal guardian following protocols approved by our institutional
review board (IRB). Parents/legal guardians were specifically warned that
they could not help their children complete the study exercises. When schools
and specialized centers oversaw participation, parental/legal guardian
consent was obtained in advance, and the study was supervised by the school
counselor or therapist.
The first part of the study consisted of a questionnaire collecting
demographic data. This questionnaire was completed by the participant’s
supervisor (school counselor or therapist) in cases when the participant was
under 18 years of age. Then, following oral instructions, participants were
given 20 minutes to complete the test exercises.
3.2 Participants
We recruited 267 participants from one specialized center, three
schools, and from individuals with dyslexia who knew about our
study through our public call online.
Subjects ranged in age from 7 to 60 years old. We classified these
participants into three groups. Of the participants, 52 were diagnosed with
dyslexia, Class D (Dyslexia) (28 female, 24 male, M = 11.16, SD = 6.31), and
206 without a diagnosis of dyslexia served as a control group, Class N
(Not-Dyslexia) (94 female, 112 male, M = 11.89, SD = 5.11). There were 9
participants at risk of having dyslexia or suspected of having dyslexia,
Class M (Maybe) (4 female, 5 male, M = 17.66, SD = 16.17).²
The rst language of all participants was English, although 84
participants spoke another language (mostly Spanish in the Texas
area). A total of 224 participants reported having trouble with lan-
guage classes at school.
3.3 Dependent Measures
Participants’ performance was measured using the following dependent
measures for each of the exercises: (i) Number of Clicks per item; (ii) Hits
(i.e., the number of correct answers); (iii) Misses (i.e., the number of
incorrect answers); (iv) Score (i.e., the sum of correct answers for each
stage’s problem type); (v) Accuracy (i.e., the number of Hits divided by the
number of Clicks); and (vi) Miss Rate (i.e., the number of Misses divided by
the number of Clicks).
We later used these performance measures together with the
demographic data as features of our prediction model’s dataset (see
Section 4).
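As a minimal sketch of how the two ratio measures follow from the raw counts, the snippet below computes Accuracy and Miss Rate for one exercise. The `ExerciseLog` type, its field names, and the choice of returning 0.0 when an exercise has no clicks are assumptions made for illustration; the paper does not describe how that case was handled.

```python
from dataclasses import dataclass


@dataclass
class ExerciseLog:
    """Raw counts for one exercise (hypothetical container, not from the paper)."""
    clicks: int
    hits: int
    misses: int


def accuracy(log: ExerciseLog) -> float:
    """Hits divided by Clicks; 0.0 when there were no clicks (assumed convention)."""
    return log.hits / log.clicks if log.clicks else 0.0


def miss_rate(log: ExerciseLog) -> float:
    """Misses divided by Clicks; 0.0 when there were no clicks (assumed convention)."""
    return log.misses / log.clicks if log.clicks else 0.0


# Made-up counts for a single 25-second exercise.
log = ExerciseLog(clicks=10, hits=7, misses=3)
print(accuracy(log), miss_rate(log))
```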
² All were either adults or children under observation by professionals, the
step before having an official diagnosis.
Language Skills              Working Memory
  Alphabetic Awareness         Visual (alphabetical)
  Phonological Awareness       Auditory (phonology)
  Syllabic Awareness           Sequential (auditory)
  Lexical Awareness            Sequential (visual)
  Morphological Awareness    Executive Functions
  Syntactic Awareness          Activation and Attention
  Semantic Awareness           Sustained Attention
  Orthographic Awareness       Simultaneous Attention
                             Perceptual Processes
                               Visual Discrimination and Categorization
                               Auditory Discrimination and Categorization
Table 1: Indicators used for the design of the test items.
3.4 Materials
We integrated test items into a software game to serve as the pri-
mary material of our study.
3.4.1 Design and Implementation. Dytective is a cross-platform web-based
game built with HTML5, CSS, and JavaScript, with a PHP server and a MySQL
database. It was designed with a high level of abstraction to make it easily
portable for future native implementations. The interface design of the game
implements the guidelines that, according to the latest findings in
accessibility research, ensure the best on-screen text readability for this
target group. Text is presented in black using the monospaced typeface
Courier and a minimum font size of 14 points [20].
3.4.2 Playing Dytective. At each phase, the player’s goal is to accumulate
points by solving a linguistic problem type as many times as possible in a
25-second time window. For example, the player hears the target non-word
crench, and then a board is shown on screen containing the target non-word
as well as distractors that are particularly difficult for people with
dyslexia to differentiate (see Figure 1 (a)). After each time window, the
player continues on to the next item, corresponding to a new linguistic
problem type.
3.4.3 Content Design. The test items are composed of a set of attention and
linguistic exercises addressing three or more of the following indicators
belonging to different types of Language Skills, Working Memory, Executive
Function, and Perceptual Processes. These indicators are related to dyslexia
[5, 6, 27].
The exercises were designed according to linguistic knowledge and the
expertise of dyslexia therapists (specific to the English language). In
addition, to assist item selection (exercises) we used the following
criteria:
(i) Linguistic analyses of 833 confusion sets³ created from the errors of
people with dyslexia writing in English [18]; and
(ii) Performance measures from the linguistic exercises of an online game
called Piruletras.⁴ This game is part of previous work targeting children
with dyslexia to improve spelling performance [24]. We selected exercises
that were more challenging for the players (those with higher error rates
and needing more time to be solved), since those exercises were more likely
to manifest dyslexia difficulties.
³ A confusion set is a small group of words that are likely to be confused
with one another, such as weather and whether.
⁴ https://itunes.apple.com/us/app/dyseggxia/id534986729?mt=8
4 DATASET
The dataset is composed of 226 features per participant (i.e., a total of
60,342 data points). Each participant in the dataset was marked as D if the
participant has dyslexia, N if not, and M (maybe) if the participant suspects
that he or she has dyslexia but is not diagnosed.
From the dataset we extracted the following features:
1 Gender of the participant. A binary feature with two values, female and
male.
2 Age of the participant, ranging from 7 to 60 years old.
3 Second language. A binary feature with two values, no and yes, indicating
whether the participant had a second language in case of bilingualism.
4 Language subject. A binary feature with two values, no and yes, indicating
whether the participant declares having trouble with language classes at
school.
Features 5 to 226 are performance measures; they correspond to the six
dependent measures (Clicks, Hits, Misses, Score, Accuracy, and Miss Rate) per
level played (37 levels).
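The feature count can be checked with a short sketch: 4 demographic features plus 6 measures for each of 37 levels gives 226 features, and 226 features for 267 participants gives 60,342 data points. The feature names and the ordering of measures within a level are assumptions for illustration; the paper gives only the index ranges.

```python
from typing import List

# Names below are illustrative; only the counts come from the paper.
DEMOGRAPHIC = ["gender", "age", "second_language", "language_subject"]
MEASURES = ["clicks", "hits", "misses", "score", "accuracy", "miss_rate"]
N_LEVELS = 37
N_PARTICIPANTS = 267


def feature_names() -> List[str]:
    """Demographic features first, then six measures per level (assumed order)."""
    names = list(DEMOGRAPHIC)
    for level in range(1, N_LEVELS + 1):
        for measure in MEASURES:
            names.append(f"level{level:02d}_{measure}")
    return names


names = feature_names()
n_features = len(names)                      # 4 + 37 * 6
n_data_points = n_features * N_PARTICIPANTS  # features times participants
print(n_features, n_data_points)
```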
These features target some of the skills presented in Table 1. Note that all
the exercises involve attention, so all these features target the executive
functions activation and attention, and sustained attention. In addition,
some of them also target simultaneous attention, when the participant pays
attention to a number of sources of incoming information at the same time.
5-28 These features are performance measures related to alphabetic awareness
and visual discrimination and categorization. For these tasks the participant
hears the name of a letter, e.g., d, and identifies it from among the
distractors (orthographically and phonetically similar letters, e.g., b, q,
p) within a time frame, using a Whac-A-Mole-style game interaction.
29-52 These features relate to phonological awareness and auditory
discrimination and categorization. The participant listens to the sound
(phoneme) of a letter and identifies it from among distractors. For example,
the participant hears the phoneme /n/ and then a board is shown containing
the target <n> as well as distractors. We use distractors that are
particularly difficult for people with dyslexia to differentiate (i.e.,
other phonemes that share phonetic features, such as nasal and sound
consonants).
53-88 These features target syllabic awareness and auditory discrimination
and categorization. The players hear the pronunciation of a syllable (e.g.,
/prin/) and identify its spelling from among orthographic distractors
<pren>, <prein>, <prain>, <prean>, and <pryn>.
89-112 These features correspond to a set of exercises where participants
identify a word’s spelling after hearing its pronunciation (e.g., /greet/)
by discriminating among phonetically and orthographically similar words
and/or non-words (e.g., <create>, <greate>, <great>, <grete>, <greit>,
<creet>, <crete>, <creat>). These features target lexical awareness,
auditory working memory, and auditory discrimination and categorization.
113-136 These performance features correspond to exercises targeting visual
discrimination and categorization, by requiring participants to find as many
different letters as possible within a time frame in a visual search task
(e.g., E/F, g/q, c/o, b/d or p/q). See Figure 1, exercise (b).
Figure 1: Screenshots of the exercises requiring the player to click on the
target non-word listed among the distractors (a); select the different
letter (b); build a correct word by substituting a letter (c), selecting a
letter (d), or deleting a letter (e).
137-160 These features were extracted from a set of exercises requiring
players to listen to a non-word (e.g., /lurled/) and choose its spelling
from among distractors (e.g., <rurled>, <larled>, <lurded>, <lurleb>,
<lorled>). These features target sequential auditory working memory, and
auditory discrimination and categorization. See Figure 1, exercise (a).
161-172 These performance features target lexical, phonological, and
orthographic awareness. They are derived from exercises requiring
participants to supply a missing letter [161-166] or delete an extra letter
in a target word [167-172]. See Figure 1, exercises (d) and (e),
respectively.
173-178 These performance features target morphological and semantic
awareness. They are collected from exercises requiring participants to find
a morphological error in a sentence when there is also a semantic error. For
example, in the sentence The affect of the wind was to cause the boat’s
sails to billow, the word affect should be effect.
179-184 These features relate to syntactic awareness. Participants find an
error in a sentence related to a grammatical or function word that has been
changed (e.g., of instead of on in “Smoking is prohibited of the entire
aircraft”).
185-190 This set of features relates to phonological, lexical, and
orthographic awareness. These exercises require participants to find an
error in a sentence and correct it by choosing a letter from a set of
distractors. See Figure 1, exercise (c).
191-202 This set of features requires participants to rearrange letters to
spell a real word (e.g., b e c u a s e), targeting phonological, lexical,
and orthographic awareness (Features 191-196), or to rearrange syllables to
spell a real word (e.g., /na/ /na/ /ba/), targeting syllabic, lexical, and
orthographic awareness (Features 197-202).
203-208 This set of features, addressing phonological, lexical, and
orthographic awareness, requires players to separate words to make a
meaningful sentence, e.g., change sheranupthehill to she ran up the hill.
209-214 This set of features targets sequential visual working memory and
visual discrimination and categorization, since they are gathered from
exercises where players see a sequence of letters for 3 seconds and then
write the sequence, discriminating targets from distractors.
215-226 This set of features requires participants to listen to and write a
word (e.g., /make/), targeting lexical and orthographic awareness (Features
215-220), or to listen to and write a non-word (e.g., /smay/), targeting
sequential auditory working memory and phonological awareness (Features
221-226).
Metric                                 Score
Accuracy                               84.62%
Precision – Class D (Dyslexia)         63.76%
Recall – Class D (Dyslexia)            80.24%
Precision – Class N (Not-Dyslexia)     93.88%
Recall – Class N (Not-Dyslexia)        85.83%
Table 2: Classifier accuracy in the cross-validation experiment, using the
optimized feature set.
5 RESULTS AND DISCUSSION
To determine whether it is feasible to detect whether a user may have
dyslexia, we set up a machine learning experiment. We carried out an
experiment with a binary classifier of LIBSVM [4] in the Gaussian Support
Vector Machine (SVM) setup. An SVM is a method for supervised machine
learning that analyzes data and finds patterns for classification. Like
other machine learning algorithms, given a set of training examples, each
marked as belonging to a category, an SVM training algorithm builds a model
that assigns new examples to the categories. The particular bias of SVMs is
that of constructing a hyperplane (either in the original space or in a
transformed one) for the classification output. This hyperplane is
constructed by combining the original input examples with the aim of
maximizing the functional margin. Our SVM is trained on the dataset
described in Section 4.
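To make the "transformed space" concrete, the sketch below evaluates the Gaussian (RBF) kernel, K(x, z) = exp(-gamma * ||x - z||^2), which is the similarity function a Gaussian SVM uses in place of a plain dot product. The paper does not report its kernel width or regularization settings, so the `gamma` value and the input vectors here are purely hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two made-up, already-scaled feature vectors; gamma is a hypothetical setting.
participant_a = [0.5, -1.2, 0.3]
participant_b = [0.4, -0.9, 1.1]
print(gaussian_kernel(participant_a, participant_b, gamma=0.1))
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart, which is what lets the classifier draw non-linear boundaries in the original feature space.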
We performed a 10-fold cross-validation experiment by dividing the data into
10 different, roughly equal subsets (10% of the data in each subset). Then,
we trained a statistical model on the rest of the data (90%) and tested on
the corresponding fold, iterating 10 times; at the end, all data had been
tested independently. We used 10-fold cross-validation because it is
normally recommended for smaller datasets when a single
train-development-test split might not be informative enough.
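A stratified 10-fold split of this kind can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' code: the round-robin assignment, the `seed` parameter, and the function name are assumptions. The class sizes come from the study, with the 9 Class M participants counted as Class D (52 + 9 = 61) as the study does.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so that fold sizes stay within one sample of each other.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


labels = ["D"] * 61 + ["N"] * 206
folds = stratified_folds(labels, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    # A model would be trained on train_indices and evaluated on test_fold here.
```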
We randomized the data and used stratified sampling to ensure a similar
distribution of data categories in all folds. Participants marked as M
(Maybe) were assigned to the class D (Dyslexia). Outlier values in the
number of Clicks and Misses were limited to a maximum fixed value.
Subsequently, the data were scaled to zero mean and unit variance.
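The two preprocessing steps can be sketched for a single feature column as below. The cap of 40 and the click counts are made-up values, since the paper does not state the fixed maximum it used, and it does not say whether the scaling statistics were computed on the whole dataset or per training fold.

```python
import statistics
from typing import List, Sequence


def clip(values: Sequence[float], max_value: float) -> List[float]:
    """Limit every value to at most max_value (outlier capping)."""
    return [min(value, max_value) for value in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Scale a feature column to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


clicks = [12, 15, 9, 14, 250]  # made-up counts; 250 plays the role of an outlier
scaled = standardize(clip(clicks, max_value=40))  # 40 is a hypothetical cap
print(scaled)
```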
We analyzed the data for features whose distributions were different between
dyslexic and non-dyslexic participants. To that end, a Kolmogorov-Smirnov
test was performed. The number of Hits and Misses showed different
distributions for a number of exercises.
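The statistic behind a two-sample Kolmogorov-Smirnov test is the largest gap between the two groups' empirical cumulative distributions. The sketch below computes only that statistic, on made-up Miss counts; it does not compute a p-value, and the paper does not report the significance threshold it used. A statistics library such as SciPy (`scipy.stats.ks_2samp`) would normally be used for the full test.

```python
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    largest_gap = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both pointers past every observation equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / n - j / m))
    return largest_gap


# Made-up Miss counts for one exercise in each group.
misses_dyslexia = [4, 5, 6, 7, 8]
misses_control = [1, 2, 2, 3, 5]
print(ks_statistic(misses_dyslexia, misses_control))
```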
Table 2 shows the accuracy of the SVM model (Gaussian kernel). This result
suggests that the model is able to predict players with dyslexia quite
accurately, with a recall of 80.24% for Class D (Dyslexia), by using a
subset of informative features. Note that the baseline (the percentage of
subjects assigned to the class Dyslexia in the data set) is 22.85%.
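The baseline figure follows directly from the class sizes: 52 diagnosed participants plus the 9 Class M participants assigned to Class D, out of 267. The sketch below reproduces that arithmetic and shows how accuracy, precision, and recall are derived from a confusion matrix. The confusion counts are hypothetical, chosen only to illustrate the formulas; they are not the counts behind Table 2.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of true positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Class sizes from the study: D = 52, M = 9 (assigned to D), N = 206.
n_dyslexia, n_maybe, n_control = 52, 9, 206
baseline = (n_dyslexia + n_maybe) / (n_dyslexia + n_maybe + n_control)
print(f"{baseline:.2%}")  # 22.85%

# Hypothetical confusion counts for one test fold, Class D as the positive class.
tp, fp, fn, tn = 8, 2, 2, 88
print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn))
```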
The most informative features were a set of 10 features composed of Hits and
Misses, Misses being the most informative ones at an individual level. These
features are performance measures belonging to exercises that target
Alphabetic Awareness, Phonological Awareness, Visual Discrimination and
Categorization, and Auditory Discrimination and Categorization. More
concretely, these features come from exercises where the participant was
required to map (or associate) a letter name or a letter sound with a
grapheme (letter or letters). This is consistent with previous literature on
dyslexia that focuses on the deficit in the phonological component in
dyslexia [26, 27].
6 CONCLUSIONS AND FUTURE WORK
We presented a method to screen for risk of dyslexia among English speakers
that combines machine learning and web-based interaction data collected from
a linguistic game. The method was evaluated with 267 participants and
attained 84.62% accuracy on its prediction. These results build on earlier
findings from the first version of Dytective [23], where only Spanish was
considered.
These results should be taken as preliminary, since the model was trained on
a small dataset. Further experiments with more participants under other less
controlled conditions are needed. Our next step will be to conduct a
large-scale study. With positive results, we will integrate the model in a
tool to screen risk of dyslexia online. Since estimations of dyslexia are
much higher than the actual diagnosed population, we believe this method has
potential to make a significant impact.
ACKNOWLEDGMENTS
This paper was developed under a grant from the US Department
of Education, NIDRR grant number H133A130057; and a grant from
the National Science Foundation (#IIS-1618784).
We thank the Valley Speech Language and Learning Center in Brownsville,
Texas, and the schools Winchester Thurston School (Pittsburgh, PA) and Ellis
School (Pittsburgh, PA). We thank the volunteers who participated in
supervising the participants (Adam Brownold, Elsa Cárdenas-Hagan, Anne Fay,
Susan Freudenberg). Thanks to Lola Álvarez, Susanne Burger, Debbie Meyer, and
Regina Rash for their help with recruiting participants.
REFERENCES
[1] American Psychiatric Association. 2013. Diagnostic and statistical manual of mental disorders, (DSM-V). American Psychiatric Publishing, Arlington, VA.
[2] Mattias Nilsson Benfatto, Gustaf Öqvist Seimyr, Jan Ygge, Tony Pansell, Agneta Rydberg, and Christer Jacobson. 2016. Screening for Dyslexia Using Eye Tracking during Reading. PloS one 11, 12 (2016), e0165508.
[3] Elizabeth Carrow-Woolfolk. 1995. OWLS, Oral and Written Language Scales. NCS Pearson Incorporated.
[4] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 (2011), 27:1–27:27. Issue 3. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[5] F. Cuetos and F. Valle. 1988. Modelos de lectura y dislexias (Reading models and dyslexias). Infancia y Aprendizaje (Infancy and Learning) 44 (1988), 3–19.
[6] Robert Davies, Javier Rodríguez-Ferreiro, Paz Suárez, and Fernando Cuetos. 2013. Lexical and sub-lexical effects on accuracy, reaction time and response duration: impaired and typical word and pseudoword reading in a transparent orthography. Reading and Writing 26, 5 (2013), 721–738.
[7] Dyslexia Research Institute. 2015. Dyslexia, Identification. http://www.dyslexia-add.org/issues.html. (January 2015).
[8] Angela Fawcett and Rod Nicolson. 2004. The Dyslexia Screening Test: Junior (DST-J). Harcourt Assessment.
[9] Ombretta Gaggi, Giorgia Galiazzo, Claudio Palazzi, Andrea Facoetti, and Sandro Franceschini. 2012. A serious game for predicting the risk of developmental dyslexia in pre-readers children. In Proc. ICCCN’12. IEEE, 1–5.
[10] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[11] Interagency Commission on Learning Disabilities. 1987. Learning Disabilities: A Report to the U.S. Congress. Government Printing Office, Washington DC.
[12] Yair Lakretz and Michal Rosen-zvi. 2015. Probabilistic Graphical Models of Dyslexia. In Proc. SIGKDD’15. 1919–1928.
[13] Lexercise. 2016. Dyslexia Test - Online from Lexercise. http://www.lexercise.com/tests/dyslexia-test. (2016). [Online; accessed 18-September-2017].
[14] Heikki Lyytinen, Jane Erskine, Janne Kujala, Emma Ojanen, and Ulla Richardson. 2009. In search of a science-based application: A learning tool for reading acquisition. Scandinavian journal of psychology 50, 6 (2009), 668–675.
[15] Heikki Lyytinen, Miia Ronimus, Anne Alanko, Anna-Maija Poikkeus, and Maria Taanila. 2007. Early identification of dyslexia and the use of computer game-based practice to support reading acquisition. Nordic Psychology 59, 2 (2007), 109.
[16] Dennis L Molfese. 2000. Predicting dyslexia at 8 years of age using neonatal brain responses. Brain and language 72, 3 (2000), 238–245.
[17] Nessy. 2011. Dyslexia Screening - Nessy UK. https://www.nessy.com/uk/product/dyslexia-screening/. (2011). [Online; accessed 18-September-2017].
[18] J. Pedler. 2007. Computer Correction of Real-word Spelling Errors in Dyslexic Text. Ph.D. Dissertation. Birkbeck College, London University.
[19] Maria Rauschenberger, Luz Rello, Ricardo Baeza-Yates, Emilia Gomez, and Jeffrey P. Bigham. 2017. Towards the Prediction of Dyslexia by a Web-based Game with Musical Elements. In W4A’17. 4–7. https://doi.org/10.1145/3058555.3058565
[20] L. Rello and R. Baeza-Yates. 2013. Good Fonts for Dyslexia. In Proc. ASSETS’13. ACM Press, Bellevue, Washington, USA.
[21] L. Rello and M. Ballesteros. 2015. Detecting Readers with Dyslexia Using Machine Learning with Eye Tracking Measures. In Proc. W4A ’15. ACM, Florence, Italy.
[22] L. Rello and M. Ballesteros. 2017. Data Processing System to Detect Neurodevelopmental-Specific Learning Disorders. United States Patent and Trademark Office, filed on April 20, 2017, as application 15/493,060 (2017).
[23] L. Rello, M. Ballesteros, A. Ali, M. Serra, D. Alarcón, and J. P. Bigham. 2016. Dytective: Diagnosing Risk of Dyslexia with a Game. In Proc. Pervasive Health’16. Cancun, Mexico.
[24] Luz Rello, Enrique Romero, Maria Rauschenberger, Abdullah Ali, Kristin Williams, Jeffrey P. Bigham, and Nancy Cushen White. 2018. Towards language independent detection of dyslexia with a web-based game. In W4A’18. Lyon, France. https://doi.org/10.1145/3192714.3192816
[25] Luz Rello, Kristin Williams, Abdullah Ali, Nancy Cushen White, and Jeffrey P. Bigham. 2016. Dytective: Towards Detecting Dyslexia Across Languages Using an Online Game. In Proc. W4A’16. ACM Press, Montreal, Canada.
[26] S.E. Shaywitz, M.D. Escobar, B.A. Shaywitz, J.M. Fletcher, and R. Makuch. 1992. Evidence that dyslexia may represent the lower tail of a normal distribution of reading ability. New England Journal of Medicine 326, 3 (1992), 145–150.
[27] F. R. Vellutino, J. M. Fletcher, M. J. Snowling, and D. M. Scanlon. 2004. Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry 45, 1 (2004), 2–40.
[28] World Federation of Neurology (WFN). 1968. Report of research group on dyslexia and world illiteracy. Dallas: WFN. (1968).
... Although early detection of dyslexia is crucial for effective remediation, conventional techniques are expensive and require professional oversight [4]. Recently, machine learning techniques have become popular in predicting Dyslexia from the data collected through online tests, such as reading and writing exercises, online games, tracking eye movement, or collecting EEG scans and MRI data while the participants engage in reading or writing tasks [5]. ...
... Recently, machine learning techniques have become popular in predicting Dyslexia from the data collected through online tests, such as reading and writing exercises, online games, tracking eye movement, or collecting EEG scans and MRI data while the participants engage in reading or writing tasks [5]. Among these techniques, online gamified tests have recently gained popularity among researchers as they are cost-effective, easier to conduct, and can cover a wide participant base [2,4,6]. In this case, participants engage in online games while information about their performance is collected and later analyzed for detecting dyslexia. ...
... Researchers have used different types of tests including reading [15][16][17][18], writing [19,20] and online games [2,4,6] to collect different types of data, such as text [4,6,17], image [18,20], video [18], Eye movement tracking [15], MRI scans [16] and EEG scans [19]. Afterward, the collected data is processed and analyzed using different machine learning techniques to predict and classify dyslexic and non-dyslexic participants. ...
Article
Developmental Dyslexia is a learning disorder often discovered in school-aged children who face difficulties while reading or spelling words even though they may have average or above-average levels of intelligence. This ultimately results in anger, frustration, low self-esteem, and other negative feelings. Early detection of Dyslexia can be highly beneficial for dyslexic children as their learning needs can be properly addressed. Researchers have used several testing techniques for early discovery where the data is collected from reading and writing tests, online games, magnetic resonance imaging (MRI) and electroencephalography (EEG) scans, and picture and video recording. Several machine learning techniques have also been used in this regard recently. However, existing works did not focus on the problem of the imbalanced dataset where the percentage of dyslexic participants is much higher compared to non-dyslexic participants, which is expected to be the case for pre-screening among a random population. This paper addresses the imbalanced dataset obtained from dyslexia pre-screening tests and proposes an oversampling and ensemble-based machine learning technique for the detection of Dyslexia. Simulation results show that the proposed approach improves the detection accuracy of the minority class, i.e., dyslexic patients, from 80.61% to 83.52%.
... Finally, we have game development also contributing to the informal detection of LD [46,47,48,49,50]. The game DytectiveR48 was first tested on 243 people and achieved 83 percent accuracy using Support Vector Machines in a held-out test set with 100 participants. The authors increased the performance by expanding the game's scope to include HCI measurements, which were covered by 226 performance measures. The model was trained using the SVM classifier, and it has an accuracy of 84.62 percent [50]. ...
Article
Learning Disabilities (LD) are a type of disability that affects people of normal or above-average intelligence. The ability to learn is harmed, and this could last a lifetime. Some children may have a single learning problem, while others may have multiple learning disorders that overlap. Learning disability may include disabilities in various areas related to reading, language and mathematics. Learning disabled children are a broad collection of kids who may face challenges in a variety of areas. For example, one child with a learning disability may have major reading challenges, whereas another may have no reading difficulty at all but struggles with written communication. Learning difficulties are developmental abnormalities that commonly appear during the course of a child’s schooling. These limitations cause a considerable gap between an individual’s genuine potential and day-to-day performance. The purpose of this study is to provide a taxonomy of the many learning qualities of LD, as well as the types of characteristics that cater to which learning disabilities, and to identify the modalities in which a particular learning disability can be captured. Based on these characteristics , we design an e-learning system to detect the presence of learning disability using machine learning.
... In 2018, L. Rello et al. proposed a novel method of screening dyslexia for the English language using human-computer interaction (HCI) measures and machine learning (ML) [10]. The dataset consisted of 10 features which were analyzed for different features, and SVM with Gaussian kernel was applied. ...
Article
Dyslexia is the hidden learning disability, neurobiological in origin wherein students face hard time in accurate or fluent word recognition, connecting letters to the sounds. In India, index of dyslexia is increasing exponentially. The level of difficulty of dyslexic children varies from person to person. Their brain is normal; often very “intelligent,” but with strengths and capabilities in areas other than the language area. Henceforth, such students are suffering from low self-esteem, are bipolar in nature, have negative feelings and depression. Therefore, early detection and evaluation of dyslexic students is very important and need of the hour. In this review paper, the authors have summed up various research dimensions toward dyslexia detection. This paper principally focuses on the machine learning techniques for dyslexia screening which includes applications covering different machine learning-based approaches, game-based techniques and image processing techniques for designing various assessments and assistive tools to support and ease the problems encountered by dyslexic people. This review paper identifies various knowledge gaps, current issues and future challenges in this research domain. It mainly focuses on various machine learning applications toward detection of dyslexia.
... These conventional methods based on behavioural aspects are highly time-consuming and tiresome, and even the variability in symptoms among individuals make the analysis a challenging task. Several techniques have been proposed by researchers for detecting developmental dyslexia like reading/writing text [49], web-based word games [72], eye tracking [8], MRI scans [65], EEG scans [73], video and image capturing [32], etc. Usage of different detecting methods depends on varying attributes to be diagnosed for dyslexia like grey matter deficit using structural magnetic resonance imaging and reduced neural activities in specific brain zone can be demonstrated using MRI scans and EEG scans [84]. EEG scans are successfully exploited for dyslexia detection by identifying unique brain activation patterns specific to brain activities. ...
Article
Electroencephalography (EEG) is the commonly employed electro-biological imaging technique for diagnosing brain functioning. The EEG signals are used to determine head injury, ascertain brain cell functioning, and monitor brain development. EEG can add multiple dimensions towards the identification of learning disability being an abnormality of the brain. Early and accurate detection of brain diseases can significantly reduce the mortality rate with a lesser treatment cost. The machine learning techniques can examine, classify, and process EEG signals to accurately understand brain activities and disorders. This paper is a comprehensive review of the application of machine learning techniques in the classification of EEG signals of dyslexia and analysis of an improved framework to extemporize the classifier’s performance and accuracy in discriminating between dyslexics and controls. The presence of noises and artefacts often reduces the performance of classifiers and hampers results. This study reviews input pre-processing, feature selection, feature extraction techniques and machine learning algorithms for the early detection of disorder. The SVM was found to be outperforming other machine learning techniques for the classification of EEG signals.
... In fact, as a novel approach for solving a variety of problems such as classification, machine learning has been increasingly applied to identify people with and without dyslexia across languages (for a review, see Kaisar, 2020; Usman et al., 2021). For example, after applying an SVM model to identify English-speaking participants with dyslexia by simultaneously considering a wide range of perceptual processes (e.g., auditory and visual discrimination) and cognitive skills (e.g., phonological and orthographic awareness, working memory, and executive function), Rello et al. (2018) achieved an accuracy of 84.62% using the most informative features, namely, phonological awareness and auditory discrimination. Furthermore, machine learning algorithms trained using anatomical and neurophysiological data can objectively identify individuals with dyslexia (e.g., Perera et al., 2018; Tamboer et al., 2016). ...
Article
Purpose Dyslexia is characterized by its diverse causes and heterogeneous manifestations. Chinese children with dyslexia exhibit orthographic, phonological, and semantic deficits across character and radical levels when writing. However, whether character dictation can be used to distinguish children with dyslexia from their typically developing peers remains unexplored. Method A dataset of written characters from 1,015 Chinese children with and without dyslexia from Grades 2–6 was used to train multiple machine models with different learning algorithms. Results The multi-level multidimensional model reached a predictive accuracy of 78.0%, with stroke, grade, lexicality, and character configuration manifesting as the most predictive features. The accuracy of the model improved to 80.0% when only these features were included. Conclusion These results not only provide evidence for the multidimensional causes of Chinese dyslexia, but also highlight the utility of machine learning in distinguishing children with dyslexia from their peers via Chinese dictation, which elucidates a promising area of future research.
... Rello, Luz, et al. [15] aimed to change this by detecting dyslexia early using machine learning models that predict dyslexia by analysing how people engage with an etymological computer game. The game's pieces were created with (i) an empirical etymological analysis of the errors that persons with dyslexia make, and (ii) specific dyslexia-related cognitive functions in mind: Language Skills, Working Memory, Executive Functions, and Perceptual Processes. ...
Article
Learning disorders such as dysgraphia, dyslexia, dyspraxia, and others obstruct academic progress while also having long-term implications that extend beyond academic time. It is well acknowledged that this type of disability affects between 5% and 10% of the overall population. Children must complete a battery of tests in order to be assessed for such disabilities in early life. These assessments are scored by human professionals, who determine if the youngsters require special education strategies depending on their results. The evaluation can be time-consuming, costly, and emotionally draining. Dyslexia is a learning disability marked by a lack of reading and/or writing skills, as well as difficulties with fast word identifying and spelling. Dyslexics have a hard time reading and understanding words and letters. Different methodologies are used in research to distinguish dyslexics from non-dyslexics, such as machine learning, image processing, studying cerebrum behaviour through brain science, and pondering the variations in life systems of mind. E-learning technologies have been increasingly important in higher education in recent years, particularly in improving learning experiences for those with learning disabilities. However, many professionals involved in the creation and deployment of e-learning tools fail to consider the needs of dyslexic pupils. In this research, a comprehensive literature review is conducted on machine learning algorithms for dyslexia prediction and e-learning for learning and cognitive disorders.
Conference Paper
Detecting dyslexia is important because early intervention is key to avoid the negative effects of dyslexia such as school failure. Most of the current approaches to detect dyslexia require expensive personnel (i.e. psychologists) or special hardware (i.e. eye trackers or MRI machines). Also, most of the methods can only be used when children are learning how to read but not before, necessarily delaying needed early intervention. In this work, we present a study with 178 participants speaking different languages (Spanish, German, English, and Catalan) with and without dyslexia using a web-based game built with musical and visual elements that are language independent. The study reveals eight game measures with significant differences for Spanish children with and without dyslexia, which could be used in future work as a basis for language independent detection. A web-based application like this could have a major impact on children all over the world by easily screening them and suggesting the help they need.
Conference Paper
Current tools for screening dyslexia use linguistic elements, since most dyslexia manifestations are related to difficulties in reading and writing. These tools can only be used with children who have already acquired some reading skills and, sometimes, this detection comes too late to apply proper remediation. In this paper, we propose a method and present DysMusic, a prototype which aims to predict risk of having dyslexia before acquiring reading skills. The prototype was designed with the help of five children and five parents who tested the game using the think aloud protocol while being observed as they played. The advantages of DysMusic are that the approach is language independent and could be used with younger children, i.e., pre-readers.
Article
Dyslexia is a neurodevelopmental reading disability estimated to affect 5–10% of the population. While there is yet no full understanding of the cause of dyslexia, or agreement on its precise definition, it is certain that many individuals suffer persistent problems in learning to read for no apparent reason. Although it is generally agreed that early intervention is the best form of support for children with dyslexia, there is still a lack of efficient and objective means to help identify those at risk during the early years of school. Here we show that it is possible to identify 9–10 year old individuals at risk of persistent reading difficulties by using eye tracking during reading to probe the processes that underlie reading ability. In contrast to current screening methods, which rely on oral or written tests, eye tracking does not depend on the subject to produce some overt verbal response and thus provides a natural means to objectively assess the reading process as it unfolds in real-time. Our study is based on a sample of 97 high-risk subjects with early identified word decoding difficulties and a control group of 88 low-risk subjects. These subjects were selected from a larger population of 2165 school children attending second grade. Using predictive modeling and statistical resampling techniques, we develop classification models from eye tracking records less than one minute in duration and show that the models are able to differentiate high-risk subjects from low-risk subjects with high accuracy. Although dyslexia is fundamentally a language-based learning disability, our results suggest that eye movements in reading can be highly predictive of individual reading ability and that eye tracking can be an efficient means to identify children at risk of long-term reading difficulties.
Conference Paper
Reading is a complex cognitive process, errors in which may assume diverse forms. In this study, introducing a novel approach, we use two families of probabilistic graphical models to analyze patterns of reading errors made by dyslexic people: an LDA-based model and two Naïve Bayes models which differ by their assumptions about the generation process of reading errors. The models are trained on a large corpus of reading errors. Results show that a Naïve Bayes model achieves highest accuracy compared to labels given by clinicians (AUC = 0.801 ± 0.05), thus providing the first automated and objective diagnosis tool for dyslexia which is solely based on reading errors data. Results also show that the LDA-based model best captures patterns of reading errors and could therefore contribute to the understanding of dyslexia and to future improvement of the diagnostic procedure. Finally, we draw on our results to shed light on a theoretical debate about the definition and heterogeneity of dyslexia. Our results support a model assuming multiple dyslexia subtypes, that of a heterogeneous view of dyslexia.
Conference Paper
At least 10% of the global population has dyslexia. In the United States and Spain, dyslexia is associated with a large percentage of school drop out. Current methods to detect risk of dyslexia are language specific, expensive, or do not scale well because they require a professional or extensive equipment. A central challenge to detecting dyslexia is handling its differing manifestations across languages. To address this, we designed a browser-based game, Dytective, to detect risk of dyslexia across the English and Spanish languages. Dytective consists of linguistic tasks informed by analysis of common errors made by persons with dyslexia. To evaluate Dytective, we conducted a user study with 60 English and Spanish speaking children between 7 and 12 years old. We found children with and without dyslexia differed significantly in their performance on the game. Our results suggest that Dytective is able to differentiate school age children with and without dyslexia in both English and Spanish speakers.
Conference Paper
Worldwide, around 10% of the population has dyslexia, a specific learning disorder. Most of previous eye tracking experiments with people with and without dyslexia have found differences between populations suggesting that eye movements reflect the difficulties of individuals with dyslexia. In this paper, we present the first statistical model to predict readers with and without dyslexia using eye tracking measures. The model is trained and evaluated in a 10-fold cross experiment with a dataset composed of 1,135 readings of people with and without dyslexia that were recorded with an eye tracker. Our model, based on a Support Vector Machine binary classifier, reaches 80.18% accuracy using the most informative features. To the best of our knowledge, this is the first time that eye tracking measures are used to predict automatically readers with dyslexia using machine learning.
Article
LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.