DysLexML: Screening Tool for Dyslexia Using
Machine Learning
Thomais Asvestopoulou∗†, Victoria Manousaki∗†, Antonis Psistakis,
Ioannis Smyrnakis†‡, Vassilios Andreadakis, Ioannis M. Aslanides§ and Maria Papadopouli∗†
Department of Computer Science, University of Crete, Heraklion, Greece
Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece
Optotech Ltd., Heraklion, Crete, Greece
§Emmetropia Eye Institute, Heraklion, Greece
Abstract—Eye movements during text reading can provide
insights about reading disorders. Via eye-trackers, we can
measure when, where, and how eyes move in relation to the
words they read. Machine Learning (ML) algorithms can decode
this information and provide differential analysis. This work
developed DysLexML, a screening tool for developmental dyslexia
that applies various ML algorithms to analyze fixation points
recorded via eye-tracking during silent reading by children. We
comparatively evaluated its performance using measurements
collected in a systematic field study with 69 children, all native
Greek speakers, 32 of whom were diagnosed as dyslexic by
the official governmental agency for diagnosing learning and
reading difficulties in Greece. We examined a large set of features
based on statistical properties of fixations and saccadic
movements and identified the ones with prominent predictive power,
performing dimensionality reduction. Specifically, DysLexML achieves
its best performance using a linear SVM model, with an accuracy of
97%, over a small feature set, namely saccade length, number
of short forward movements, and number of multiply fixated
words. Furthermore, we analyzed the impact of noise on the
fixation positions and showed that DysLexML is accurate and
robust in the presence of noise. These encouraging results set the
basis for developing screening tools in less controlled, larger-scale
environments, with inexpensive eye-trackers, potentially reaching
a larger population for early intervention.
Index Terms—dyslexia, reading difficulty, children, eye-
tracking, machine learning, screening
I. INTRODUCTION
Dyslexics manifest significant and persistent reading
difficulties [6], which often stem from problems with
word decoding (relating sounds with written phrases, i.e.,
graphemes to phonemes) [11]. Early intervention can be
effective in alleviating the symptoms of the disability.
However, screening large populations of children is rather
time-consuming and expensive [17]. For example, a differential
diagnosis of dyslexia can take up to 14 months [1]. It is
known that eye movements during text reading can
be particularly revealing [12]–[14], [16], [18]. For example,
dyslexics exhibit more aberrant eye movements than normal
readers at the same age level [6], although it is unlikely
that erratic eye movements are the primary cause of dyslexia.
Fixations, i.e., maintaining the visual gaze on a single location,
and saccadic movements, i.e., quick simultaneous movements
of the eyes between fixations, are important characteristics
for screening dyslexia. Readers with developmental dyslexia
generate different eye movements than typical readers during
text reading: longer and more frequent fixations, shorter
saccade lengths, and more backward refixations
[7], [13], [14], [16]. Furthermore, readers with dyslexia have
difficulty in reading long words, a lower skipping rate of short
words, and high gaze duration (total fixation duration at first
visit) on many words. Nonetheless, it is still an open question
whether it is possible to build a screening tool that, by
analyzing these distinctive oculomotor patterns collected
during reading, can reliably identify readers who may be at
high risk for dyslexia and can remain robust under noise.
Contact author: Maria Papadopouli (mgp@ics.forth.gr)
This work develops DysLexML, a screening tool for
dyslexia that employs various ML classifiers, such as SVM and
Naïve Bayes, and comparatively evaluates their performance
using data collected in the field study of RADAR [1]. To
examine its robustness, we assessed its accuracy under various
fixation position noise levels, introduced by the eye-tracking
technology or the small screen size (e.g., the small size of the
text when a mobile device is used). For that, Gaussian noise of
increasing standard deviation is added to the fixation positions
of the dataset. This work demonstrates that DysLexML can
achieve high accuracy (of 97%) and is robust in the presence
of noise. It performs dimensionality reduction, achieving the
aforementioned performance using only a small number of
features, namely the mean and median saccade length, number
of short forward movements, and number of multiply fixated
words.
The innovative contributions of this work are the analysis of
the robustness in the presence of noise, the high accuracy using
only a small set of features, and the comparative evaluation
with other screening tools/algorithms. The paper briefly
overviews the field study in Section II. Section III presents
DysLexML and Section IV evaluates its performance. Section V
discusses related work, and Section VI summarizes our key
findings and future work plans.
II. BACKGROUND AND FIELD STUDY
The field study of RADAR was performed in Greece and
included 69 children, 32 of whom were diagnosed as dyslexic
by the official governmental agency for diagnosing learning
and reading difficulties in Greece. The participants' ages ranged
between 8.5 and 12.5 years. The children were instructed
to read two passages at their own pace. It was also emphasized
that the purpose was to understand the texts in order to
answer five comprehension questions at the end. Both texts
were written in Greek by a special education teacher. The
first passage (baseline text) consists of 181 words, many
of which are multi-syllable. A second passage, simpler than the
first one and targeted at younger participants, was also given to
the subjects. It included 143 words, mostly of one or two
syllables. The experimental procedure consisted of recording
the eye movements of the participants while they were silently
reading the texts in front of a computer monitor.
arXiv:1903.06274v1 [cs.CY] 14 Mar 2019
A custom-made eye-tracker, developed by Medotics AG, was
employed. It consists of two fixed cameras that can record
images at up to 60 Hz with a resolution of 1600×1200 pixels.
The cameras are positioned between the screen and the participant,
with their field of view directed upward toward the participant's face.
While the participant performs a reading task, the cameras
record the participant's face. The extracted images are then
used to detect pupil and corneal reflection coordinates. Based
on the collected raw gaze measurements, the fixations were
identified according to a dispersion algorithm [20]. More
information about the field study, e.g., the inclusion and
exclusion criteria, texts, and data collection, can be found in
[1]. A dataset that includes, for each fixation, its x- and y-axis
coordinates, its starting and termination time, as well as the
Region of Interest (ROI) (i.e., the word) the subject is looking at,
is provided as input to DysLexML for analysis.
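To make the fixation-identification step concrete, the following is a minimal sketch of a dispersion-threshold procedure in the spirit of [20]. It is an illustration only, not the implementation used in the field study: the `Fixation` record, the function names, and the default thresholds (50 pixels of dispersion, 100 ms minimum duration) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, x in pixels, y in pixels)


@dataclass
class Fixation:
    x: float      # mean horizontal gaze position (pixels)
    y: float      # mean vertical gaze position (pixels)
    start: float  # time of the first sample in the fixation
    end: float    # time of the last sample in the fixation


def _dispersion(window: List[Sample]) -> float:
    """Dispersion of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample],
                     max_dispersion: float = 50.0,  # assumed threshold
                     min_duration: float = 0.1      # assumed threshold
                     ) -> List[Fixation]:
    """Group time-ordered gaze samples into fixations."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Find the shortest window starting at i that spans min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough samples left to cover min_duration
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the dispersion stays under the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            fixations.append(Fixation(
                x=sum(s[1] for s in window) / len(window),
                y=sum(s[2] for s in window) / len(window),
                start=window[0][0],
                end=window[-1][0],
            ))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations
```

Each returned record carries the position and the start and end times of a fixation, matching the per-fixation fields described above; mapping a fixation to its ROI (word) would additionally require the word bounding boxes on the screen.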
III. DYSLEXML SYSTEM
The main modules of DysLexML are the feature extraction,
the feature selection that identifies the dominant features,
and the classifiers that employ these
dominant features. DysLexML extracts general
(non-word-specific) features and word-specific ones that take into account
the word the subject is looking at. Examples of
non-word-specific features are the number of fixations on the screen,
the mean and median duration of fixations, and, related to saccades,
the mean and median length of saccades (i.e., the Euclidean
distance between consecutive fixations) and a characterization
of the types of eye movements. DysLexML creates a feature
vector of 35 features in total.
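As a rough illustration of the non-word-specific features, the sketch below computes fixation and saccade statistics from a list of fixations and counts movements by direction and size, using the 100-pixel and 400-pixel thresholds given later in this section. Several details are assumptions made for the example and are not stated in the paper: the tuple layout, the use of increasing x as the "forward" direction, the `LINE_JUMP` value used to detect change-of-line movements, and the choice to keep change-of-line movements in the saccade-length statistics.

```python
import math
from statistics import mean, median

SHORT_MAX = 100.0  # pixels; movements shorter than this are "short"
LONG_MIN = 400.0   # pixels; movements longer than this are "long"
LINE_JUMP = 40.0   # pixels; hypothetical vertical jump marking a change of line


def saccade_features(fixations):
    """fixations: list of (x, y, start, end) tuples in reading order.

    Needs at least two fixations so that one saccade exists.
    """
    lengths = []
    counts = {(d, s): 0
              for d in ("forward", "backward")
              for s in ("short", "medium", "long")}
    for (x0, y0, _, _), (x1, y1, _, _) in zip(fixations, fixations[1:]):
        # Saccade length: Euclidean distance between consecutive fixations.
        length = math.hypot(x1 - x0, y1 - y0)
        lengths.append(length)
        if abs(y1 - y0) > LINE_JUMP:
            continue  # change of line: excluded from forward/backward sets
        direction = "forward" if x1 >= x0 else "backward"
        if length < SHORT_MAX:
            size = "short"
        elif length > LONG_MIN:
            size = "long"
        else:
            size = "medium"
        counts[(direction, size)] += 1
    durations = [end - start for _, _, start, end in fixations]
    return {
        "n_fixations": len(fixations),
        "mean_fix_duration": mean(durations),
        "median_fix_duration": median(durations),
        "mean_saccade_length": mean(lengths),
        "median_saccade_length": median(lengths),
        "n_short_forward": counts[("forward", "short")],
        "n_short_backward": counts[("backward", "short")],
        "n_medium_backward": counts[("backward", "medium")],
    }
```

Only a handful of the 35 features are shown; the remaining ones would be added to the returned dictionary in the same way.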
People with reading difficulties tend to perform back-and-forth
movements (saccades) on the text line as they proceed, as
a result of difficulty in focusing or understanding [13]. Thus, the
identification of such movements and the definition of features based
on them can provide valuable information about the dyslexic
population. Typical readers tend to perform medium to large
movements (saccades) in terms of length, while readers with
reading difficulties "generate" many choppy movements [7]. A
movement is labeled as short if the Euclidean distance between
its consecutive fixations is less than 100 pixels (about 5 letters
in the text). Most of the movements occur within words.
Given that the line of text was about 900 pixels, the threshold
for medium to long movements was set to 400 pixels
(about half a line). With this threshold, a medium backward
movement includes re-reads of small groups of words but
not of entire phrases. That is, short movements are of less
than 100 pixels, long ones are of more than 400 pixels,
and medium movements lie in the range between 100 and 400
pixels. Change-of-line movements have been excluded from
both forward and backward movement sets. We also derive
information about the number of visits of each word, namely
the number of words that were not visited at all (skipped) and the
number of words that were visited more than once during the
text reading.
Fig. 1. Reading "path" from a typical reader (top) and from a reader with
dyslexia (bottom). The blue circles are the fixations and the orange lines the
saccadic movements. The larger the circle, the longer the fixation (Figure
appeared in [1]).
To identify the features with the most predictive
power, we employed the least absolute shrinkage and selection
operator (LASSO) [8], a particular case of penalized least
squares regression with L1-penalty function. LASSO finds the
minimum of the residual sum of squares, subject to the sum of
the absolute value of the coefficients being less than a constant.
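The LASSO estimate can be defined, in its equivalent penalized form
and up to a constant scaling of the first term, by:
$$\hat{\beta} = \arg\min_{\beta_0, \beta} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \quad (1)$$
where $y_i$ is the response of subject $i$, $x_{ij}$ is the value of feature $j$
for subject $i$, and $\lambda \ge 0$ is the regularization parameter.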
In practice, as λ increases, fewer features are taken into
account. The parameter λ in LASSO regression
is estimated using 5-fold cross-validation. Two values of λ
were examined, namely λminMSE, which corresponds to the
minimum mean cross-validation error (MSE; vertical dotted
line in Fig. 2), and λ1SE, the larger value of λ at which the
cross-validation error is within one standard error of the
mean of its minimum (vertical solid line). The purpose
of using λ1SE is to reduce the number of non-zero regression
coefficients, while the mean square error remains close
(within 1SE) to its minimum. Both variations were considered in our
analysis.
Fig. 2. Cross-Validated MSE of Lasso Fit for the baseline text.
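The choice between the two λ values can be sketched as follows, assuming the per-fold cross-validation errors have already been computed for a grid of candidate λ values. The function name and the input layout are assumptions for this example, and the one-standard-error rule is implemented in its common form (the largest λ whose mean cross-validation error lies within one standard error of the minimum); the paper does not list its exact procedure.

```python
import math
from statistics import mean, stdev


def select_lambdas(lambdas, fold_mse):
    """Pick (lambda_minMSE, lambda_1SE) from cross-validation results.

    lambdas:  candidate regularization values.
    fold_mse: fold_mse[i] holds the per-fold MSEs obtained with lambdas[i].
    """
    cv_mean = [mean(m) for m in fold_mse]
    # Standard error of the mean CV error for each candidate.
    cv_se = [stdev(m) / math.sqrt(len(m)) for m in fold_mse]
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[i_min] + cv_se[i_min]
    # Largest lambda whose mean CV error stays within one SE of the minimum.
    lam_1se = max(l for l, m in zip(lambdas, cv_mean) if m <= limit)
    return lambdas[i_min], lam_1se
```

With 5-fold cross-validation each inner list has five entries. The features whose LASSO coefficients are non-zero at the chosen λ form the dominant feature set.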
DysLexML builds classifiers based on SVM, Naïve Bayes,
and K-means. The SVM with a linear kernel performs better
than the ones with Gaussian or polynomial kernels, so only the
performance of the linear kernel is reported here. The
K-means-based classifier was built as follows: the subjects of the
training set are clustered using k-means, and a label is assigned
to each cluster based on the most frequent label within that
cluster. The distance of the test subject from the centroid of
each cluster is then computed. The classifier reports the label of
the cluster whose centroid has the shortest distance from the
test subject.
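The K-means-based classifier described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the authors' code: it uses plain Lloyd iterations seeded with the first k training points (the paper does not specify the initialization), Euclidean distance, and hypothetical function names.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def fit_kmeans_classifier(X_train, y_train, k):
    """Cluster the training subjects and label each cluster by majority vote."""
    centroids, assign = kmeans(X_train, k)
    labels = []
    for c in range(k):
        votes = Counter(y for y, a in zip(y_train, assign) if a == c)
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return centroids, labels


def predict(model, x):
    """Return the label of the nearest labeled centroid."""
    centroids, labels = model
    candidates = [c for c in range(len(centroids)) if labels[c] is not None]
    best = min(candidates, key=lambda c: math.dist(x, centroids[c]))
    return labels[best]
```

With k larger than 2, several clusters may share the same label, which lets the classifier represent subgroups within the dyslexic or the control population.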
IV. PERFORMANCE ANALYSIS
DysLexML consists of two phases: it first employs
LASSO regression with five-fold cross-validation to identify the
dominant features. Based on the dominant features, it then applies
various classification algorithms. For evaluation, RADAR uses
leave-one-out cross-validation (LOOCV), an appropriate
choice given the relatively small size of the dataset. To
comparatively evaluate DysLexML with RADAR, we used
LOOCV and the same subject populations. Given that there
were subjects with missing values in the word-specific features,
we filled in the missing values with the median of the
corresponding feature values of the training set.
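The evaluation protocol above can be sketched as a leave-one-out loop in which the imputation medians are computed from the training subjects of each fold only, so that the held-out subject never influences them. The function names, the use of `None` for a missing value, and the `fit`/`predict` callables are assumptions of this example; any classifier with that interface could be plugged in.

```python
from statistics import median


def impute_with_train_median(train, test_row):
    """Fill None entries using per-feature medians of the training rows."""
    medians = []
    for j in range(len(test_row)):
        observed = [row[j] for row in train if row[j] is not None]
        medians.append(median(observed))

    def fill(row):
        return [medians[j] if v is None else v for j, v in enumerate(row)]

    return [fill(r) for r in train], fill(test_row)


def loocv_accuracy(X, y, fit, predict):
    """Leave-one-out accuracy of a classifier given by fit/predict callables."""
    correct = 0
    for i in range(len(X)):
        # Hold out subject i; train on everyone else.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        train_X, test_x = impute_with_train_median(train_X, X[i])
        model = fit(train_X, train_y)
        correct += predict(model, test_x) == y[i]
    return correct / len(X)
```

In a full pipeline the feature selection step would also be repeated inside each fold if it is to be kept independent of the held-out subject; the sketch leaves that choice to the `fit` callable.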
The exclusion of the word-specific features from the feature
vector results in a lower average accuracy for the baseline
(difficult) text. The performance remains the same in the case
of the easier text, indicating that the word-specific features are
not useful when the text is not challenging for the reader.
DysLexML, with SVM and LASSO (λ1SE), outperforms
RADAR: 97.10% vs. 94.2% for the baseline text (Table I).
TABLE I
CLASSIFICATION PERFORMANCE, INCLUDING ALL SUBJECTS, TREATING
THE MISSING VALUES. THE FIRST COLUMN CORRESPONDS TO THE
BASELINE TEXT, WHILE THE SECOND COLUMN TO THE EASIER TEXT.
Classifier                               LOOCV accuracy (%)
                                         Baseline text   Easier text
K-means (k=2), LASSO (λminMSE)           86.95           89.39
K-means (k=3), LASSO (λminMSE)           91.30           84.84
K-means (k=4), LASSO (λminMSE)           81.15           84.84
K-means (k=2), LASSO (λ1SE)              89.85           78.78
K-means (k=3), LASSO (λ1SE)              86.95           84.84
K-means (k=4), LASSO (λ1SE)              89.85           83.33
Linear SVM, LASSO (λminMSE)              94.20           80.30
Linear SVM, LASSO (λ1SE)                 97.10           87.87
Linear SVM, without feature selection    85.50           81.81
Naïve Bayes, LASSO (λminMSE)             91.30           86.36
Naïve Bayes, LASSO (λ1SE)                92.75           84.84
Trivial Accuracy                         53.62           53.03
For the easy text, RADAR reports 87.9 % correct classifi-
cation, while DysLexML, with K-means with k equal to 2,
exhibits an accuracy of 89.39 %.
The dominant features (as selected by the LASSO) for both
texts are the mean saccade length and number of short forward
movements. In the case of the baseline (difficult) text, the
additional dominant features are the median saccade length,
and the number of multiply fixated words. Prior research has
also reported the important role of these dominant features
identified by LASSO (as discussed in Section I). The distri-
butions of the mean and the median saccade length for both
populations are significantly different (as shown in Fig. 3),
which explains their presence as separate dominant features.
Fig. 3. ECDF of saccade length features.
Note that there is diversity in the dyslexic population: not
all cases are equally severe. Moreover, recall that the subjects
were instructed not to rush their reading and to understand the
text in order to answer some comprehension questions at
the end. This may have prolonged the reading sessions even
for typical readers. The number of short forward movements
was expected to play a prominent role. Dyslexics have been
reported to perform more and shorter saccades during reading,
in their attempt to decode the text [14]. The number of
short forward movements of the dyslexic population has large
variance, as shown in the upper part of Fig. 4. 50% of the
dyslexic subjects have more than twice as many short progressive
(forward) movements in total as the control population.
Fig. 4. ECDF of number of short forward movements (top) and number of
multiply fixated words, i.e. the words that have been fixated more than once
during the reading session (bottom).
Dyslexics tend to revisit words more often, especially those that
are long or difficult to read [5]. 90% of the typical readers
have fewer than 100 words fixated more than once, while this is
the lowest value observed among the dyslexic subjects (Fig. 4 (bottom)).
This illustrates the value of the word-specific analysis of the
eye-tracking study.
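As a small illustration of the word-specific counts discussed here, the sketch below derives the number of skipped words and the number of multiply fixated words from the sequence of ROIs attached to the fixations. The function name and input layout are hypothetical, and the sketch counts fixations per word; whether consecutive fixations on the same word should be merged into a single visit is not specified in the text and is left as an assumption.

```python
from collections import Counter


def word_visit_features(fixated_words, n_words):
    """fixated_words: ROI (word index) of each fixation, None if off-text.

    n_words: total number of words in the passage.
    """
    counts = Counter(w for w in fixated_words if w is not None)
    return {
        # Words that never received a fixation.
        "n_skipped": sum(1 for w in range(n_words) if counts[w] == 0),
        # Words that received more than one fixation.
        "n_multiply_fixated": sum(1 for c in counts.values() if c > 1),
    }
```

For the baseline text, `n_words` would be 181.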
To examine the robustness of DysLexML, we evaluated its
performance in the presence of noise in the form of small
displacements of the fixation points. For this analysis, we
considered only the children that had reliable data (according
to [1]) and no missing values. The noise follows a Gaussian
distribution with mean value equal to zero and standard
deviation σ varying from 10 to 100 pixels (with a step size
of 10). In the case of small displacements, only the saccadic
movement features changed. However, large displacements
result in significant changes in the word-specific features. For
each subject, the noise was added to the fixation positions and
the new feature vectors were generated from the shifted eye
movements. DysLexML was then evaluated
on this new dataset. Specifically, for each σ, we generated
10 synthetic datasets and ran the linear SVM LOOCV with
LASSO (λ1SE) feature selection on the resulting 100 datasets. DysLexML
is robust under noise (Fig. 5 (top)). We then trained a linear
SVM model using the dominant features that were reported by
LASSO on the original dataset. For testing, we employed
the 10 synthetic datasets with noise for each given σ. Fig. 5
(bottom) presents the results. The model exhibits a
robust performance for relatively small noise levels (up to σ
of 30 pixels), which corresponds to about 1 character on the
x-axis and 1/3 of the line on the y-axis. However, as the noise
level increases, the accuracy drops significantly. Overall, DysLexML,
through the SVM, which generalizes well,
handles the noise in the fixation coordinates in a robust
manner.
Fig. 5. Performance of the SVM model under noise: model trained with noisy
data (top) and with original data (bottom). The testing was performed on noisy
data (10 datasets for each σ value). The solid red line indicates the mean accuracy
over the 10 noisy synthetic datasets, while the gray area represents the range
between the minimum and the maximum accuracy achieved at each noise level.
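The construction of the noisy datasets can be sketched as follows. The sketch assumes independent zero-mean Gaussian noise with the same σ on the x and y coordinates of every fixation, which is consistent with the description above but not spelled out in it; the function names, the tuple layout, and the seeding are assumptions of this example.

```python
import random


def add_position_noise(fixations, sigma, rng):
    """Return a copy of the fixations with N(0, sigma^2) noise on x and y.

    fixations: list of (x, y, start, end) tuples; timings are left unchanged.
    """
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), start, end)
            for x, y, start, end in fixations]


def noisy_datasets(subjects, sigmas=range(10, 101, 10), repeats=10, seed=0):
    """Build `repeats` noisy copies of the whole dataset for each sigma.

    subjects: dict mapping a subject id to that subject's fixation list.
    Returns a dict mapping sigma to a list of noisy datasets.
    """
    rng = random.Random(seed)  # seeded so the datasets are reproducible
    out = {}
    for sigma in sigmas:
        out[sigma] = [
            {sid: add_position_noise(fix, sigma, rng)
             for sid, fix in subjects.items()}
            for _ in range(repeats)
        ]
    return out
```

With the default arguments this yields 10 noise levels times 10 repetitions, i.e., 100 datasets. After the positions are shifted, each fixation has to be mapped to its ROI again before the features are recomputed, which is why large displacements affect the word-specific features.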
V. RELATED WORK
Although dyslexia has been extensively studied over the last three
decades with specialized eye-trackers, there is only a limited
number of eye-tracking-based screening systems, partially due
to the high cost of eye-trackers until recently and the debate
about the primary cause of dyslexia [16]. The lack of extensive
datasets significantly limits the performance of deep-learning
architectures. On the other hand, SVM is powerful in the case
of relatively small datasets. Recent studies have applied ML,
and more specifically SVM for classification of dyslexia on
data collected from eye-trackers [2], [3]. For example, Rello
and Ballesteros [2] performed a field study that included
97 Spanish language native speakers, aged 11-54 reading 12
different texts. They used a binary polynomial SVM classifier
and achieved classification accuracy of 80.18%. Their feature
vector included the age of the participant, the text number,
details about text stylistics, number of visits of a ROI, mean
time spent on a ROI, total reading time, mean of fixation
duration, number of fixations and sum of all fixation durations.
They reported that the reading time, the mean of fixation
duration, and the age of the participant have predictive power.
Benfatto et al. [3] also employed linear SVM with sequential
minimal optimization for screening dyslexia. Their field study
in Sweden included 185 children, 97 of them with high risk of
dyslexia, speaking Swedish as a first language. All the subjects
were reading from paper a short text adapted to their age
while their eye movements were recorded. The subjects were
equipped with head-mounted goggles with arrays of infrared
transmitters and detectors, arranged around each eye. A chin
and forehead rest were deployed to minimize head movements
and stabilize the viewing distance. Their feature set, produced
using a dynamic dispersion threshold algorithm, consisted of
168 features. They also distinguished saccades into progressive
and regressive ones. A recursive feature elimination algorithm
identified the dominant features. They achieved accuracy of
95.6%±4.5% using 48 features of the original feature space.
Al-Edaily et al. [4] developed Dyslexia Explorer in the Arabic
language and performed a study with 14 subjects, 7 of whom
had been diagnosed with dyslexia. Their system is designed to help
specialists analyze visual patterns of reading and provide
insights into understanding differences between readers with
and without dyslexia. Their measurements included fixation
duration in each/all ROI, mean fixation duration in each/all
ROI, total fixation count for each/all ROI, and backward
saccades. Unlike the above ML-based approaches, Smyrnakis
et al. [1] developed statistical Bayesian classifiers, using
various thresholds and taking into consideration binary correlations.
They focused on a small age span, critical for dyslexia diagnosis.
The size and font of the two texts used was standardized
so as to achieve maximal classification accuracy, unlike in
[2]. The parameters used for classification involved not only
direct eye-tracking parameters, but also relations between eye-
tracking parameters and word properties in the texts read.
These parameters extend the parameter set used in [3]. In-
cluding this set of parameters, it is possible to evaluate word
anticipation, which is often problematic in dyslexics [19].
Our work employs the same dataset as in [1]. However,
DysLexML applies and evaluates various ML classifiers. As
mentioned, the classifier with the best accuracy on noise-free
data is the linear SVM classifier on features selected by the
LASSO regression at λ1SE . Furthermore, it exhibits a robust
performance under fixation position noise (added artificially).
Its robustness and ability to perform dimensionality reduction
are the two innovative aspects of this work.
VI. CONCLUSION
Feature selection, here via LASSO with λ1SE (one standard
error), enabled dimensionality reduction without compromising
the accuracy.
The mean and median saccade length, the number of short
forward movements, and the number of multiply fixated words
are the four features with the most prominent predictive power
for the baseline text, while for the easier text only the mean
saccade length and the number of short forward movements
were selected. The text difficulty does play an important role in
the diagnosis: an easy, less challenging text reduces the power
of the word-specific features, as they do not appear in the
dominant feature set. The text choice has to be relevant to the
subjects' age and the reading skills they have acquired so far. The selected
features are easily interpreted and capture the prior knowledge
about eye movements of dyslexic children. To the best of our
knowledge, DysLexML uses the smallest feature set, compared
to the other related studies. We envision the development of
a system that can operate in a less controlled, larger-scale
environment (e.g., potentially in kindergartens or homes) with
commercial eye-trackers, reaching a larger population. As a
first step towards this objective, here we added synthetic
noise at the fixation positions and assessed its impact on the
accuracy. For noise levels with σ smaller than 40 pixels,
the performance of the system remains robust. Encouraged by
the robustness under noise, the team has performed a follow-
up larger-scale field study using inexpensive non-specialized
eye-trackers in a more diverse setting (in different countries
and under silent and out-loud reading). We aim to identify
the different classes of reading difficulties. This work sets the
basis for developing a screening tool that can reach a larger
more diverse population, in less controlled environments, for
early intervention and potentially larger social impact.
REFERENCES
[1] I. Smyrnakis, V. Andreadakis, V. Selimis, M. Kalaitzakis, T. Bachourou, G. Kaloutsakis, G. D. Kymionis, S. Smirnakis, I. M. Aslanides, "RADAR: A novel fast-screening method for reading difficulties with special focus on dyslexia", PLoS ONE, 2017.
[2] L. Rello and M. Ballesteros, "Detecting readers with dyslexia using machine learning with eye tracking measures", ACM PW4A, 2015.
[3] M. Nilsson Benfatto, G. Öqvist Seimyr, J. Ygge, T. Pansell, A. Rydberg, et al., "Screening for dyslexia using eye tracking during reading", PLoS ONE, 2016.
[4] A. Al-Edaily, A. Al-Wabil, Y. Al-Ohali, "Dyslexia Explorer: A Screening System for Learning Difficulties in the Arabic Language Using Eye Tracking", Human Factors in Computing & Informatics, Springer, 2013.
[5] J. Hyönä and R. K. Olson, "Eye fixation patterns among dyslexic and normal readers: effects of word length and word frequency", Journal of Experimental Psychology: Learning, Memory, & Cognition, Vol. 21(6), 1995.
[6] T. Høien, I. Lundberg, "Dyslexia: From Theory to Intervention", Part of the Neuropsychology & Cognition book series, Vol. 18, Springer, 2000.
[7] M. De Luca, E. Di Pace, A. Judica, D. Spinelli, P. Zoccolotti, "Eye movement patterns in linguistic & non-linguistic tasks in developmental surface dyslexia", Neuropsychologia, Vol. 37, 1999.
[8] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso", Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996.
[9] V. Fonti and E. N. Belitser, "Feature Selection using LASSO", Research Paper in Business Analytics, VU Amsterdam, 2017.
[10] R. Muthukrishnan and R. Rohini, "LASSO: A Feature Selection Technique In Predictive Modeling For Machine Learning", IEEE Int'l Conf. on Advances in Computer Applications, 2016.
[11] C. Hulme and M. J. Snowling, "Reading disorders and dyslexia", Current Opinion in Pediatrics, 28(6), pp. 731–735, 2016.
[12] S. Bellocchi, M. Muneaux, M. Bastien-Toniazzo, S. Ducrot, "I can read it in your eyes: What eye movements tell us about visuo-attentional processes in developmental dyslexia", Research in Developmental Disabilities, Vol. 34, Iss. 1, 2013.
[13] G. F. Eden, H. M. Wood, F. B. Wood, "Differences in eye movements & reading problems in dyslexic & normal children", Vision Research, Vol. 34, Iss. 10, 1994.
[14] F. J. Martos and J. Villa, "Differences in eye movements control among dyslexic, retarded & normal readers in the Spanish population", Reading & Writing, 2(2), 1990.
[15] K. Rayner, "Eye movements in reading & information processing", Psychological Bulletin 85, 1978.
[16] K. Rayner, "Eye movements in reading & information processing: 20 years of research", Psychological Bulletin 1, 1998.
[17] A. Casale, "Identifying Dyslexic Students: The need for computer-based dyslexia screening in higher education", Estro: Essex Student Research Online, Vol. 1(1), 2010.
[18] R. D. Elterman, L. A. Abel, R. B. Daroff, L. F. Dell'Osso, J. L. Bornstein, "Eye movement patterns in dyslexic children", Journal of Learning Disabilities, Vol. 13, 1980.
[19] F. Huettig and S. Brouwer, "Delayed anticipatory spoken language processing in adults with dyslexia–evidence from eyetracking", Dyslexia, Vol. 21, 2015.
[20] D. Salvucci, J. Goldberg, "Identifying fixations and saccades in eye-tracking protocols", Symposium on Eye Tracking Research & Applications, 2000.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Dyslexia is a developmental learning disorder of single word reading accuracy and/or fluency, with compelling research directed towards understanding the contributions of the visual system. While dyslexia is not an oculomotor disease, readers with dyslexia have shown different eye movements than typically developing students during text reading. Readers with dyslexia exhibit longer and more frequent fixations, shorter saccade lengths, more backward refixations than typical readers. Furthermore, readers with dyslexia are known to have difficulty in reading long words, lower skipping rate of short words, and high gaze duration on many words. It is an open question whether it is possible to harness these distinctive oculomotor scanning patterns observed during reading in order to develop a screening tool that can reliably identify struggling readers, who may be candidates for dyslexia. Here, we introduce a novel, fast, objective, non-invasive method, named Rapid Assessment of Difficulties and Abnormalities in Reading (RADAR) that screens for features associated with the aberrant visual scanning of reading text seen in dyslexia. Eye tracking parameter measurements that are stable under retest and have high discriminative power, as indicated by their ROC curves, were obtained during silent text reading. These parameters were combined to derive a total reading score (TRS) that can reliably separate readers with dyslexia from typical readers. We tested TRS in a group of school-age children ranging from 8.5 to 12.5 years of age. TRS achieved 94.2% correct classification of children tested. Specifically, 35 out of 37 control (specificity 94.6%) and 30 out of 32 readers with dyslexia (sensitivity 93.8%) were classified correctly using RADAR, under a circular validation condition where the individual evaluated was not included in the test construction group.
Article
Full-text available
Dyslexia is a neurodevelopmental reading disability estimated to affect 5–10% of the population. While there is yet no full understanding of the cause of dyslexia, or agreement on its precise definition, it is certain that many individuals suffer persistent problems in learning to read for no apparent reason. Although it is generally agreed that early intervention is the best form of support for children with dyslexia, there is still a lack of efficient and objective means to help identify those at risk during the early years of school. Here we show that it is possible to identify 9–10 year old individuals at risk of persistent reading difficulties by using eye tracking during reading to probe the processes that underlie reading ability. In contrast to current screening methods, which rely on oral or written tests, eye tracking does not depend on the subject to produce some overt verbal response and thus provides a natural means to objectively assess the reading process as it unfolds in real-time. Our study is based on a sample of 97 high-risk subjects with early identified word decoding difficulties and a control group of 88 low-risk subjects. These subjects were selected from a larger population of 2165 school children attending second grade. Using predictive modeling and statistical resampling techniques, we develop classification models from eye tracking records less than one minute in duration and show that the models are able to differentiate high-risk subjects from low-risk subjects with high accuracy. Although dyslexia is fundamentally a language-based learning disability, our results suggest that eye movements in reading can be highly predictive of individual reading ability and that eye tracking can be an efficient means to identify children at risk of long-term reading difficulties.
Article
Full-text available
Purpose of review: We review current knowledge about the nature of reading development and disorders, distinguishing between the processes involved in learning to decode print, and the processes involved in reading comprehension. Recent findings: Children with decoding difficulties/dyslexia experience deficits in phoneme awareness, letter-sound knowledge and rapid automatized naming in the preschool years and beyond. These phonological/language difficulties appear to be proximal causes of the problems in learning to decode print in dyslexia. We review data from a prospective study of children at high risk of dyslexia to show that being at family risk of dyslexia is a primary risk factor for poor reading and children with persistent language difficulties at school entry are more likely to develop reading problems. Early oral language difficulties are strong predictors of later difficulties in reading comprehension. Summary: There are two distinct forms of reading disorder in children: dyslexia (a difficulty in learning to translate print into speech) and reading comprehension impairment. Both forms of reading problem appear to be predominantly caused by deficits in underlying oral language skills. Implications for screening and for the delivery of robust interventions for language and reading are discussed.
Article
Most studies today agree about the link between visual-attention and oculomotor control during reading: attention seems to affect saccadic programming, that is, the position where the eyes land in a word. Moreover, recent studies show that visuo-attentional processes are strictly linked to normal and impaired reading. In particular, a large body of research has found evidence of defective visuo-attentional processes in dyslexics. What do eye movements tell us about visuo-attentional deficits in developmental dyslexia? The purpose of this paper is to explore the link between oculomotor control and dyslexia, taking into account its heterogeneous manifestation and comorbidity. Clinical perspectives in the use of the eye-movements approach to better explore and understand reading impairments are discussed.
Conference Paper
Feature selection is a machine learning technique for selecting a subset of relevant features (variables) for model construction. It aims to remove redundant, irrelevant, or strongly correlated features without much loss of information. It is widely used to make models easier to interpret and to improve generalization by reducing variance. Regression analysis plays a vital role in statistical modeling and, in turn, in machine learning tasks. Traditional procedures such as Ordinary Least Squares (OLS) regression, stepwise regression, and partial least squares regression are very sensitive to random errors. Many alternatives have been proposed in the literature over the past few decades, such as ridge regression and the LASSO and its variants. This paper explores the features of the popular regression methods: OLS regression, ridge regression, and LASSO regression. The performance of these procedures is studied in terms of model fitting and prediction accuracy, using real data and a simulated environment with the help of an R package.
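The difference between these estimators is easiest to see in the special case of an orthonormal design, where each has a closed form in terms of the OLS coefficient. The Python sketch below is illustrative only and is not taken from the paper (which works in R). The function names and the penalty scaling are assumptions: the objective is taken to be ½‖y − Xβ‖² plus (λ/2)‖β‖² for ridge, or plus λ‖β‖₁ for the LASSO.

```python
def ridge_shrink(b_ols, lam):
    """Ridge solution for one coefficient under an orthonormal design.

    Every coefficient is scaled toward zero by the same factor,
    so none of them becomes exactly zero.
    """
    return b_ols / (1.0 + lam)


def lasso_shrink(b_ols, lam):
    """LASSO solution (soft-thresholding) under an orthonormal design.

    Coefficients with |b_ols| <= lam are set exactly to zero, which is
    why the LASSO can act as a feature selector.
    """
    magnitude = max(abs(b_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if b_ols > 0 else -magnitude


# Hypothetical OLS coefficients and penalty, chosen only for illustration.
ols = [3.0, -0.4, 0.1]
lam = 0.5
ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]
```

In this toy case the ridge estimates are all nonzero but smaller in magnitude, while the LASSO keeps only the first coefficient. For a general (non-orthonormal) design these closed forms no longer hold and an iterative solver is needed.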
Conference Paper
Worldwide, around 10% of the population has dyslexia, a specific learning disorder. Most previous eye tracking experiments with people with and without dyslexia have found differences between the two populations, suggesting that eye movements reflect the difficulties of individuals with dyslexia. In this paper, we present the first statistical model to predict readers with and without dyslexia using eye tracking measures. The model is trained and evaluated in a 10-fold cross-validation experiment with a dataset composed of 1,135 readings of people with and without dyslexia that were recorded with an eye tracker. Our model, based on a Support Vector Machine binary classifier, reaches 80.18% accuracy using the most informative features. To the best of our knowledge, this is the first time that eye tracking measures have been used to automatically predict readers with dyslexia using machine learning.
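As a rough illustration of the 10-fold evaluation protocol mentioned above, the following Python sketch partitions sample indices into ten folds and yields train/test splits. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how folds were formed, and the names `kfold_indices` and `train_test_splits` are hypothetical.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = kfold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1,135 is the number of readings reported in the abstract.
folds = kfold_indices(1135, k=10)
splits = list(train_test_splits(1135, k=10))
```

A classifier would be fitted on each `train` list and scored on the matching `test` list, with the ten scores averaged. If several readings come from the same participant, a plain shuffle like this one can place that participant in both train and test sets, so a grouped split may be preferable.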
Article
It is now well established that anticipation of upcoming input is a key characteristic of spoken language comprehension. It has also frequently been observed that literacy influences spoken language processing. Here, we investigated whether anticipatory spoken language processing is related to individuals' word reading abilities. Dutch adults with dyslexia and a control group participated in two eye-tracking experiments. Experiment 1 was conducted to assess whether adults with dyslexia show the typical language-mediated eye gaze patterns. Eye movements of both adults with and without dyslexia closely replicated earlier research: spoken language is used to direct attention to relevant objects in the environment in a closely time-locked manner. In Experiment 2, participants received instructions (e.g., 'Kijk naar de_COM afgebeelde piano_COM', look at the displayed piano) while viewing four objects. Articles (Dutch 'het' or 'de') were gender marked such that the article agreed in gender only with the target, and thus, participants could use gender information from the article to predict the target object. The adults with dyslexia anticipated the target objects but much later than the controls. Moreover, participants' word reading scores correlated positively with their anticipatory eye movements. We conclude by discussing the mechanisms by which reading abilities may influence predictive language processing. Copyright © 2015 John Wiley & Sons, Ltd.
Article
Dyslexic university students can only be provided with support if their disability is identified. However, diagnosis is expensive and time consuming. Quality screening tools, which are generally short and easy to administer, provide robust indications of whether or not a person is likely to be dyslexic. Administering free screening to all students would allow those at risk to be identified and diagnostic testing to be provided in a cost-effective, targeted manner. However, HE students differ significantly from the general adult population: dyslexic students are highly intelligent and most have developed advanced compensatory strategies that effectively mask their disability on screening tests developed for use in the general adult population. Moreover, for a screening test to be made freely available to all students, it must be delivered in a computer-based format. Existing instruments have insufficient discriminatory power for the HE population, or are unsuitable for delivery to all students, which is only possible (due to resource implications) with a computer-based test. There is a pressing need for a test specifically targeted at students, which can be used for widespread, cost-effective dyslexia screening.
Current Context
UK Higher Education Statistics Agency figures reveal that around 5.5% of university students are disabled; dyslexia represents around 40% of this subset (i.e. 2.2% of all students are dyslexic). Compared with a 4% estimated incidence in the general population (DSM-IV) [1], dyslexics [2] appear to be under-represented in UK universities [3]. From September 2002, the Disability Discrimination Act requires that UK educational institutions make reasonable adjustments to allow students with disabilities to study without disadvantage compared to non-disabled students. These provisions are vital in allowing current dyslexic students to succeed, while encouraging other dyslexics to consider higher education (HE).
In order for universities to
[1] However, the British Dyslexia Association (2006) states that approximately 10% of the UK population is dyslexic, of which 4% are severely affected (The Dyslexia Handbook 2006). As discussed in the second section of this paper, different definitions of dyslexia give rise to different estimates of incidence in the general population and also in specific sub-groups, such as students.
[2] A recent discussion by a forum for professionals in dyslexia research and support indicated that the use of the term 'dyslexic' to describe an individual with dyslexia was often preferred by dyslexics themselves. Therefore, the terms 'dyslexics' and 'dyslexic students' will be used interchangeably.
[3] Although the article refers primarily to statistics and procedures in the UK, the need for a computer-based screening tool targeted at university students exists in many other countries.