Identification of first-grade students at risk of developing
mathematical difficulties through online measures in arithmetic and
pattern tasks: A study using error rates and response times
Lukas Baumanns1, Demetra Pitta-Pantazi2, Eleni Demosthenous2, Constantinos Christou2,
Achim J. Lilienthal3 and Maike Schindler1
1University of Cologne, Germany; firstname.lastname@example.org, email@example.com
2University of Cyprus, Cyprus; firstname.lastname@example.org, email@example.com,
3Örebro University, Sweden; firstname.lastname@example.org
For researchers and practitioners, it is important to identify students at risk of developing
mathematical difficulties. The aim of this pilot study was to investigate whether it is possible to
identify first-grade students who are at risk of developing mathematical difficulties (RMD) through
online measures in arithmetic and pattern tasks. In our study, 54 first-grade students worked on 75
tasks in twelve sets on a computer screen. We also carried out a standardized mathematics test to
identify students as RMD. We then investigated whether error rates and response times as online measures
make it possible to replicate the identification of students as RMD. Using a logistic regression model, we found
that the error rates and response times allow students to be identified as RMD with acceptable accuracy.
We also found that tasks on symbolic number comparison, completing color patterns, and
enumeration of small sets were particularly informative to identify students as RMD.
Keywords: mathematical difficulties, early identification, digital tools.
Mathematical difficulties often begin early, before primary education, and can, if support is
insufficient, cascade into severe mathematical problems (Geary, 2013; Moser Opitz, 2013; Sasanguie
et al., 2012). Longitudinal studies confirm that students who enter school with mathematical
difficulties generally do not overcome them during primary school (e.g., Viesel-Nordmeyer et al.,
2019). It is important for teachers to be aware of such difficulties, to be able to identify them, and to
provide adequate support for students. Yet identifying student difficulties at an early age is
challenging, among other reasons because young students often lack the ability to report their
difficulties (Klein et al., 2010). There are numerous tests for identifying students at risk of developing
mathematical difficulties at an early age (Hellstrand et al., 2020). These tests, some of which are
conducted individually, require a considerable amount of time for both conducting the tests and
evaluating results. Also, they are often not feasible for practitioners to use at school. A digital
screening offers the possibility of identifying, also in practice, a large number of children who are at risk
of developing mathematical difficulties. Digital tools are particularly suitable as both the collection of
data and its evaluation can be conducted on the same digital device. The Erasmus+ project DIDUNAS
builds on this idea. It aims to develop an app that incorporates early arithmetic and pattern tasks and
identifies first-grade students in need of support. For this
identification, the app uses data such as error rates and response times.
This pilot study aims to investigate whether it is possible to identify first graders who are at risk of
developing mathematical difficulties through online measures (response times and error rates) of
students’ work on tasks in the domain of early arithmetic and pattern tasks.
Early identification of students with mathematical difficulties
A variety of reliable and valid standardized tests are commonly used in identifying mathematics
difficulties. To develop these tests, researchers focus on identifying variables that appear to be good
predictors. Some tests assess the speed and accuracy with which students can identify object sets (e.g.,
Geary et al., 2009). Early numeracy skills, such as quantity comparison, number identification, and
counting, appear to have much predictive power for mathematical difficulties (Gersten et al., 2005).
Most of these tests include tasks on object counting, number comparison, sequencing, connecting
numbers to quantities, number recognition, and counting back and forth. Fewer tests include number
calculations such as additions and subtractions with symbols or word problems. Even fewer tests
include patterning tasks such as copying or extending a pattern or identifying the pattern unit.
However, a growing number of researchers (e.g., Verschaffel et al., 2017) argue that, for
investigating mathematical abilities of young students, patterning tasks also need to be considered.
Students’ number sense and awareness of structure and patterns contribute substantially to students’
later success in mathematics (Wijns et al., 2019). In a longitudinal study with
first-grade students, Pittalis et al. (2018) suggested that the growth rate of algebraic arithmetic has a direct effect on
the growth rate of conventional arithmetic, and subsequently the growth rate of conventional
arithmetic predicts the growth rate of elementary arithmetic.
Error rates and response times as measures to identify mathematics difficulties
Several studies investigated differences in error rates and response times of students with and without
mathematical difficulties. Some of these studies addressed students’ enumeration ability and
indicated that students with mathematical difficulties have longer response times within the subitizing
range as compared to students without such difficulties (Moeller et al., 2009; Schindler et al., 2020;
Schleifer & Landerl, 2011). Other studies addressed students’ comparison of symbolic and non-
symbolic numbers and found no significant differences in error rates and response times between
children with or without mathematical difficulties (Mussolin et al., 2010). However, the slope of
response times was significantly steeper for students with mathematical difficulties. Other studies
found that response times and error rates for non-symbolic number line estimation are significant
predictors of mathematical achievements (Sasanguie et al., 2012). These studies indicate that error
rates and response times for suitable tasks can be a predictor of mathematical difficulties. Thus, we
intend to investigate: To what extent can error rates and response times of early arithmetic and
pattern tasks be used to identify students that may be at risk of developing mathematical difficulties?
The study was conducted with 54 first-grade students (age: M = 7.38; SD = 0.55) from two primary
schools in Germany. In the German federal state where the study took place, a social index classifies
schools into levels from 1 to 9 based on factors such as child and youth poverty, family
language, and special educational needs of students. Index 1 represents the most favorable conditions.
The two participating schools had indices of 7 and 6, which means they tended to have a higher
number of students in need of support. Of the students, 34.6% had German as their mother tongue.
Procedure and tasks
For the study we used two tests: (1) the standardized ZAREKI-K test to identify students at risk of
mathematical difficulties and (2) a self-developed computer screen test with twelve sets of arithmetic
and pattern tasks (75 tasks in total).
Standardized mathematics test: ZAREKI-K is a standardized test for identifying children who, at the
transition from kindergarten to primary school, are at risk of developing mathematical difficulties
(von Aster et al., 2009). The test battery is constructed as an individual procedure and consists of 18
subtests. For the present study, an adaptation of ZAREKI-K was used, which requires only six
subtests: (a) Counting up to 30, (b) Numbers that precede or follow, (c) Word problems, (d) Visual
calculation, (e) Number conservation, and (f) Writing numbers. This adaptation has been shown to
yield excellent prediction rates for identifying students at risk of developing mathematical difficulties
(Walter, 2020). The students took an average of 14.6 minutes to complete the ZAREKI-K.
Early arithmetic and pattern tasks: The students worked on twelve sets presented on a computer
screen (Fig. 1). Every set had an example task to get the students acquainted with it. For sets (1), (4),
and (5), students were to determine the number of objects. For set (2), students were asked to
determine a number on a number line. Set (3) asked for the number behind the sun. Set (6) asked how
many dots needed to be added or subtracted to make it equal to the number shown on the right. In set
(7), the largest number was to be determined. In set (8), the number of persons’ legs behind the wall
was to be determined. In set (9), students were to determine the number of bricks of the tower behind
the white blob. In set (10), the result of an addition/subtraction problem was to be determined. In set
(11), the students were to compare quantities. For set (12), a color pattern was to be completed.
Students could skip a task if they found it difficult by saying “next”. There are no identical tasks
between these sets and ZAREKI-K. However, both include cardinal and ordinal aspects of numbers.
Additionally, set (12) is a pattern task from early algebra, which is not included in ZAREKI-K.
Students answered by tapping on the computer screen. The answers to each task were given with a
single tap on the screen. For sets (1)–(10), a number field with the buttons labeled with 1 to 20 was
shown at the bottom of the screen. Set (11) had a yellow, a blue, and an equal (“=”) button for
answering. Set (12) had a yellow, a blue, and a red button for answering. It took students an average
of 20.8 minutes to complete these tasks (including all instructions, explanations, and trial tasks).
Figure 1: Example tasks of the twelve sets
Panels: (1) Enumeration, (2) Number line, (3) Sun, (4) 10-field, (5) Objects, (7) Biggest number, (8) Hidden legs, (9) Towers, (10) Calculations, (11) Quantity comparison, (12) Dot patterns
We use the following data sets:
(1) Identification of students at risk of mathematical difficulties: ZAREKI-K identifies whether a student
is at risk (RMD) or not at risk (¬RMD) of developing mathematical difficulties. We entered the points
that each student achieved in each subtest into an Excel spreadsheet provided by Walter (2020),
which then calculated the student’s risk of developing mathematical
difficulties. Of the 54 students, ZAREKI-K identified 18 as RMD and 36 as ¬RMD.
(2.a) Error rates: Mean error rates were calculated for all 75 tasks in total as well as for each of the
twelve sets separately. Tasks that students answered incorrectly or did not
answer at all were counted as errors.
(2.b) Response times: For each task, the time from when the stimulus was first shown to when the
student tapped the response on the computer screen was measured. Only response times of
correctly answered tasks were taken into account, since students sometimes quickly skipped tasks
that they did not understand and sometimes hastily guessed wrong
answers. Mean response times were calculated over all tasks that were answered correctly as
well as for each of the twelve sets separately.
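The two online measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code of the study: the record format (`set`, `correct`, `rt`) and the example values are assumptions made for this sketch.

```python
from statistics import mean

# Hypothetical log format: one record per task attempt of a single student.
# 'correct' is False for wrong answers and for skipped tasks ("next").
records = [
    {"set": 1, "correct": True, "rt": 2.5},
    {"set": 1, "correct": False, "rt": 1.0},    # wrong guess: counts as error, rt ignored
    {"set": 7, "correct": True, "rt": 4.0},
    {"set": 7, "correct": True, "rt": 6.0},
    {"set": 12, "correct": False, "rt": None},  # skipped task
]

def error_rate(recs):
    """Share of tasks not solved correctly (wrong or unanswered)."""
    return sum(not r["correct"] for r in recs) / len(recs)

def mean_correct_rt(recs):
    """Mean response time over correctly solved tasks only; None if there are none."""
    times = [r["rt"] for r in recs if r["correct"]]
    return mean(times) if times else None

def per_set(recs, measure):
    """Apply a measure separately to each task set."""
    sets = sorted({r["set"] for r in recs})
    return {s: measure([r for r in recs if r["set"] == s]) for s in sets}

overall_error = error_rate(records)
overall_rt = mean_correct_rt(records)
error_by_set = per_set(records, error_rate)
rt_by_set = per_set(records, mean_correct_rt)
```

A set without any correct answer yields no mean response time (`None` in this sketch), which mirrors why a set with too many incorrect answers cannot enter a response-time analysis.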
We followed the guidelines for logistic regression analysis and reporting by Peng et al. (2002).
Logistic regression was performed using SPSS 27 in order to calculate a probability value between 0
and 1 for each student using mean error rate and mean response time over all tasks. For different cut-
off values p between 0 and 1, students are identified as RMD or ¬RMD. A ROC (receiver operating
characteristic) curve was then plotted, which indicates the sensitivity (true positive rate) and
specificity (true negative rate) for all cut-off values p as an indicator of the overall classification
accuracy. The area under this curve (AUC) is a measure of the classification quality.
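How a ROC curve and its AUC follow from predicted probabilities and reference labels can be illustrated as follows. The study used SPSS 27; the sketch below is a plain-Python stand-in with made-up probabilities, and the function names `roc_points` and `auc` are chosen for this illustration only.

```python
def roc_points(probs, is_rmd):
    """(1 - specificity, sensitivity) for every cut-off taken from the predicted values."""
    n_pos = sum(is_rmd)
    n_neg = len(is_rmd) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(probs), reverse=True):
        # Students with a probability at or above the cut-off are flagged as RMD.
        tp = sum(p >= cut and y for p, y in zip(probs, is_rmd))
        fp = sum(p >= cut and not y for p, y in zip(probs, is_rmd))
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up predicted probabilities and reference labels (True = RMD), illustration only.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

curve = roc_points(probs, labels)
area = auc(curve)
```

In this toy example, 8 of the 9 possible pairs of one RMD and one ¬RMD student are ranked correctly, so the area equals 8/9.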
Next, we asked which of the twelve mean error rates or the twelve mean response times are most
informative for identifying students as RMD according to ZAREKI-K. We thus carried out a
backwards selection with our multiple logistic regression model, first for the mean error rates and then for the
mean response times of all twelve task sets in early arithmetic and patterns.
We conducted a t-test to compare the mean error rates and mean response times of students
identified as RMD and students identified as ¬RMD. The Shapiro-Wilk test gave no indication of a deviation from
normality for the mean error rates (W(54) = .974, p > .05) or the mean response times (W(54) = .967,
p > .05). The Levene test gave no indication of unequal variances for the mean error rates
(p > .05) or the mean response times (p > .05), so variance homogeneity between the groups can be
assumed. With regard to the mean error rate, the 18 students identified as RMD had a
significantly higher mean error rate (M = .29, SD = .12) as compared to the 36 students identified as
¬RMD (M = .19, SD = .09; t(52) = –3.48, p < .05). With an effect size of r = .43, this is a medium
effect. With regard to the mean response time, the 18 students identified as RMD did not have a
significantly higher mean response time (M = 7.04s, SD = 1.63s) as compared to the 36 students
identified as ¬RMD (M = 6.65s, SD = 1.29s; t(52) = –.957, p = .34, r = .13).
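The reported effect sizes can be related to the t statistics with a short calculation. The text does not state which formula was used; the sketch below assumes the common conversion r = sqrt(t² / (t² + df)), which reproduces the reported values.

```python
from math import sqrt

def effect_size_r(t, df):
    """Convert a t statistic into the correlation-type effect size r (assumed formula)."""
    return sqrt(t**2 / (t**2 + df))

r_error = effect_size_r(-3.48, 52)   # mean error rate
r_time = effect_size_r(-0.957, 52)   # mean response time
print(round(r_error, 2), round(r_time, 2))  # prints 0.43 0.13
```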
The likelihood ratio test indicates that the logistic regression model is significantly more effective
than the null model (constant only) (χ²(2) = 11.67, p < .05). Goodness-of-fit was assessed using the
Hosmer-Lemeshow test, indicating an adequate fit of the logistic model (χ²(8) = 5.15, p > .05). The Wald test
indicates that the mean error rate of the 75 tasks is a significant classifier of RMD (χ²(1) = 8.53, p < .05).
The mean response time is not a significant classifier in this regard (χ²(1) = .987, p > .05).
The logistic model calculates a probability value between 0 and 1 for each student based on the error
rates and response times. The cut-off value p then defines the probability value from which a student is
identified as RMD rather than ¬RMD based on the logistic regression model. Choosing a cut-off value
p thus means trading off sensitivity (true positive rate) against specificity (true negative rate), as they
change in opposite directions. Table 1 shows the sensitivity and specificity for different cut-off values p. The
total accuracy is computed as the number of all correctly identified results in relation to all results.
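The step from the two online measures to a classification can be sketched as follows. The coefficients are hypothetical placeholders, since the fitted values are not reported here, and treating a probability equal to the cut-off as RMD is likewise an assumption of this sketch.

```python
from math import exp

def rmd_probability(error_rate, response_time, b0, b_error, b_time):
    """Logistic model: probability of being RMD given the two online measures."""
    z = b0 + b_error * error_rate + b_time * response_time
    return 1.0 / (1.0 + exp(-z))

def classify(prob, cutoff):
    """Identify a student as RMD when the model probability reaches the cut-off."""
    return "RMD" if prob >= cutoff else "not RMD"

# Hypothetical coefficients, NOT the fitted values from the study.
B0, B_ERROR, B_TIME = -4.0, 10.0, 0.2

p_student = rmd_probability(error_rate=0.29, response_time=7.04,
                            b0=B0, b_error=B_ERROR, b_time=B_TIME)
label_low = classify(p_student, cutoff=0.307)   # lenient cut-off: more students flagged
label_high = classify(p_student, cutoff=0.8)    # strict cut-off: fewer students flagged
```

The same student is flagged under the lenient cut-off but not under the strict one, which is the trade-off between sensitivity and specificity in miniature.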
Table 1: Sensitivity, specificity, and total accuracy of the model for different cut-off values p
Cut-off value p
Total accuracy (%)
For the identification of students at risk of developing mathematical difficulties at an early age, a high
sensitivity is often desirable even at the expense of a decreased specificity. A high sensitivity
ensures that only a few students at risk of mathematical difficulties are missed. However, this has
the consequence that the specificity decreases, so that more students who are not at risk are
falsely identified as RMD. For the cut-off value p = .307, a reasonably high sensitivity of 72.2% is achieved at a
still high specificity of 72.2%. Table 2 displays the classification of the students identified as RMD
and as ¬RMD through the participants’ error rates and response times compared to the students
identified as RMD and ¬RMD from the standardized ZAREKI-K for cut-off value p = .307.
Table 2: Classification table (cut-off value p = .307)
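The reported rates for the cut-off value p = .307 can be checked with a short calculation. The cell counts below are not quoted from the classification table; they are inferred from the reported percentages and the group sizes (18 RMD, 36 ¬RMD) and should be read as an assumption.

```python
def classification_rates(tp, fn, tn, fp):
    """Sensitivity, specificity, and total accuracy in percent from a 2x2 classification table."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Inferred counts (assumption): 13 of 18 RMD and 26 of 36 not-RMD students classified correctly.
sens, spec, acc = classification_rates(tp=13, fn=5, tn=26, fp=10)
print(f"{sens:.1f} {spec:.1f} {acc:.1f}")  # prints 72.2 72.2 72.2
```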
The ROC curve (see Figure 2) is the generalization of a single classification table (see Table 2). Each
point of the ROC curve indicates sensitivity and (1–specificity) for a given cut-off value p. The drawn
diagonal would be expected if the classification was purely random. A measure of the classification
quality of the model is the area under the ROC curve (AUC). Following Hosmer et al. (2013), the
classification accuracy can be considered “acceptable” with an AUC = .761.
Figure 2: ROC curve of the general model (left; AUC = .761; ellipse marks cut-off value p = .307)
and the reduced model (right; AUC = .841; ellipse marks cut-off value p = .351)
To identify those sets whose error rates and/or response times are particularly good for the
identification of students at the risk of developing mathematical difficulties, logistic regression was
performed through backwards selection. This backwards selection is first done with all twelve mean
error rates of the sets. Step by step, the twelve mean error rates are removed from the model, starting
with the one that has the lowest significance for predicting the ZAREKI-K outcome. All variables
that are significant to replicate the classification based on the ZAREKI-K outcome at the p < .1 level
remain included according to the Wald test. At the same time, the Likelihood ratio statistic is used to
check whether the model would improve by adding another variable. After eleven steps, the mean
error rates of sets 7 (symbolic number comparison) and 12 (completing color patterns) (see Figure 1)
remained. For the response times, the mean values of set 9 (completing growing number patterns)
could not be included, since the number of incorrect answers, for which response times
were not considered, was too high. After eleven steps, only the response time of set 1 (enumeration of small sets)
remained. Applying logistic regression to the three variables identified through backwards
selection, the likelihood ratio test indicates that the logistic regression model is significantly more
effective than the null model (constant only) (χ²(3) = 22.99, p < .05). Goodness-of-fit was assessed
using the Hosmer-Lemeshow test, indicating a good model fit (χ²(8) = 3.58, p > .89). Furthermore,
the Wald test indicates that the mean error rate of set (12) (χ²(1) = 6.88, p < .05) and the mean response
time of set (1) (χ²(1) = 6.91, p < .05) are significant classifiers of RMD.
The mean error rate of set (7) is not a significant classifier in this regard
(χ²(1) = 3.256, p > .05), but since p < .1 it remained in the model. Following Hosmer et al. (2013),
the classification accuracy can be considered “excellent” with AUC = .841. With a cut-off value of p
= .351, this model has a higher specificity of 80.6% compared to the previous model at a sensitivity
of 72.2%. The total accuracy of this model is 77.8%.
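The retention rule of the backwards selection (keep variables with p < .1 according to the Wald test) can be illustrated with the reported Wald statistics. The sketch uses the fact that the upper-tail probability of a chi-square statistic with one degree of freedom equals erfc(sqrt(x / 2)); the exact p-values it yields are computed here and are not quoted from the study.

```python
from math import erfc, sqrt

def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return erfc(sqrt(stat / 2))

# Wald statistics reported for the reduced model.
wald = {"error rate, set 12": 6.88, "response time, set 1": 6.91, "error rate, set 7": 3.256}

for name, stat in wald.items():
    p = chi2_1df_p(stat)
    kept = p < 0.10            # retention threshold used in the backwards selection
    significant = p < 0.05     # conventional significance level
    print(f"{name}: p = {p:.3f}, kept = {kept}, significant = {significant}")
```

The error rate of set 7 falls between the two thresholds, which is why it stays in the model without being a significant classifier.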
The results of our study should be viewed and interpreted against the backdrop of the following
limitations: Logistic regression requires sufficiently many training samples, i.e., RMD and ¬RMD
cases. Having only 18 students identified as RMD and 36 students identified as ¬RMD limits the
certainty of the fitted logistic regression model. A larger sample could provide further certainty.
In addition, we optimized the classification threshold for the logistic regression model on a single
data set and did not evaluate the classification accuracy on an independent test set. In practice, the
classification threshold needs to be learned on a training set, which would likely decrease the
classification accuracy on independent data.
This pilot study addressed the question to what extent error rates and response times of correctly
solved tasks as online measures in early arithmetic and pattern tasks can identify students that may
be at risk of developing mathematical difficulties (RMD). Using logistic regression, we found that
the mean error rate across all 75 tasks is a significant classifier of RMD, whereas the mean response time
was not. Combining error rates and response times in our study yielded an acceptable
discrimination of the model of AUC = .761. Furthermore, we investigated to what extent the error
rates and response times of the twelve sets can be used separately to identify students as RMD. We found
that the error rates of two sets (symbolic number comparison, completing color patterns) and the
response time of one set (enumeration of small sets) appeared to be particularly informative. The
influence of the set on completing color patterns is especially noteworthy since the
standardized mathematics test focused on early arithmetic, not patterns. Combining error rates of
symbolic number comparison and completing color patterns with response times of small set
enumeration yielded a discrimination of the model of AUC = .841.
Since the DIDUNAS project aims to develop an app that identifies first-grade students in need of
support, what can we learn from these results regarding the app development? In our pilot study, we
found that it is possible to use online measurements of certain tasks or a set of them (here: symbolic
number comparison, completing color patterns, enumeration of small sets) to achieve a reasonably
reliable identification of students at risk of developing mathematical difficulties in grade 1. This is
promising for future developments. In the future, we will build on these results for developing the
DIDUNAS app, which can then be used by teachers around the world to help identify students at
risk of mathematical difficulties early on. As the app will require less effort to administer and
evaluate than some standardized mathematics tests, it will save teachers’ resources.
The study presented in this paper is an important first step in that direction.
This project has received funding from the Erasmus+ grant program of the European Union under grant
agreement No 2020-1-DE03-KA201-077597.
Geary, D. C. (2013). Early foundations for mathematics learning and their relations to learning
disabilities. Current Directions in Psychological Science, 22, 23–27.
Geary, D. C., Bailey, D. H., & Hoard, M. K. (2009). Predicting mathematical achievement and
mathematical learning disability with a simple screening tool. Journal of Psychoeducational
Assessment, 27(3), 265–279. https://doi.org/10.1177/0734282908330592.
Gersten, R., Jordan, N. C., & Flojo, J. R. (2005). Early identification and interventions for students
with mathematics difficulties. Journal of Learning Disabilities, 38(4), 293–304.
Hellstrand, H., Korhonen, J., Räsänen, P., Linnanmäki, K., & Aunio, P. (2020). Reliability and
validity evidence of the early numeracy test for identifying children at risk for mathematical
learning difficulties. International Journal of Educational Research, 102, 101580.
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
Klein, P., Adi-Japha, E., & Hakak-Benizri, S. (2010). Mathematical thinking of kindergarten boys
and girls. Educational Studies in Mathematics, 73, 233–246. https://doi.org/10.1007/s10649-009-
Moeller, K., Neuburger, S., Kaufmann, L., Landerl, K., & Nuerk, H.-C. (2009). Basic number
processing deficits in developmental dyscalculia: Evidence from eye tracking. Cognitive
Development, 24(4), 371–386. https://doi.org/10.1016/j.cogdev.2009.09.007.
Moser Opitz, E. (2013). Rechenschwäche/Dyskalkulie. Theoretische Klärungen und empirische
Studien an betroffenen Schülerinnen und Schülern (2nd ed.). Haupt.
Mussolin, C., Mejias, S., & Noël, M.-P. (2010). Symbolic and nonsymbolic number comparison in
children with and without dyscalculia. Cognition, 115, 10–25.
Peng, C.-Y. J., Lee, K., & Ingersoll, G. M. (2002). Introduction to logistic regression analysis and
reporting. The Journal of Educational Research, 96(1), 3–14.
Pittalis, M., Pitta-Pantazi, D., & Christou, C. (2018). A longitudinal study revisiting the notion of
early number sense: Algebraic arithmetic as a catalyst for number sense development. Mathematical
Thinking and Learning, 20(3), 222–247. https://doi.org/10.1080/10986065.2018.1474533.
Sasanguie, D., Van den Bussche, E., & Reynvoet, B. (2012). Predictors for mathematics
achievement? Evidence from a longitudinal study. Mind, Brain, and Education, 6(3), 119–128.
Schindler, M., Schovenberg, V., & Schabmann, A. (2020). Enumeration processes of children with
mathematical difficulties: An explorative eye-tracking study on subitizing, groupitizing, counting,
and pattern recognition. Learning Disabilities: A Contemporary Journal, 18(2), 192–211.
Schleifer, P., & Landerl, K. (2011). Subitizing and counting in typical and atypical development.
Developmental Science, 14(2), 280–291. https://doi.org/10.1111/j.1467-7687.2010.00976.x.
Verschaffel, L., Torbeyns, J., & De Smedt, B. (2017). Young children’s early mathematical
competencies: Analysis and stimulation. In T. Dooley, & G. Gueudet (Eds.) Proceedings of the
Tenth Congress of the European Society for Research in Mathematics Education. ERME.
Viesel-Nordmeyer, N., Schurig, M., Bos, W., & Ritterfeld, U. (2019). Effects of pre-school
mathematical disparities on the development of mathematical and verbal skills in primary
school children. Learning Disabilities, 17, 149–164.
von Aster, M. G., Bzufka, M. W., & Horn, R. (2009). ZAREKI-K. Neuropsychologische Testbatterie
für Zahlenverarbeitung und Rechnen bei Kindern – Kindergartenversion. Pearson.
Walter, J. (2020). Ein Screening-Verfahren zur Prognose von Rechenschwierigkeiten in der
Grundschule. Zeitschrift für Heilpädagogik, 71, 238–253.
Wijns, N., Torbeyns, J., Bakker, M., De Smedt, B., & Verschaffel, L. (2019). Four-year olds’
understanding of repeating and growing patterns and its association with early numerical ability.
Early Childhood Research Quarterly, 49, 152–163. https://doi.org/10.1016/j.ecresq.2019.06.004.