All content in this area was uploaded by Lukas Baumanns on Sep 19, 2022

Identification of first-grade students at risk of developing

mathematical difficulties through online measures in arithmetic and

pattern tasks: A study using error rates and response times

Lukas Baumanns¹, Demetra Pitta-Pantazi², Eleni Demosthenous², Constantinos Christou²,

Achim J. Lilienthal³ and Maike Schindler¹

¹University of Cologne, Germany; lukas.baumanns@uni-koeln.de, maike.schindler@uni-koeln.de

²University of Cyprus, Cyprus; dpitta@ucy.ac.cy, demosthenous.eleni@ucy.ac.cy,

edchrist@ucy.ac.cy

³Örebro University, Sweden; achim.lilienthal@oru.se

For researchers and practitioners, it is important to identify students at risk of developing

mathematical difficulties. The aim of this pilot study was to investigate whether it is possible to

identify first-grade students who are at risk of developing mathematical difficulties (RMD) through

online measures in arithmetic and pattern tasks. In our study, 54 first-grade students worked on 75

tasks in twelve sets on a computer screen. We also carried out a standardized mathematics test to

identify students as RMD. We then investigated whether error rates and response times as online measures

make it possible to replicate this identification of students as RMD. Using a logistic regression model, we found

that error rates and response times make it possible to identify students as RMD with acceptable accuracy.

We also found that tasks on symbolic number comparison, completing color patterns, and

enumeration of small sets were particularly informative to identify students as RMD.

Keywords: mathematical difficulties, early identification, digital tools.

Introduction

Mathematical difficulties often begin early, before primary education, and can, due to insufficient

support, cascade into severe mathematical problems (Geary, 2013; Moser Opitz, 2013; Sasanguie

et al., 2012). Longitudinal studies confirm that students who enter school with mathematical

difficulties generally do not overcome these during primary school (e.g., Viesel-Nordmeyer et al.,

2019). It is important for teachers to be aware of such difficulties, to be able to identify them, and to

provide adequate support for students. Yet, identifying student difficulties at an early age is

challenging, among other reasons because young students often lack the ability to report their

difficulties (Klein et al., 2010). There are numerous tests for identifying students at risk of developing

mathematical difficulties at an early age (Hellstrand et al., 2020). These tests, some of which are

conducted individually, require a considerable amount of time for both conducting the tests and

evaluating results. Also, they are often not feasible for practitioners to use at school. A digital

screening offers the possibility to identify, also in practice, a large number of children who are at risk

of developing mathematical difficulties. Digital tools are particularly suitable as both the collection of

data and its evaluation can be conducted on the same digital device. The Erasmus+ project DIDUNAS

builds on this idea. It aims to develop an app, which incorporates tasks in the domain of early

arithmetic and pattern tasks, that identifies first-grade students in need of support. For this

identification, the app uses data such as error rates and response times.

This pilot study aims to investigate whether it is possible to identify first graders who are at risk of

developing mathematical difficulties through online measures (response times and error rates) of

students’ work on tasks in the domain of early arithmetic and pattern tasks.

Related work

Early identification of students with mathematical difficulties

A variety of reliable and valid standardized tests are commonly used in identifying mathematics

difficulties. To develop these tests, researchers focus on identifying variables that appear to be good

predictors. Some tests assess speed and accuracy with which students can identify object sets (e.g.,

Geary et al., 2009). Early numeracy skills, such as quantity comparison, number identification, and

counting, appear to have much predictive power for mathematical difficulties (Gersten et al., 2005).

Most of these tests include tasks on object counting, number comparison, sequencing, connecting

numbers to quantities, number recognition, and counting back and forth. Fewer tests include number

calculations such as additions and subtractions with symbols or word problems. Even fewer tests

include patterning tasks such as copying or extending a pattern or identifying the pattern unit.

However, a growing number of researchers (e.g., Verschaffel et al., 2017) argue that for

investigating mathematical abilities of young students, patterning tasks also need to be considered.

Students’ number sense and awareness of structure and patterns contribute substantially to students’

later success in mathematics (Wijns et al., 2019). Pittalis et al. (2018), in a longitudinal study with

first-grade students, suggested that the growth rate of algebraic arithmetic has a direct effect on

the growth rate of conventional arithmetic, and subsequently the growth rate of conventional

arithmetic predicts the growth rate of elementary arithmetic.

Error rates and response times as measures to identify mathematics difficulties

Several studies investigated differences in error rates and response times of students with and without

mathematical difficulties. Some of these studies addressed students’ enumeration ability and

indicated that students with mathematical difficulties have longer response times within the subitizing

range as compared to students without such difficulties (Moeller et al., 2009; Schindler et al., 2020;

Schleifer & Landerl, 2011). Other studies addressed students’ comparison of symbolic and non-

symbolic numbers and found no significant differences in error rates and response times between

children with or without mathematical difficulties (Mussolin et al., 2010). However, the slope of

response times was significantly steeper for students with mathematical difficulties. Other studies

found that response times and error rates for non-symbolic number line estimation are significant

predictors of mathematical achievements (Sasanguie et al., 2012). These studies indicate that error

rates and response times for suitable tasks can be a predictor of mathematical difficulties. Thus, we

intend to investigate: To what extent can error rates and response times of early arithmetic and

pattern tasks be used to identify students that may be at risk of developing mathematical difficulties?

Methods

Participants

The study was conducted with 54 first-grade students (age: M = 7.38; SD = 0.55) from two primary

schools in Germany. In the German federal state in which the study took place, a social index classifies

schools into levels from 1 to 9 based on factors such as child and youth poverty, family

language, and special educational needs of students. Index 1 represents the most favorable conditions.

The two participating schools had indices of 7 and 6, which means they tended to have a higher

number of students in need of support. Of the students, 34.6% had German as their mother tongue.

Procedure and tasks

For the study we used two tests: (1) the standardized ZAREKI-K test to identify students at risk of

mathematical difficulties and (2) a self-developed computer screen test with twelve sets of arithmetic

and pattern tasks (75 tasks in total).

Standardized mathematics test: ZAREKI-K is a standardized test for identifying children who, at the

transition from kindergarten to primary school, are at risk of developing mathematical difficulties

(von Aster et al., 2009). The test battery is constructed as an individual procedure and consists of 18

subtests. For the present study, an adaptation of ZAREKI-K was used, which requires only six

subtests: (a) Counting up to 30, (b) Numbers that precede or follow, (c) Word problems, (d) Visual

calculation, (e) Number conservation, and (f) Writing numbers. This adaptation has been shown to

yield excellent prediction rates for identifying students at risk of developing mathematical difficulties

(Walter, 2020). The students took an average of 14.6 minutes to complete the ZAREKI-K.

Early arithmetic and pattern tasks: The students worked on twelve sets presented on a computer

screen (Fig. 1). Every set had an example task to get the students acquainted with it. For sets (1), (4),

and (5), students were to determine the number of objects. For set (2), students were asked to

determine a number on a number line. Set (3) asked for the number behind the sun. Set (6) asked how

many dots needed to be added or subtracted to make it equal to the number shown on the right. In set

(7), the largest number was to be determined. In set (8), the number of persons’ legs behind the wall

was to be determined. In set (9), students were to determine the number of bricks of the tower behind

the white blob. In set (10), the result of an addition/subtraction problem was to be determined. In set

(11), the students were to compare quantities. For set (12), a color pattern was to be completed.

Students could skip a task if they found it difficult by saying “next”. There are no identical tasks

between these sets and ZAREKI-K. However, both include cardinal and ordinal aspects of numbers.

Additionally, set (12) is a pattern task from early algebra, which is not included in ZAREKI-K.

Students answered by tapping on the computer screen. The answers to each task were given with a

single tap on the screen. For sets (1)–(10), a number field with the buttons labeled with 1 to 20 was

shown at the bottom of the screen. Set (11) had a yellow, a blue, and an equal (“=”) button for

answering. Set (12) had a yellow, a blue, and a red button for answering. It took students an average

of 20.8 minutes to complete these tasks (including all instructions, explanations, and trial tasks).

Figure 1: Example tasks of the twelve sets

[Figure panels: (1) Enumeration, (2) Number line, (3) Sun, (4) 10-field, (5) Objects, (6) Difference, (7) Biggest number, (8) Hidden legs, (9) Towers, (10) Calculations, (11) Quantity comparison, (12) Dot patterns]

Measures

We use the following data sets:

(1) Identification of risk of mathematical difficulties: ZAREKI-K identifies whether a student

is at risk (RMD) or not at risk (¬RMD) of developing mathematical difficulties. We used an

Excel spreadsheet provided by Walter (2020) for entering the students’ individual points

achieved in each subtest, which then calculated the risk of students to develop mathematical

difficulties. Of the 54 students, ZAREKI-K identified 18 as RMD and 36 as ¬RMD.

(2.a) Error rates: Mean error rates were calculated for all 75 tasks in total as well as for each of the

twelve sets separately. We counted every task that a student answered wrongly or did not

answer at all (i.e., skipped) as an error.

(2.b) Response times: For each task, the time from when the stimulus was first shown to when the

student tapped the response on the computer screen was measured. Only response times of

correctly answered tasks were taken into account, since tasks that were not understood by the

students were sometimes skipped quickly and since students sometimes rashly guessed wrong

answers. Mean response times were calculated over all tasks that were answered correctly as

well as for each of the twelve sets separately.
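
The two online measures described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the record layout (`set_id`, `correct`, `rt_seconds`) and the sample values are hypothetical. Skipped tasks are represented as `correct=False`, following the rule that unanswered tasks count as errors.

```python
from statistics import mean

# Hypothetical records for one student: one dict per task.
# correct=False covers both wrong answers and skipped tasks.
tasks = [
    {"set_id": 1, "correct": True,  "rt_seconds": 2.1},
    {"set_id": 1, "correct": True,  "rt_seconds": 3.4},
    {"set_id": 7, "correct": False, "rt_seconds": 1.0},
    {"set_id": 7, "correct": True,  "rt_seconds": 5.5},
]

def error_rate(records):
    """Share of tasks that were not solved correctly."""
    return sum(not r["correct"] for r in records) / len(records)

def mean_response_time(records):
    """Mean response time over correctly answered tasks only (None if there are none)."""
    times = [r["rt_seconds"] for r in records if r["correct"]]
    return mean(times) if times else None

def per_set(records, measure):
    """Apply a measure separately to each task set."""
    set_ids = sorted({r["set_id"] for r in records})
    return {s: measure([r for r in records if r["set_id"] == s]) for s in set_ids}

overall_error = error_rate(tasks)
overall_rt = mean_response_time(tasks)
error_by_set = per_set(tasks, error_rate)
rt_by_set = per_set(tasks, mean_response_time)
```

Returning `None` for a set without any correct answer mirrors the situation in which a response time measure cannot be computed for a set.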

Statistical analysis

We followed the guidelines of logistic regression analysis and reporting by Peng et al. (2002).

Logistic regression was performed using SPSS 27 in order to calculate a probability value between 0

and 1 for each student using mean error rate and mean response time over all tasks. For different cut-

off values p between 0 and 1, students are identified as RMD or ¬RMD. A ROC (receiver operating

characteristic) curve was then plotted, which indicates the sensitivity (true positive rate) and

specificity (true negative rate) for all cut-off values p as an indicator of the overall classification

accuracy. The area under this curve (AUC) is a measure of the classification quality.
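
To make the procedure concrete, the following Python sketch shows how a logistic model turns the two mean measures into a probability and how each cut-off value yields one point of the ROC curve. The coefficients `B0`, `B_ERR`, `B_RT` and the six students are hypothetical (fitted coefficients are not reported here), and classifying a student as RMD when the probability is greater than or equal to the cut-off is an assumption of this sketch.

```python
import math

def predicted_probability(error_rate, response_time, b0, b_err, b_rt):
    """Logistic model: probability of RMD given the two mean measures."""
    z = b0 + b_err * error_rate + b_rt * response_time
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity_specificity(probs, is_rmd, cutoff):
    """Classify as RMD when probability >= cutoff and compare with the reference labels."""
    tp = sum(p >= cutoff and y for p, y in zip(probs, is_rmd))
    fn = sum(p < cutoff and y for p, y in zip(probs, is_rmd))
    tn = sum(p < cutoff and not y for p, y in zip(probs, is_rmd))
    fp = sum(p >= cutoff and not y for p, y in zip(probs, is_rmd))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical coefficients and students:
# (mean error rate, mean response time in seconds, RMD according to the reference test)
B0, B_ERR, B_RT = -4.0, 10.0, 0.1
students = [(0.35, 7.5, True), (0.30, 6.0, True), (0.15, 7.0, True),
            (0.10, 6.5, False), (0.20, 6.0, False), (0.32, 7.0, False)]
probs = [predicted_probability(e, t, B0, B_ERR, B_RT) for e, t, _ in students]
labels = [y for _, _, y in students]

# One ROC point per cut-off value: (1 - specificity, sensitivity)
roc_points = []
for cutoff in (0.2, 0.4, 0.6):
    sens, spec = sensitivity_specificity(probs, labels, cutoff)
    roc_points.append((1 - spec, sens))
```

Raising the cut-off can only move students from the RMD to the ¬RMD prediction, which is why sensitivity and specificity move in opposite directions along the curve.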

Next, we asked which of the twelve mean error rates or the twelve mean response times are most

informative for identifying students as RMD, according to ZAREKI-K. We thus carried out a

backwards selection, first for the mean error rates and then for the mean response times of all twelve

task sets in early arithmetic and patterns, using our multiple logistic regression model.

Results

We conducted t-tests to compare mean error rates and mean response times between students

identified as RMD and students identified as ¬RMD. The Shapiro-Wilk test indicated no significant

deviation from normality for mean error rates (W(54) = .974, p > .05) or mean response times (W(54) = .967,

p > .05). The Levene test indicated homogeneity of variances between the groups for mean error rates

(p > .05) and mean response times (p > .05).

With regard to the mean error rate, the 18 students identified as RMD had a

significantly higher mean error rate (M = .29, SD = .12) as compared to the 36 students identified as

¬RMD (M = .19, SD = .09; t(52) = –3.48, p < .05). With an effect size of r = .43, this is a medium

effect. With regard to the mean response time, the 18 students identified as RMD did not have a

significantly higher mean response time (M = 7.04s, SD = 1.63s) as compared to the 36 students

identified as ¬RMD (M = 6.65s, SD = 1.29s; t(52) = –.957, p = .34, r = .13).
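
The reported effect sizes can be checked against the t statistics. The conversion used below, r = sqrt(t² / (t² + df)), is a common formula and an assumption of this sketch, since the text does not state how r was obtained; it does reproduce the reported values when rounded to two decimals.

```python
import math

def effect_size_r(t, df):
    """Effect size r = sqrt(t^2 / (t^2 + df)) for a t statistic with df degrees of freedom."""
    return math.sqrt(t * t / (t * t + df))

# t statistics and degrees of freedom as reported in the text
r_error_rate = effect_size_r(-3.48, 52)
r_response_time = effect_size_r(-0.957, 52)
```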

The Likelihood ratio test indicates that the logistic regression model is significantly more effective

than the null model (constant only) (χ²(2) = 11.67, p < .05). Goodness-of-fit was assessed using the

Hosmer-Lemeshow test, indicating an adequate fit of the logistic model (χ²(8) = 5.15, p > .05). The Wald test

indicates that the mean error rate of the 75 tasks is a significant classifier of RMD (χ²(1) = 8.53, p < .05).

The mean response time is not a significant classifier in this regard (χ²(1) = .987, p > .05).

The logistic model calculates a probability value between 0 and 1 for each student based on the error

rates and response times. The cut-off value p then defines at which probability value a student is

identified as RMD or ¬RMD based on the logistic regression model. Choosing a cut-off value

p thus means trading off sensitivity (true positive rate) against specificity (true negative rate), as they

change in opposite directions. Table 1 shows the sensitivity and specificity for different cut-off values p. The

total accuracy is computed as the number of all correctly identified results in relation to all results.

Table 1: Sensitivity, specificity, and total accuracy of the model for different cut-off values p

Cut-off value p       .05    .1     .15    .2     .25    .3     .307   .35    .4     .5     .6
Sensitivity (%)       100.0  100.0  88.9   83.3   72.2   72.2   72.2   61.1   61.1   55.6   33.3
Specificity (%)       5.6    13.9   25.0   44.4   61.1   69.4   72.2   77.8   80.6   91.7   94.4
Total accuracy (%)    37.0   42.6   46.3   57.4   64.8   70.4   72.2   72.2   74.1   79.6   74.1

For the identification of students at risk of developing mathematical difficulties at an early age, a high

sensitivity is often desirable even at the expense of a decreased specificity. A high sensitivity would

ensure that only a few students at risk of mathematical difficulties are missed. However, this has

the consequence that the specificity decreases and more students who are not at risk are

falsely identified as being at risk. For the cut-off value p = .307, a reasonably high sensitivity of 72.2% is achieved at a

still high specificity of 72.2%. Table 2 displays the classification of the students identified to be RMD

and to be ¬RMD through the participants’ error rates and response times compared to the students

identified as RMD and ¬RMD from the standardized ZAREKI-K for cut-off value p = .307.

Table 2: Classification table (a)

                        Identification
ZAREKI-K                ¬RMD    RMD     Percentage correct
¬RMD                    26      10      72.2% (specificity)
RMD                     5       13      72.2% (sensitivity)
Overall percentage                      72.2%

(a) cut-off value p = .307
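
The percentages in Table 2 follow directly from the four cell counts. The short Python sketch below recomputes them; the function name and argument names are illustrative choices, while the counts are those of Table 2.

```python
def classification_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity, and total accuracy from the cells of a 2x2 classification table."""
    sensitivity = tp / (tp + fn)   # share of RMD students identified as RMD
    specificity = tn / (tn + fp)   # share of not-RMD students identified as not RMD
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Counts from Table 2 (cut-off value p = .307)
sens, spec, acc = classification_metrics(tn=26, fp=10, fn=5, tp=13)
percentages = [round(100 * v, 1) for v in (sens, spec, acc)]
```

All three values come out at 72.2% because 13/18, 26/36, and 39/54 are the same fraction.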

The ROC curve (see Figure 2) is the generalization of a single classification table (see Table 2). Each

point of the ROC curve indicates sensitivity and (1–specificity) for a given cut-off value p. The drawn

diagonal would be expected if the classification was purely random. A measure of the classification

quality of the model is the area under the ROC curve (AUC). Following Hosmer et al. (2013), the

classification accuracy can be considered “acceptable” with an AUC = .761.
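
As a rough plausibility check, the AUC can be approximated from the eleven cut-off values in Table 1 with the trapezoidal rule. This sketch is not the computation performed by the statistics software, which uses all cut-off values; with only the tabulated, rounded percentages it yields a value of roughly .76, close to the reported AUC.

```python
# Sensitivity and specificity (%) from Table 1, one pair per cut-off value
sensitivity = [100.0, 100.0, 88.9, 83.3, 72.2, 72.2, 72.2, 61.1, 61.1, 55.6, 33.3]
specificity = [5.6, 13.9, 25.0, 44.4, 61.1, 69.4, 72.2, 77.8, 80.6, 91.7, 94.4]

def trapezoid_auc(sens_pct, spec_pct):
    """Approximate AUC from (1 - specificity, sensitivity) points with the trapezoidal rule."""
    points = [(1 - sp / 100, se / 100) for se, sp in zip(sens_pct, spec_pct)]
    points += [(0.0, 0.0), (1.0, 1.0)]  # end points shared by every ROC curve
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

auc_estimate = trapezoid_auc(sensitivity, specificity)
```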

Figure 2: ROC curve of the general model (left; AUC = .761; ellipse marks cut-off value p = .307)

and the reduced model (right; AUC = .841; ellipse marks cut-off value p = .351)

To identify those sets whose error rates and/or response times are particularly good for the

identification of students at the risk of developing mathematical difficulties, logistic regression was

performed through backwards selection. This backwards selection is first done with all twelve mean

error rates of the sets. Step by step, the twelve mean error rates are removed from the model, starting

with the one that has the lowest significance for predicting the ZAREKI-K outcome. All variables

that are significant to replicate the classification based on the ZAREKI-K outcome at the p < .1 level

remain included according to the Wald test. At the same time, the Likelihood ratio statistic is used to

check whether the model would improve by adding another variable. After eleven steps, the mean

error rates of sets 7 (symbolic number comparison) and 12 (completing color patterns) (see Figure 1)

remained. For the response times, the mean values of set 9 (completing growing number patterns)

could not be included, since the number of incorrect answers, for which response times

were not considered, was too high. After eleven steps, only the response time of set 1 (enumeration of small sets)

remained. Applying logistic regression to these three variables identified through backwards

selection, the likelihood ratio test indicates that the logistic regression model is significantly more

effective than the null model (constant only) (χ²(3) = 22.99, p < .05). Goodness-of-fit was assessed

using the Hosmer-Lemeshow test, indicating a good model fit (χ²(8) = 3.58, p > .89). Furthermore,

the Wald test indicates that the mean error rate of set (12) (χ²(1) = 6.88, p < .05) and the mean response

time of set (1) (χ²(1) = 6.91, p < .05) are significant classifiers of developing mathematical

difficulties. The mean error rate of set (7) is not a significant classifier in this regard

(χ²(1) = 3.256, p > .05), but since p < .1 it remained in the model. Following Hosmer et al. (2013),

the classification accuracy can be considered “excellent” with AUC = .841. With a cut-off value of p

= .351, this model has a higher specificity of 80.6% compared to the previous model at a sensitivity

of 72.2%. The total accuracy of this model is 77.8%.
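
The total accuracy of both models can be recovered from sensitivity and specificity, because it is their mean weighted by the group sizes (18 RMD and 36 ¬RMD students). The following Python sketch illustrates this relation; the function name is an illustrative choice.

```python
def total_accuracy(sensitivity, specificity, n_rmd, n_not_rmd):
    """Total accuracy as the group-size-weighted mean of sensitivity and specificity."""
    correct = sensitivity * n_rmd + specificity * n_not_rmd
    return correct / (n_rmd + n_not_rmd)

N_RMD, N_NOT_RMD = 18, 36  # group sizes according to ZAREKI-K

general_model = total_accuracy(0.722, 0.722, N_RMD, N_NOT_RMD)  # cut-off value p = .307
reduced_model = total_accuracy(0.722, 0.806, N_RMD, N_NOT_RMD)  # cut-off value p = .351
```

Because two thirds of the sample are ¬RMD, the gain in specificity accounts for the whole increase in total accuracy from 72.2% to 77.8%.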

Discussion

The results of our study should be viewed and interpreted against the backdrop of the following

limitations: Logistic regression requires sufficiently many training samples, i.e., RMD and ¬RMD

cases. Having only 18 students identified as RMD and 36 students identified as ¬RMD limits the

certainty of the learned logistic regression model. A larger sample could provide further certainty.

In addition, we optimized the classification threshold for the logistic regression model on a single

data set and did not evaluate the classification accuracy on an independent test set. In practice, the

classification threshold needs to be learned on a training set, and the classification accuracy on

independent data would likely be lower.

This pilot study addressed the question to what extent error rates and response times of correctly

solved tasks as online measures in early arithmetic and pattern tasks can identify students that may

be at risk of developing mathematical difficulties (RMD). Using logistic regression, we found

that the mean error rate across all 75 tasks is a significant classifier of RMD, whereas the mean response time

was not a significant classifier. Combining error rates and response times in our study yielded an acceptable

discrimination of the model of AUC = .761. Furthermore, we investigated to what extent the error

rates and response times of twelve sets can be used separately to identify students as RMD. We found

that the error rates of two sets (symbolic number comparison, completing color patterns) and the

response time of one set (enumeration of small sets) appeared to be particularly informative. The

influence of the set on completing color patterns is especially noteworthy since the

standardized mathematics test focused on early arithmetic, not patterns. Combining error rates of

symbolic number comparison and completing color patterns with response times of small set

enumeration yielded a discrimination of the model of AUC = .841.

Since the DIDUNAS project aims to develop an app that identifies first-grade students in need of

support, what can we learn from these results regarding the app development? In our pilot study, we

found that it is possible to use online measurements of certain tasks or a set of them (here: symbolic

number comparison, completing color patterns, enumeration of small sets) to achieve a reasonably

reliable identification of students at risk of developing mathematical difficulties in grade 1. This is

promising for future developments. In the future, we will build on these results for developing the

DIDUNAS app, which can then be used by teachers around the world to help identify students at

risk of mathematical difficulties early on. As the app will require less effort to administer and

evaluate than some standardized mathematics tests, it will save teachers’ resources.

The study presented in this paper is an important first step in that direction.

Acknowledgement

This project has received funding from the Erasmus+ grant program of the European Union under grant

agreement No 2020-1-DE03-KA201-077597.

References

Geary, D. C. (2013). Early foundations for mathematics learning and their relations to learning

disabilities. Current Directions in Psychological Science, 22, 23–27.

https://doi.org/10.1177/0963721412469398.

Geary, D. C., Bailey, D. H., & Hoard, M. K. (2009). Predicting mathematical achievement and

mathematical learning disability with a simple screening tool. Journal of Psychoeducational

Assessment, 27(3), 265–279. https://doi.org/10.1177/0734282908330592.

Gersten, R., Jordan, N. C., & Flojo, J. R. (2005). Early identification and interventions for students

with mathematics difficulties. Journal of Learning Disabilities, 38(4), 293–304.

https://doi.org/10.1177/00222194050380040301.

Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.

Hellstrand, H., Korhonen, J., Räsänen, P., Linnanmäki, K., & Aunio, P. (2020). Reliability and

validity evidence of the early numeracy test for identifying children at risk for mathematical

learning difficulties. International Journal of Educational Research, 102, 101580.

https://doi.org/10.1016/j.ijer.2020.101580.

Klein, P., Adi-Japha, E., & Hakak-Benizri, S. (2010). Mathematical thinking of kindergarten boys

and girls. Educational Studies in Mathematics, 73, 233–246. https://doi.org/10.1007/s10649-009-9216-y.

Moser Opitz, E. (2013). Rechenschwäche/Dyskalkulie. Theoretische Klärungen und empirische

Studien an betroffenen Schülerinnen und Schülern (2nd ed.). Haupt.

Moeller, K., Neuburger, S., Kaufmann, L., Landerl, K., & Nuerk, H.-C. (2009). Basic number

processing deficits in developmental dyscalculia: Evidence from eye tracking. Cognitive

Development, 24(4), 371–386. https://doi.org/10.1016/j.cogdev.2009.09.007.

Mussolin, C., Mejias, S., & Noël, M.-P. (2010). Symbolic and nonsymbolic number comparison in

children with and without dyscalculia. Cognition, 115, 10–25.

https://doi.org/10.1016/j.cognition.2009.10.006.

Peng, C.-Y. J., Lee, K., & Ingersoll, G. M. (2002). Introduction to logistic regression analysis and

reporting. The Journal of Educational Research, 96(1), 3–14.

https://doi.org/10.1080/00220670209598786.

Pittalis, M., Pitta-Pantazi, D., & Christou, C. (2018). A longitudinal study revisiting the notion of

early number sense: Algebraic arithmetic as a catalyst for number sense development. Mathematical

Thinking and Learning, 20(3), 222–247. https://doi.org/10.1080/10986065.2018.1474533.

Sasanguie, D., Van den Bussche, E., & Reynvoet, B. (2012). Predictors for mathematics

achievement? Evidence from a longitudinal study. Mind, Brain, and Education, 6(3), 119–128.

https://doi.org/10.1111/j.1751-228X.2012.01147.x.

Schindler, M., Schovenberg, V., & Schabmann, A. (2020). Enumeration processes of children with

mathematical difficulties: An explorative eye-tracking study on subitizing, groupitizing, counting,

and pattern recognition. Learning Disabilities: A Contemporary Journal, 18(2), 192–211.

Schleifer, P., & Landerl, K. (2011). Subitizing and counting in typical and atypical development.

Developmental Science, 14(2), 280–291. https://doi.org/10.1111/j.1467-7687.2010.00976.x.

Verschaffel, L., Torbeyns, J., & De Smedt, B. (2017). Young children’s early mathematical

competencies: Analysis and stimulation. In T. Dooley & G. Gueudet (Eds.), Proceedings of the

Tenth Congress of the European Society for Research in Mathematics Education. ERME.

Viesel-Nordmeyer, N., Schurig, M., Bos, W., & Ritterfeld, U. (2019). Effects of pre-school

mathematical disparities on the development of mathematical and verbal skills in primary

school children. Learning Disabilities, 17, 149–164.

von Aster, M. G., Bzufka, M. W., & Horn, R. (2009). ZAREKI-K. Neuropsychologische Testbatterie

für Zahlenverarbeitung und Rechnen bei Kindern – Kindergartenversion. Pearson.

Walter, J. (2020). Ein Screening-Verfahren zur Prognose von Rechenschwierigkeiten in der

Grundschule. Zeitschrift für Heilpädagogik, 71, 238–253.

Wijns, N., Torbeyns, J., Bakker, M., De Smedt, B., & Verschaffel, L. (2019). Four-year olds’

understanding of repeating and growing patterns and its association with early numerical ability.

Early Childhood Research Quarterly, 49, 152–163. https://doi.org/10.1016/j.ecresq.2019.06.004.