Cognitive Development 57 (2021) 100968

Available online 29 January 2021

0885-2014/© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license

(http://creativecommons.org/licenses/by/4.0/).

Follow-up questions influence the measured number knowledge in the Give-a-number task

Attila Krajcsi

ELTE, Eötvös Loránd University, Institute of Psychology, Department of Cognitive Psychology, 1064, Budapest, Izabella utca 46, Hungary

ARTICLE INFO

Keywords:

Give-a-number task

Cardinality principle

Subset-knowers

Preschooler number knowledge

ABSTRACT

The Give-a-number task is one of the most frequently used tests to measure the number knowledge of preschoolers at the time they acquire the meaning of symbolic numbers. In the task, an experimenter asks for a specific number of objects from a child. The literature utilizes several versions of this task, and usually it is assumed that the different versions are equivalent and that they do not have an effect on the measured number knowledge. In the present study, the specific potential effect of the follow-up questions posed after a trial on the measured number knowledge is investigated. Three versions of follow-up questions are compared. The results demonstrate that different versions affect the measured number knowledge of children. These results highlight that follow-up questions should be considered in studies using the Give-a-number task, and more generally, various versions of the Give-a-number task may have an essential effect on the measured number knowledge, thereby partly accounting for conflicting findings in the literature.

1. Introduction

Understanding numbers and math is one of the pillars of our culture. Research on how symbolic numbers are initially understood

and what representations are utilized is key to understanding how humans understand symbolic numbers. Human-specific, symbolic

number understanding is acquired at approximately the age of 4 years. Currently, the most important tool with which the phases of this

initial symbolic number learning are investigated among preschoolers is the Give-a-number (GaN) task (Wynn, 1990, 1992). In the GaN

task, children are asked to provide a specific number of objects. This number starts at 1 and upon a successful response, the next larger

number is asked, while in the case of an incorrect response, the preceding smaller number is asked again. The experimenter may go

back and forth between the largest known number and the smallest unknown number three times to establish the limit of the child’s

number knowledge.

Using the GaN task, Karen Wynn (1990, 1992) described the sequential phases of symbolic number acquisition. Initially, while

children know the series of number words (i.e., they can recite the “one-two-three” etc. list), they are unable to give any number of

objects reliably; these children are termed pre-knowers (or non-knowers). At approximately the age of 3 years, children can give 1

object, but not more, termed one-knowers. A few months after this, they are also able to give 2 objects, termed two-knowers, then after

another few months, they become three-knowers, and subsequently, four-knowers. One-, two-, three-, and occasionally four-knowers

are termed subset-knowers, because they can utilize only a subset of their counting list. In other words, even if children know the

counting word series up to an appropriately high value (e.g., up to 10), in the GaN task, they are unable to give all of these values

correctly. Typically at approximately the age of 4 years, children start to give any amount that is in their counting list; they are termed cardinality-principle-knowers (CP-knowers). It is assumed that at this point their symbolic number knowledge becomes general, because they understand the cardinality principle, i.e., the principle that after counting a set of objects, the last number word denotes the amount of objects in that set.

E-mail address: krajcsi@gmail.com.

https://doi.org/10.1016/j.cogdev.2020.100968

Received 6 April 2020; Received in revised form 14 October 2020; Accepted 15 October 2020

The description of this development is essential because this is the data that models attempt to account for (see a selection of the

most prominent theories of symbolic number acquisition in Carey, 2004, 2009; Carey & Barner, 2019; Piazza, 2010; vanMarle et al.,

2018). Furthermore, in many studies, other developmental numerical cognition phenomena are contrasted against the development

described using the GaN task (for example, see Davidson, Eng, & Barner, 2012; Le Corre, 2014; Le Corre & Carey, 2007; Sella &

Lucangeli, 2020); thus, the GaN task is considered the gold standard for measuring changes in initial symbolic number understanding.

While the GaN task and the concept of cardinality-principle knowledge are key components in the cognitive description of initial

symbolic number acquisition in preschoolers, there are both conceptual and methodological issues related to these descriptions. On a

conceptual level, while the concept of cardinality-principle knowledge supposes a meaningful understanding of the counting procedure,

together with general knowledge about numbers, several phenomena seemingly contradict this idea. For example, not all CP-knowers

are able to estimate the cardinality of a set that includes more than 4 items (Le Corre & Carey, 2007); not all CP-knowers are

able to compare symbolic numbers (Davidson et al., 2012; Le Corre, 2014); not all CP-knowers know the cardinality of a set when an

item is added to a set with known cardinality (Davidson et al., 2012); or not all CP-knowers know the cardinality of a set when an item

is removed from a set with known cardinality (Sella & Lucangeli, 2020). Relatedly, it is also possible that the GaN task can be solved

mechanically without a rich understanding of numbers (Davidson et al., 2012). On a methodological level, several versions of the GaN

task are used in the literature without ensuring that the different versions are equivalent in the sense that they give the same measurement

result. For example, while 4-knowers are usually not considered CP-knowers, in other cases they are categorized as

CP-knowers, e.g., in Wynn (1992), or as an opposite example, even 5-knowers are not considered as CP-knowers (Le Corre, 2014). As

another example, when specifying the threshold of knowing a specific number, most research uses the criterion that the child must give

2 correct answers out of 3 responses (67 %), however, others use the criterion of 2-out-of-2 (100 %) or 1-out-of-2 responses (50 %) (J.

B. Wagner & Johnson, 2011), or may use an entirely different approach, e.g., employing Bayesian modeling (Negen, Sarnecka, & Lee,

2011). (See additional examples about the follow-up question below.) The methodological and conceptual issues in this regard may be

related. In its simplest form, if the various versions of the tasks are not equivalent, then some versions must be imprecise or even

invalid. Obviously, using partially invalid methods may lead to seemingly contradicting evidence (see similar examples of how

methodological variations can profoundly influence conclusions in Gunderson, Spaepen, & Levine, 2015; K. Wagner, Chu, & Barner,

2019).

Because many different versions of the GaN task are considered equivalent without empirical confirmation of these assumptions, it

is not known whether these different versions truly offer the same measurement results. The aim of the present study is to empirically

verify the supposed equivalence of one of these variations, i.e., the use of various follow-up questions in the GaN task. The broader aim

of this study is to partly understand how variations in the GaN task can produce different observed number knowledge and how these

differences can contribute to seemingly contradictory phenomena in the development of symbolic number knowledge.

1.1. Follow-up questions in the Give-a-number task trials

When administering the GaN task, it is critical to differentiate between competence- and performance-based errors. Children may

give incorrect responses either because they cannot identify and use the numerical meaning of a number word, or because, although

they are able to identify and use the value, they are unable to execute the task correctly, for example, they may accidentally skip a

number word or miss an object, even if, in most instances they solve the task correctly. To avoid these performance errors, Wynn (1990,

1992) asked children to check the given set at the end of the trials; if they had not counted the set, they were asked explicitly to recount

it. If the result of the checking or the result of the recount was different from the required number, the experimenter reminded the

children of the required number and asked the children to fix the error.

In the literature, several variations of these follow-up questions are utilized. In some versions, the children are always asked to

recount (e.g., Le Corre & Carey, 2007) or to recount the set only if they have not counted the given set (e.g., Le Corre, Van de Walle,

Brannon, & Carey, 2006), in other versions, children are only asked to check their responses but do not have to recount their given set

(e.g., Sarnecka & Lee, 2009), and there are variations in which follow-up questions are not utilized at all (e.g., Mussolin, Nys, Leybaert,

& Content, 2012; Posid & Cordes, 2018). Other combinations of subtle details regarding the follow-up question are also observable in

the literature, for example, fixing a response is asked only if the result of the child’s counting differed from the required amount (Le

Corre et al., 2006).

While the GaN task is used in various forms in the literature, the differences are not justified or explained explicitly in those works.

Importantly, it is possible that various types of follow-up questions might have different effects. Since checking and recounting instructions

are used to avoid performance errors (as supposed in Wynn, 1990, 1992), if a lack of these instructions or different variations

of instructions truly introduce more performance errors, then different versions of the task should reveal different performance. This

also means that various versions of the task may categorize the same child into different number-knowledge categories, thus introducing

a systematic bias and questioning the validity of some of the GaN task versions.

One source of information on whether the follow-up questions at the end of the GaN task trial influence performance is how

frequently children correct their original response after the question. Unfortunately, the data available in the literature are not

conclusive on whether follow-up questions influence the identified number knowledge level. In Sarnecka and Lee (2009), children were asked at

the end of each trial whether the given set was N, and the correction rate was extremely low (0.2 %). Le Corre et al. (2006) asked

children to recount the set if counting was not employed in their original response. They reported the proportion of correct fixes and

the proportion of fixes in the correct direction (i.e., in contrast to a lack of corrections or corrections that made the response even more

erroneous) and found an important difference between subset-knowers and CP-knowers: While subset-knowers usually left the

incorrect response unfixed or made the response even worse (only approximately 20 % of the trials in which children were asked to fix

were corrected), CP-knowers were more successful (approximately 80 % of the to-be-fixed trials were corrected). Note that the proportion

values of the two studies cannot be compared directly because, in the Sarnecka and Lee (2009) study, the base number for the

reported proportion is the number of all trials, while in the Le Corre et al. (2006) study, it is only the number of trials where a fix was

asked for (i.e., the number of trials when the result of the child’s counting differed from the required amount); additionally, the latter

study considered a response correct even if the difference of the correct and given response was 1 (e.g., 4 was accepted as correct

response for 3). In summary, it is not clear whether children frequently correct their responses, and whether the follow-up

questions have an important effect on measured performance.

Note that another potential mechanism may influence the effect of follow-up questions at the end of GaN task trials. Although

follow-up questions were intended to avoid performance errors, an explicit request to recount the set and the comparison of the given

set with the required amount may have a training effect. First, in a more subtle form, even a simple question as to whether the given

response is correct may hint to children that they may have made a mistake and may encourage them to look for other solutions or

strategies. Conversely, it is also possible that after a recount or other follow-up instruction, children may suppose that their response

had been incorrect, even if the response was correct; consequently, because of the correction, they may modify an otherwise correct

response to be incorrect. In these latter cases, the follow-up question may cause worse performance than the task versions without

follow-up questions. Second, in line with the training hypothesis, in a training study, Kelly Mix and her colleagues demonstrated that

counting and labeling a set at the same time (e.g., “Look, this page has three crackers. Can you say it with me? Three crackers. Now let’s

count them 1, 2, 3!”) results in larger number knowledge measured with the GaN task in three-and-a-half-year-old children (Mix,

Sandhofer, Moore, & Russell, 2012). In the same study, it was found that using counting alone or using labeling alone did not improve

number knowledge when compared with a control group. Therefore, the follow-up questions may induce training and children may

show better number knowledge with those questions compared with a more passive version of the GaN task. In another study, a similar

counting and labeling instruction induced a training effect even after a 5-minute-long practice (Posid & Cordes, 2018). (Note that there

may be important differences between the recounting version of the GaN task and these training studies. For example, the training task

has a different context compared with the GaN measurement task. Therefore, while the recounting version of the GaN task may have a

training effect, this effect cannot be taken for granted.) Similar feedback-based performance change in sequence learning has been

described in various conditions (Lange-Küttner, Averbeck, Hirsch, Wießner, & Lamba, 2012). To the current author’s knowledge, the

literature has to date not considered the potential effect of this type of feedback at the end of the GaN task trials, and the consequences

of this potential effect remain unknown. It is not clear how much the follow-up questions could help children to avoid performance

errors or how much they might influence responses, either by teaching more effective strategies or conceptual knowledge, or by

worsening initially correct responses.

To summarize, while various follow-up questions are utilized at the end of the GaN task trials, it is not known whether these

versions measure different levels of number knowledge among children, potentially leading to validity issues in some versions of the

GaN task. Relatedly, while these follow-up questions were originally applied to avoid performance errors, it is not known whether

these questions have a training effect.

1.2. Aim of the study

The aim of the present study is to compare the effect of various follow-up questions on the performance of children in the GaN task.

More specifically, three GaN task versions are contrasted here: the task without any follow-up questions; the task with a question about

whether the response is correct; and the task with the request to recount the given set. It is investigated whether the three conditions

influence the measured number knowledge of children and the number of corrections after their original response in the task. Notice

that it is assumed throughout the entire study that only the observed number knowledge is available in the present data, but not the

“real” ability of children. Note that these task versions do not cover all the variations that can be found in the literature; nonetheless,

they cover some of the main versions, where it is reasonable to suppose that differences may arise. Additionally, no combinations of

these follow-up questions (such as in Wynn, 1990, 1992) are tested here because the primary interest is whether the follow-up

questions have an effect in their purest forms. Relatedly, the present work does not use the classic task version as described in

Wynn (1990, 1992) as a starting point. Rather, similar to many versions in the literature, the present work uses a version that is

modified in several ways (see specific details and references in the Tasks and procedure section). Still, the present version serves as an

appropriate test case for whether the equivalence of various GaN task versions can be taken for granted. Finally, note that the main aim of the present

study was not to investigate whether the follow-up questions help to avoid performance error or whether they induce training or both;

rather, the aim was to reveal whether the follow-up question versions have an effect on the children’s performance in any way.

If there are differences between the effects of the follow-up question versions, then it is essential not only to describe the exact

method a study follows, but also to consider these differences in reviews, meta-analyses and validation works. Also, the difference

between the effect of follow-up questions is important because it only makes sense to investigate whether there are performance errors

or training effects if differences between the follow-up questions truly exist (unless we suppose that equally strong performance

error and training effects cancel out the observable effect). On the other hand, if no differences between the present versions of the follow-up

questions (including the no-question version) could be observed, then follow-up questions may be omitted from the GaN task,

leading to a much shorter version of the task which could be easier to administer and could be less demanding for the children.

2. Methods

Three- and four-year-old preschoolers solved different follow-up question versions of the GaN task. The study was approved by the

ethics committee of the Faculty of Education and Psychology, Eötvös Loránd University, Hungary.

2.1. Participants

The study initially included 210 preschool participants. Among them, children who could not complete the tasks (N=9) or whose

counting list was not longer than their measured number knowledge (N=4) were excluded from the analysis. The final sample

comprised 197 preschoolers, 99 girls and 98 boys, with a mean age of 4;1 years, SD of 0.5;, ranging between 3;1 and 5; years. Seventeen

Hungarian preschools were involved in the measurement, 8 of them in the capital, and 9 of them in a country town, all of them mostly

receiving children of middle-class families.

2.2. Tasks and procedure

2.2.1. Give-a-number task

The children were presented with a pile of sponge balls (approximately 30 balls, with a diameter of approximately 2 cm) and the

experimenter asked them to provide a specific number of balls. “In this task, I want to see if you could pick these balls appropriately.

Could you give me x balls? Just place them in front of me.” The numbers were asked in the following pseudorandom order: 1, 3, 2, 6, 4,

5, 7, 9, and 8. This order was used to overcome the predictive nature of an increasing series. The entire series of these values was

repeated four times, resulting in 36 trials. The numbers provided by children were recorded. (Further details of this task version are

given below.)
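The trial sequence described above can be sketched in a few lines of Python. This is only an illustration of the stated design (nine requested numbers in a fixed pseudorandom order, repeated four times); the names `BLOCK_ORDER`, `N_REPETITIONS`, and `build_trials` are hypothetical and do not come from the study materials.

```python
# Pseudorandom order of the requested numbers, as listed in the text.
BLOCK_ORDER = [1, 3, 2, 6, 4, 5, 7, 9, 8]
N_REPETITIONS = 4  # the entire series was repeated four times


def build_trials(block_order=BLOCK_ORDER, repetitions=N_REPETITIONS):
    """Return the requested number for every trial, in presentation order."""
    return list(block_order) * repetitions


trials = build_trials()
print(len(trials))  # 9 numbers x 4 repetitions = 36 trials
```

Each number between 1 and 9 is therefore requested exactly four times per child.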

Three versions of the GaN task with three follow-up question conditions at the end of the trials were used. (1) In the “No follow-up”

version, no follow-up question was asked after a trial, and the children were not encouraged to recount the set when an incorrect

answer was given. (2) In the “Is it N” version, after the first response, the experimenter asked the child, “Is this N balls?”, using the

required number in the question. If the child answered no, then the child was asked again to give N balls. (3) In the “Recount” task

version at the end of the trials, the child was asked to recount the given set. If the result was different from the required amount, the

required amount was asked for again.

Participants were randomly assigned to the three follow-up question groups. The final sample included 65, 67 and 65 children in

the “No follow-up”, “Is it N” and “Recount” groups, respectively. The age of the three groups did not differ (age as an interval variable

was compared across the three follow-up question groups; all three group means were 4;1 years, $\omega^2 = -0.008$, Kruskal–Wallis test: $\chi^2(2, N = 197) = 0.92$, $p = 0.631$).

The GaN task is employed in various forms in the literature. Since it has not been formerly investigated whether different versions

of the task measure number knowledge differently or whether various details interact (i.e., the presence or strength of a version-based

effect may influence another version-based effect), it is not possible to tell whether specific details of the current task influence the

result. The minimal supposition of the present work is that if, in any version used in the literature, an effect of different versions can be

observed, then it should be taken as a cautionary sign that the various versions must be considered and further investigated. In the

present paragraph, the specific task properties are summarized for which different versions can be observed in the

literature. (1) In the present version, 30 balls were used as an initial set. Various set sizes are used in the literature (e.g., 15 objects in

Gunderson et al., 2015; 10 objects in K. Wagner et al., 2019) and this set size may influence the response when children incorrectly give

all the available objects (Figs. 1 and 2 in Gunderson et al., 2015; Figs. 1 and 6 in K. Wagner et al., 2019). However, the latter information

is not used in the present study. (2) The classic GaN task uses the titration method: After a successful trial, the next larger

number is asked, and after an unsuccessful trial, the preceding smaller number is asked, until the first number that is given incorrectly 2

out of 3 times is found (Wynn, 1990, 1992). Instead of this titration procedure, in the present work, all numbers between 1 and 9 are

asked in the order described above (see similar methods e.g., in Marchand & Barner, 2020; Mussolin et al., 2012; Sarnecka & Gelman,

2004; Sella & Lucangeli, 2020; Wagner & Johnson, 2011). The non-titration version was chosen to investigate additional properties of

the GaN task; those results are reported elsewhere. Importantly, the number knowledge calculation method used here mimics the

titration method (see the Analysis section below for more details). (3) While, in the titration method, numbers are initially asked in

increasing order, and subsequently, the order depends on the responses of the children, the present version uses pseudorandom order,

where randomization makes it impossible to rely on the order of the numbers in the task (see similar methods in Marchand & Barner,

2020). (4) Because it is assumed that once they understand 5, preschoolers are able to give any numbers from their counting list,

usually numbers larger than 5 are not tested, although several studies ask for numbers beyond 5 (e.g., see Almoammer et al., 2013;

Barner, Libenson, Cheung, & Takasaki, 2009; Cheung, Slusser, & Shusterman, 2016; Marchand & Barner, 2020; Mussolin et al., 2012;

Wagner & Johnson, 2011). The present dataset includes numbers larger than 5 to investigate other specific properties of the GaN task;

those results are reported elsewhere. Nonetheless, in the present study, in line with this widespread assumption, children who know at

least number 5 are categorized as CP-knowers (see more details in the Analysis section below) and data on larger numbers are not

analyzed here. (5) In the original GaN task version, all investigated numbers were asked at least once or twice, and the numbers at the

edge of participants’ number knowledge were asked 3 times (Wynn, 1990, 1992). In the present version, all numbers were asked 4

times, which is expected to yield a more precise measurement.

2.2.2. Counting list knowledge

Counting list knowledge was measured to ensure that the number knowledge measured using the GaN task is not constrained by the

children’s counting list: Only children who had a larger counting list than their number knowledge (for CP-knowers the counting list

should go beyond 6) were included in the analysis. In the counting list task, the child was asked to continue the verbal list: “Could you

continue this series? I will start it, and you continue! One, two, three…” The task was given twice and the highest numbers in the

correctly recited series were recorded.

The two tasks were administered in a single session: first the GaN task, then the Counting list knowledge task. A session

lasted approximately 15−20 min. Data were collected by nine experimenters, all of whom were blind to the aim of the study.

2.3. Analysis

2.3.1. Calculating number knowledge in the Give-a-number task

The average performance (proportion of the correctly given trials) was calculated for each child and each number. For a child, a

number was considered known according to two criteria. First, the proportion of correct responses must be higher than 50 % (i.e., 75 %

or 100 %). This calculation method is comparable with the routinely used titration method, where it was either tested whether “a child

succeeded […] reliably” (Wynn, 1992), or a number was evaluated as not known when the number was not given at least two out of

three times (Le Corre & Carey, 2007), and when the previous number was given at least two out of three times (Cheung et al., 2016).

Second, a number was considered as known only if the same set size was not given for other number words at least twice (e.g., if 3 was

given correctly when asking for 3, and 3 was also given at least twice when asking for 4, then 3 is not considered as known). (If the

criterion is set to a much stricter and somewhat unrealistic expectation of 100 % performance, and if giving a number for other number

words even once makes the number unknown, then the analysis still provides the same pattern of results as presented in the

Results section below.)

To mimic the classic titration method, number knowledge was specified as the largest known number after which the first unknown

number follows. In other words, number knowledge is the smallest unknown number minus 1. For example, if the smallest unknown

number is 4, then the number knowledge of that child is 3. This calculation method imitates the titration method because in the

titration method, when an unknown number is reached, no further numbers are asked, and the previous number is considered as the

limit of the child’s knowledge. In other words, even if the present method measured the entire range between 1 and 9, the number

knowledge calculation method imitated the titration method, in which data collection is interrupted after reaching the first unknown

number.

If children’s measured number knowledge was at least 5, they were considered CP-knowers.
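The rule "smallest unknown number minus 1" and the CP-knower threshold can be sketched as below. The function names, the default tested range of 1 to 9, and the label strings are assumptions made for illustration; only the rule itself and the threshold of 5 come from the text.

```python
def number_knowledge(known, tested=range(1, 10)):
    """Return the smallest unknown number minus 1 (0 means pre-knower).

    known is the set of numbers that met the 'known' criteria; tested is
    assumed to be an ascending range starting at 1.
    """
    for number in tested:
        if number not in known:
            return number - 1
    return max(tested)  # every tested number was known


def knower_label(level, cp_threshold=5):
    """Map a number knowledge value to a knower-level label."""
    if level >= cp_threshold:
        return "CP-knower"
    if level == 0:
        return "pre-knower"
    return f"{level}-knower"


# 4 is known here, but it comes after the first unknown number (3),
# so it is ignored, as it would be under the titration method.
print(number_knowledge({1, 2, 4}))                # 2
print(knower_label(number_knowledge({1, 2, 4})))  # 2-knower
```

Stopping at the first unknown number is what makes this calculation imitate the titration method even though all numbers were tested.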

2.3.2. Calculating counting list knowledge

Counting list knowledge was calculated as the mean of the highest numbers recited correctly from the two counting list repetitions.

The mean was considered a compromise between minimum reliable performance (the minimum of the two values, i.e., that number

was reached consistently) and the maximum peak performance (the maximum of the two values, i.e., a value that is not always

reached). Because numbers should be recited starting from 4, the smallest upper limit of the counting list knowledge could be 4.
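A short sketch of this calculation, together with the inclusion rule stated in the Counting list knowledge section, is given below. The function names are hypothetical, and reading "beyond 6" as strictly greater than 6 is an assumption.

```python
def counting_list_knowledge(first_attempt, second_attempt):
    """Mean of the highest correctly recited numbers from the two attempts."""
    return (first_attempt + second_attempt) / 2


def include_child(counting_list, number_knowledge, is_cp_knower):
    """Inclusion rule: the counting list must exceed the number knowledge.

    For CP-knowers, the counting list is assumed to have to exceed 6.
    """
    if is_cp_knower:
        return counting_list > 6
    return counting_list > number_knowledge


print(counting_list_knowledge(10, 7))                      # 8.5
print(include_child(8.5, 3, is_cp_knower=False))           # True
print(include_child(6.0, 5, is_cp_knower=True))            # False
```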

Analyses were performed using LibreOffice Calc spreadsheet, the CogStat (version 2.0) data analysis software (Krajcsi, 2020), and

the Jamovi statistical software (The jamovi project, 2020) packages. In CogStat, after specifying the analysis task with the appropriate

variables, analyses are run automatically based on the properties of the data. In that software, the selection of the hypothesis tests is

based on the common textbook suggestions extended with additional considerations found in the methodological literature. For the

specific decision tree in CogStat, see the online documentation of the software.

2.3.3. Measurement level of the number knowledge

In most of the following analyses, number knowledge was handled as an interval variable. While number knowledge in the subset-

knowers’ range may be considered an interval variable, pre-knowers and CP-knowers may violate the equal intervals precondition, and

it may be more appropriate to handle number knowledge as an ordinal variable. However, the mean could be a more useful descriptive

statistic for the expected value than the median (e.g., see Fig. 1 and 2). Several further details ensure that handling number knowledge

as an interval variable does not invalidate the present findings. (1) Because, in all relevant hypothesis tests, the assumptions for the

parametric tests were violated, the same hypothesis test results would be expected if the number knowledge was handled as an ordinal

variable. (2) The full distributions of the data are also presented (see Fig. 1 and 2) where measurement level is irrelevant. (3) Some of

the analyses comparing the three follow-up question groups rely on nominal variables (e.g., see the left panel of Fig. 2 and the related

results), and these analyses reveal the same pattern as observed in the interval variable-based analyses.

3. Results and discussion

The raw data are available at https://osf.io/584y3/.

Because the counting list knowledge may limit the number knowledge measured using the GaN task, children whose counting list

knowledge was not higher than their measured number knowledge were excluded from further analysis (N=4). Note again that the

counting list knowledge task was not a pre-test, but it was given after the GaN task.

3.1. Observed number knowledge

The measured number knowledge differed between the three follow-up question groups (see Fig. 1; measured number knowledge

as an interval variable was compared across the three follow-up question groups, $\omega^2 = 0.035$, Kruskal–Wallis test (normality assumption for ANOVA was violated): $\chi^2(2, N = 197) = 9.42$, $p = 0.009$). Specifically, number knowledge of the Recount condition

was significantly higher than the number knowledge in the other two conditions (post hoc Dunn’s test, ps < 0.04). If pre-knowers are

coded as 0, and CP-knowers are coded as 5-knowers, the average number knowledge is 3.0 for the No follow-up condition, 2.7 for the Is

it N condition and 3.6 for the Recount condition, therefore, based on the descriptives, the recounting procedure causes an approximately

0.5–1.0 higher number knowledge index.
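The coding used for these averages, and the size of the gap between the reported group means, can be illustrated as follows. The function name `code_level` and the toy labels are hypothetical; the three group means are the ones reported in the text, and the toy group is made up and is not data from the study.

```python
from statistics import mean


def code_level(label):
    """Code a knower level as a number: pre-knower = 0, CP-knower = 5."""
    if label == "pre":
        return 0
    if label == "CP":
        return 5
    return int(label)  # "1" to "4" for subset-knowers


# Made-up group for illustration only; NOT data from the study.
toy_group = ["pre", "1", "2", "3", "CP", "CP"]
toy_mean = mean(code_level(label) for label in toy_group)

# Group means reported in the text.
reported_means = {"No follow-up": 3.0, "Is it N": 2.7, "Recount": 3.6}
gaps = {
    group: round(reported_means["Recount"] - value, 1)
    for group, value in reported_means.items()
    if group != "Recount"
}
print(gaps)  # {'No follow-up': 0.6, 'Is it N': 0.9}
```

The two differences, 0.6 and 0.9, fall within the approximate 0.5–1.0 range stated above.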

This difference in the observed number knowledge could be caused by at least two sources. First, it is possible that, in the Recount

condition, the average number knowledge is higher because there are more CP-knowers than in the other two conditions (e.g., because

the recount instruction may have a training effect, causing a qualitative change in number knowledge). Second, independent of the

previous possibility, average number knowledge in the Recount condition may be higher than in the other conditions because subset-

knowers (i.e., children who are not yet CP-knowers) know larger numbers (e.g., more 3- or 4-knowers) compared with the subset-

knowers in the other two conditions, while they do not become CP-knowers. These two possibilities are different both mathematically (i.e., whether the mean is increased by more 5 values or by more values below 5) and conceptually (i.e., whether the apparent change occurs within the subset-knower category or not). It is also possible that both of these factors contribute to the observed difference

simultaneously.

To investigate the first possibility, i.e., whether there are more CP-knowers in the Recount condition compared with the other two conditions, the proportions of CP-knowers and subset-knowers were compared across the three follow-up question conditions (see Fig. 2, left panel). The results revealed significant differences in the proportion of CP-knowers between the three groups (number knowledge as a dichotomous nominal variable (i.e., CP-knower or subset-knower, where the subset-knower category also included the pre-knowers) was compared across the three follow-up question groups, Cramér’s V measure of association: φc = 0.212; Pearson’s chi-squared test: χ²(2, N = 183) = 8.196, p = 0.017), and the descriptive data suggested that there were more CP-knowers in the Recount condition than in the other two conditions (while there were 19 (32 %) and 17 (27 %) CP-knowers in the No follow-up and in the Is it N conditions, respectively, there were 31 (51 %) CP-knowers in the Recount condition).
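As an illustrative cross-check of this test, the following sketch recomputes Pearson’s chi-squared statistic and Cramér’s V from a 3 × 2 contingency table. Only the CP-knower counts (19, 17, 31) and the total N = 183 come from the text; the subset-knower counts (41, 45, 30) are an assumption inferred from the reported percentages and are not stated in the paper. The function names are hypothetical.

```python
import math

def pearson_chi2(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V measure of association for an n_rows x n_cols table."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Rows: No follow-up, Is it N, Recount; columns: CP-knowers, subset-knowers.
# CP-knower counts are from the text; subset-knower counts are ASSUMED.
table = [[19, 41], [17, 45], [31, 30]]
n = sum(map(sum, table))
chi2, df = pearson_chi2(table)
v = cramers_v(chi2, n, len(table), len(table[0]))
# For df = 2 the chi-squared survival function has the closed form exp(-x / 2).
p = math.exp(-chi2 / 2)
print(f"chi2({df}, N={n}) = {chi2:.3f}, p = {p:.3f}, V = {v:.3f}")
```

Under the assumed subset-knower counts, the recomputed statistic, p value, and Cramér’s V agree with the reported values to rounding, which suggests that the assumed group sizes are at least plausible.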

To investigate the second possibility, i.e., whether there were differences between the conditions in the subset-knowing range (including the pre-knowers), the average number knowledge of the subset-knowers (i.e., excluding the CP-knowers from the sample) was compared between the three conditions (Fig. 2, middle and right panels). The results showed no significant differences between the three follow-up question groups in the observed number knowledge of subset-knowers (among subset-knowers, number knowledge as an interval variable was compared across the three follow-up question groups, ω² = 0.011, Kruskal–Wallis test (normality assumption for ANOVA was violated): χ²(2, N = 116) = 3.33, p = 0.189). The same non-significant result was found when the numbers of specific number-knowers were compared between the three conditions (among subset-knowers, number knowledge as a nominal variable was compared across the three follow-up question groups, Cramér’s V measure of association: φc = 0.214; Pearson’s chi-squared test: χ²(6, N = 116) = 10.580, p = 0.102). (Note that, in the latter analysis, the Cramér’s V measure of association was of the same order of magnitude as in the previous analysis for the proportion of subset- and CP-knowers. While the former association proved to be significant, the latter did not. It is possible that the follow-up questions also had an effect on the number of specific subset-knowers, but because of the smaller sample size, the analysis lacked sufficient statistical power. Nonetheless, the present data provide no evidence for different proportions of specific subset-knowers between the follow-up conditions.)

Fig. 1. Number knowledge as a function of the three follow-up conditions, showing individual data and boxplots (left) and point and interval

estimations of the mean (right). In the charts, 0 number knowledge means that the child is a pre-knower, and 5 number knowledge means the child

is a CP-knower. Note that the confidence intervals were calculated with the assumption of normal distribution, which was not the case in the present

data (e.g., see the individual data on the left chart), therefore, these interval estimations can only be considered as approximations. The dotted lines

of the x-axes denote that those variables are nominal.


In summary, the follow-up questions used at the end of the GaN task influenced the observed number knowledge, displaying higher

number knowledge for the Recount follow-up condition than for the Is it N follow-up or without a follow-up question. Importantly, the

difference was mainly rooted in the higher proportion of CP-knowers with the Recount follow-up, and not in the number knowledge of

subset-knowers. In other words, the Recount version mainly had an effect among the CP-knowers.

3.2. Number of corrections

The follow-up questions had a significant effect on the number of corrections, i.e., the number of trials in which the children modified their original response (Fig. 3; number of corrections as an interval variable was compared across the follow-up question groups, ω² = 0.302, Kruskal–Wallis test (normality and homogeneity of variance assumptions for ANOVA were violated): χ²(2, N = 197) = 84.4, p < 0.001). When no follow-up question was asked, hardly any corrections were observed (the mean number of corrections was 0.2, 0.6 %); for the Is it N follow-up, the mean number of corrections (3.2, 8.9 %) was significantly higher; and the Recount follow-up question caused the highest mean number of corrections (9.1, 25.3 %; all pairwise post hoc comparisons (Dunn’s test) were significant, ps < 0.001). In parallel with these results, the proportion of children who corrected their responses at least once differed between the three groups: 6 % in the No follow-up question group, 45 % in the Is it N group, and 77 % in the Recount group (the presence of at least one correction as a dichotomous nominal variable was compared across the follow-up question groups, Pearson’s chi-squared test: χ²(2, N = 197) = 66.739, p < 0.001).
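The percentages given next to the mean numbers of corrections appear to express those means relative to the number of trials per child. A minimal sketch, assuming 36 trials per child (the numbers 1 to 9, each asked four times, as described for the present task version); the variable names are illustrative.

```python
# Assumption: each child completed 9 numbers x 4 series = 36 trials.
TRIALS_PER_CHILD = 9 * 4

# Mean number of corrections per child, as reported in the text.
mean_corrections = {"No follow-up": 0.2, "Is it N": 3.2, "Recount": 9.1}

# Express each mean as a percentage of the trials, rounded to one decimal.
percent_of_trials = {
    condition: round(100 * mean / TRIALS_PER_CHILD, 1)
    for condition, mean in mean_corrections.items()
}
print(percent_of_trials)
```

Under this assumption the computed percentages reproduce the reported 0.6 %, 8.9 %, and 25.3 %.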

When the number of corrections was compared between children with different observed number knowledge (Fig. 4), number knowledge had a significant effect on the number of corrections (number of corrections as an interval variable was compared across the various number knowledge groups, ω² = 0.0751, Kruskal–Wallis test (normality and homogeneity of variance assumptions were violated): χ²(5, N = 197) = 19.3, p = 0.002). According to the post hoc pairwise comparisons (Dunn’s test), the pairs pre-knower–2, pre-knower–4, 1–2, 1–4, 1–CP, 2–3, and 3–4 differed significantly, suggesting that 2- and 4-knowers made the most corrections.

Comparing the present results with existing reports on the proportion of corrections, we find that Sarnecka and Lee (2009), who used a condition similar to the present Is it N condition, found a low 0.2 % proportion of corrections, closer to what we observed in the No follow-up condition (Fig. 3). It is possible that as-of-yet unrecognized factors, or factors that are not documented at all, caused this difference. Le Corre et al. (2006) used a task version similar to the present Recount version (although they did not ask for recounting if children already used counting spontaneously in their responses). They found a similarly large proportion of corrections (20 % for subset-knowers and 80 % for CP-knowers) as in the present study; however, the difference between the subset-knowers and CP-knowers was not observed in the present results (Fig. 4). Note again that there are other differences in the Le Corre et al. (2006) study that make a direct comparison difficult. For example, the basis for the proportion is the number of to-be-fixed trials; corrections that were made in the wrong number direction were not considered corrections; and incorrect responses with a difference of 1 were accepted as correct responses. These difficulties in comparing the studies, together with the differences in the methods used, highlight again the need for a more detailed description of how different versions of the GaN task can influence performance.

Fig. 2. Mosaic plot of the proportion of subset-knowers and CP-knowers as a function of the three follow-up conditions (left panel). Number knowledge as a function of the three follow-up question conditions among subset-knowers (middle and right panels). See additional notes in Fig. 1.

Fig. 3. The number of corrections as a function of the three follow-up question conditions. See additional notes in Fig. 1.

3.3. Performance changes during the session

The present GaN task version took longer to complete than the classic version. Several properties of the present task contributed to the increased length: Instead of the titration method, all numbers between 1 and 9 were asked; numbers beyond 5 were also asked; and all numbers were asked four times. It is possible that this relatively long task was more tiresome for children and affected their performance negatively. Conversely, the supposed training or practice effect may improve performance further during a longer task. Either of these two opposing effects may influence performance. To investigate whether the longer task impacted the results, the mean performance (proportion of correctly solved trials) was calculated for each child for each of the four series in the GaN task. The proportion of correctly solved trials did not change significantly over time (the mean proportions of the four series were 60 %, 58 %, 59 %, and 57 %, respectively, Friedman test (normality assumption for ANOVA was violated): χ²(3, N = 197) = 6.87, p = 0.076; while performance slightly decreased over time, the change was not significant despite the relatively large sample size). It is also possible that both the tiredness and the training effects operate with similar effect sizes, in which case the two effects may cancel each other out, leaving no visible effect on performance.

To check whether the performance change differed between the follow-up question groups, an additional mixed ANOVA was conducted with the proportion of correct responses in the four series as the within-subject factor and the follow-up question group as the between-subject factor. For the two main effects, the results replicated the analyses above: the series effect was not significant (F(3, 582) = 2.395, p = 0.067), while the follow-up question group effect was significant (F(2, 194) = 3.90, p = 0.022). Critically, the interaction of the two factors was not significant (F(6, 582) = 0.952, p = 0.457), indicating no difference in performance change between the follow-up question groups. However, descriptive results of the groups suggested that performance decreased in the Recount group. The change was monotonic across all 4 series, hinting that it may not have been due to noise, but that there may be a real effect with a small effect size for which the present sample size did not provide sufficient statistical power. When the effect of the series was checked separately in the three groups, a significant performance change was observed across the series in the Recount group, but not in the other two groups (in separate analyses for the three groups, the mean performance of the 4 series as repeated measures interval variables was compared, Friedman test (normality assumption was violated): χ²(3, N = 65) = 9.75, p = 0.021 for the Recount group, χ²(3, N = 65) = 2.83, p = 0.418 for the No follow-up group, and χ²(3, N = 67) = 2.11, p = 0.549 for the Is it N group; Fig. 5).
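The degrees of freedom in the reported F statistics follow from the design. A minimal sketch, assuming a standard mixed ANOVA with one between-subject and one within-subject factor and no sphericity correction, with 197 children, 3 follow-up groups, and 4 series; the function name is hypothetical.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom for a one-between, one-within mixed ANOVA.

    Returns (effect df, error df) for each term, assuming no
    sphericity correction is applied.
    """
    between = n_groups - 1
    between_error = n_subjects - n_groups
    within = n_levels - 1
    within_error = within * between_error
    interaction = between * within
    return {
        "group": (between, between_error),
        "series": (within, within_error),
        "group x series": (interaction, within_error),
    }

dfs = mixed_anova_df(n_subjects=197, n_groups=3, n_levels=4)
print(dfs)
```

The resulting pairs correspond to the F(2, 194), F(3, 582), and F(6, 582) terms reported for the group effect, the series effect, and their interaction.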

Even if a performance change was present in the Recount group (note that the interaction of the primary ANOVA above was not significant), the decrease was moderate: The performance changed from 69 % (series 1) to 63 % (series 4). This was similar to the results of Marchand and Barner (2020) where, in two studies using the recount version of the GaN task, there were more children for whom performance decreased between two GaN administrations compared with children whose performance increased; however, the difference in change direction was not significant. This pattern runs counter to a training effect, which predicts increased performance over time; or at least the training effect is weaker than the tiredness effect. In contrast, one may infer that the recount instruction may prevent performance errors, although this prevention effect may decrease over time. The supposed tiredness effect is also in line with the observed number of corrections: The tiredness effect was observable in the group where the children made a substantial number of corrections, whereas in the other two groups, where corrections were much rarer, the supposed tiredness effect was not observable. Importantly, the observed potential change in performance over time showed that even if longer administration of the task may have slightly influenced performance, the Recount follow-up question still demonstrated higher observed number knowledge.

Fig. 4. Number of corrections as a function of number knowledge. See additional notes in Fig. 1. Note that number knowledge is denoted as a nominal variable on the x-axis, even if this variable can be considered at least an ordinal variable, because the current analysis handles that variable as a nominal variable (i.e., comparing the groups of children with different number knowledge).

4. General discussion

The Give-a-number task is likely the most important task for measuring the initial acquisition of symbolic numbers among preschoolers. At the end of the trials, follow-up questions aim to fix potential performance errors; however, various versions of these follow-up questions are used in the literature, and it is not known whether the different questions can have a different effect on performance. The present study compared the effect of three different follow-up questions, and found that the different questions have effects on the measured performance. First, it was found that the Recounting instruction caused better performance in preschoolers than the No follow-up question and the Is it N versions. This result is in line with an existing training study in which children showed better performance in the GaN task after counting-and-labeling training (similar to the present Recounting condition) compared with counting training, labeling training, or a control group (similar to the present Is it N and No follow-up question conditions) (Mix et al., 2012). This result is also consistent with the explicit intention of including the recount instruction to prevent performance errors in the GaN task (Wynn, 1990, 1992). Second, detailed analyses revealed that improved performance in the Recount follow-up condition was caused mainly by a higher proportion of CP-knowers and not by better number knowledge of the subset-knowers. Third, and unsurprisingly, the follow-up question influenced the number of corrections: There were hardly any spontaneous corrections when no follow-up questions were used, there were more corrections in the Is it N condition, and the most corrections were observed for the Recount condition. Fourth, some aspects of the results hinted that performance in the Recount condition decreased over time, supporting the notion that the recounting instruction prevented performance error through additional verification.

The results provided herein are essential because they demonstrate that different versions of the follow-up questions may categorize children differently in terms of whether they are subset- or CP-knowers (there were almost twice as many CP-knowers in the Recount condition as in the Is it N and No follow-up conditions), a miscategorization that may have led to invalid data in several studies. These results also highlight the need for the precise reporting of follow-up questions and the need to consider the type of questions in reviews and meta-analyses. More generally, if there are further differences in the measured number knowledge caused by additional variations in the GaN task, studies using different versions are incomparable, because preschoolers are categorized inconsistently. One may argue that the present study did not use the GaN task version proposed by Wynn (1990, 1992), but a modified version; therefore, the present results cannot be used to warn about GaN task versions in general. However, even if some of the version-related effects depend on other parameters of the task (i.e., even if different task version dimensions interact), it does not mean that the GaN task version proposed by Wynn (1990, 1992) cannot work differently if only a single parameter of the task were modified. Rather, the present results serve as a warning that various GaN task versions should not automatically be assumed to be equivalent; instead, this equivalence must be verified empirically.

The specific differences that were demonstrated here raise a further question: While the Recount follow-up question increased the proportion of CP-knowers, it is not clear whether recounting improved performance by avoiding performance error (as was originally intended with the introduction of these follow-up questions), or whether recounting trained children, or whether both effects worked in parallel. Some aspects of the results indicate that the recounting instruction may activate verification processes, which may help to avoid performance errors. A former training study by Mix et al. (2012) hints that training at least plays a role in this follow-up question effect. The possibility that performance may be altered by the follow-up instruction may also be related to children’s partial number knowledge around their number knowledge limitation (Barner & Bachrach, 2010; Gunderson et al., 2015; O’Rear, McNeil, & Kirkland, 2021; Wagner et al., 2019). The effect may also be in line with, and related to, the fact that the reliability of specific number knowledge is moderate (Marchand & Barner, 2020). Future studies may establish more specific sources of this follow-up question effect.

In summary, while various versions of the Give-a-number task are used in the literature, and while they are supposed to be equivalent, follow-up questions influence the observed performance, which makes the results of the literature incomparable and renders some of the task versions invalid or unreliable. Further validation studies may establish an optimal version for measuring the number knowledge of preschoolers. The present result serves as an important caution that supposedly equivalent versions of the GaN task may give rise to essential differences in observed performance, and further studies are needed to discover whether additional variations of the GaN task may influence the measured number knowledge. These clarifications are essential because the conceptual issues regarding initial symbolic numerical understanding can be resolved only if the GaN task, which is one of the most important tests, measures the numerical ability validly.

Fig. 5. The proportion of correct trials in the GaN task as a function of the series of numbers (i.e., time) and follow-up question groups.

Acknowledgments

I thank Marta Fedele, Edina Fintor, Petia Kojouharova and Tamás Szűcs for their comments on the manuscript. The work was supported by the National Research, Development and Innovation Fund (NKFI 132165), CELSA Research Fund (CELSA/19/011), ELTE Eötvös Loránd University, Faculty of Education and Psychology, and Central European University, Department of Cognitive Sciences.

References

Almoammer, A., Sullivan, J., Donlan, C., Marušič, F., Žaucer, R., O’Donnell, T., et al. (2013). Grammatical morphology as a source of early number word meanings. Proceedings of the National Academy of Sciences, 110(46), 18448–18453. https://doi.org/10.1073/pnas.1313652110
Barner, D., & Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive Psychology, 60(1), 40–62. https://doi.org/10.1016/j.cogpsych.2009.06.002
Barner, D., Libenson, A., Cheung, P., & Takasaki, M. (2009). Cross-linguistic relations between quantifiers and numerals in language acquisition: Evidence from Japanese. Journal of Experimental Child Psychology, 103(4), 421–440. https://doi.org/10.1016/j.jecp.2008.12.001
Carey, S. (2004). Bootstrapping and the origin of concepts (pp. 59–68). Daedalus.
Carey, S. (2009). The origin of concepts (1st ed.). USA: Oxford University Press.
Carey, S., & Barner, D. (2019). Ontogenetic origins of human integer representations. Trends in Cognitive Sciences, 23(10), 823–835. https://doi.org/10.1016/j.tics.2019.07.004
Cheung, P., Slusser, E., & Shusterman, A. A. (2016). 6-month longitudinal study on numerical estimation in preschoolers. Proceedings of the Cognitive Science Society. https://pdfs.semanticscholar.org/a8b9/e015eb6710be668e6c9fa126f3676e73c4ce.pdf
Davidson, K., Eng, K., & Barner, D. (2012). Does learning to count involve a semantic induction? Cognition, 123(1), 162–173. https://doi.org/10.1016/j.cognition.2011.12.013
Gunderson, E. A., Spaepen, E., & Levine, S. C. (2015). Approximate number word knowledge before the cardinal principle. Journal of Experimental Child Psychology, 130, 35–55. https://doi.org/10.1016/j.jecp.2014.09.008
Krajcsi, A. (2020). CogStat – An automatic analysis statistical software (2.0.0) [Computer software]. https://www.cogstat.org
Lange-Küttner, C., Averbeck, B. B., Hirsch, S. V., Wießner, I., & Lamba, N. (2012). Sequence learning under uncertainty in children: Self-reflection vs. self-assertion. Frontiers in Psychology, 3. https://doi.org/10.3389/fpsyg.2012.00127
Le Corre, M. (2014). Children acquire the later-greater principle after the cardinal principle. The British Journal of Developmental Psychology, 32(2), 163–177. https://doi.org/10.1111/bjdp.12029
Le Corre, M., & Carey, S. (2007). One, two, three, four, nothing more: An investigation of the conceptual sources of the verbal counting principles. Cognition, 105(2), 395–438. https://doi.org/10.1016/j.cognition.2006.10.005
Le Corre, M., Van de Walle, G., Brannon, E. M., & Carey, S. (2006). Re-visiting the competence/performance debate in the acquisition of the counting principles. Cognitive Psychology, 52(2), 130–169. https://doi.org/10.1016/j.cogpsych.2005.07.002
Marchand, E., & Barner, D. (2020). How reliable is the Give-a-number task? 42nd Annual Virtual Meeting of the Cognitive Science Society. https://cognitivesciencesociety.org/cogsci20/papers/0095/index.html
Mix, K. S., Sandhofer, C. M., Moore, J. A., & Russell, C. (2012). Acquisition of the cardinal word principle: The role of input. Early Childhood Research Quarterly, 27(2), 274–283. https://doi.org/10.1016/j.ecresq.2011.10.003
Mussolin, C., Nys, J., Leybaert, J., & Content, A. (2012). Relationships between approximate number system acuity and early symbolic number abilities. Trends in Neuroscience and Education, 1(1), 21–31. https://doi.org/10.1016/j.tine.2012.09.003
Negen, J., Sarnecka, B. W., & Lee, M. D. (2011). An Excel sheet for inferring children’s number-knower levels from give-N data. Behavior Research Methods, 44(1), 57–66. https://doi.org/10.3758/s13428-011-0134-4
O’Rear, C. D., McNeil, N. M., & Kirkland, P. K. (2021). Partial knowledge in the development of number word understanding. Developmental Science, e12944. https://doi.org/10.1111/desc.12944
Piazza, M. (2010). Neurocognitive start-up tools for symbolic number representations. Trends in Cognitive Sciences, 14(12), 542–551. https://doi.org/10.1016/j.tics.2010.09.008
Posid, T., & Cordes, S. (2018). How high can you count? Probing the limits of children’s counting. Developmental Psychology, 54(5), 875–889. https://doi.org/10.1037/dev0000469
Sarnecka, B. W., & Gelman, S. A. (2004). Six does not just mean a lot: Preschoolers see number words as specific. Cognition, 92(3), 329–352. https://doi.org/10.1016/j.cognition.2003.10.001
Sarnecka, B. W., & Lee, M. D. (2009). Levels of number knowledge during early childhood. Journal of Experimental Child Psychology, 103(3), 325–337. https://doi.org/10.1016/j.jecp.2009.02.007
Sella, F., & Lucangeli, D. (2020). The knowledge of the preceding number reveals a mature understanding of the number sequence. Cognition, 194, Article 104104. https://doi.org/10.1016/j.cognition.2019.104104
The jamovi project. (2020). Jamovi (1.2) [Computer software]. https://www.jamovi.org
vanMarle, K., Chu, F. W., Mou, Y., Seok, J. H., Rouder, J., & Geary, D. C. (2018). Attaching meaning to the number words: Contributions of the object tracking and approximate number systems. Developmental Science, 21(1), Article e12495. https://doi.org/10.1111/desc.12495
Wagner, J. B., & Johnson, S. C. (2011). An association between understanding cardinality and analog magnitude representations in preschoolers. Cognition, 119(1), 10–22. https://doi.org/10.1016/j.cognition.2010.11.014
Wagner, K., Chu, J., & Barner, D. (2019). Do children’s number words begin noisy? Developmental Science, 22(1), Article e12752. https://doi.org/10.1111/desc.12752
Wynn, K. (1990). Children’s understanding of counting. Cognition, 36(2), 155–193. https://doi.org/10.1016/0010-0277(90)90003-3
Wynn, K. (1992). Children’s acquisition of the number words and the counting system. Cognitive Psychology, 24(2), 220–251. https://doi.org/10.1016/0010-0285(92)90008-P
