# The Development of Children's Rule Use on the Balance Scale Task

**ABSTRACT** Cognitive development can be characterized by a sequence of increasingly complex rules or strategies for solving problems. Our work focuses on the development of children's proportional reasoning, assessed by the balance scale task using Siegler's (1976, 1981) rule assessment methodology. We studied whether children use rules, whether children of different ages use qualitatively different rules, and whether rules are used consistently. Nonverbal balance scale problems were administered to 805 participants between 5 and 19 years of age. Latent class analyses indicate that children use rules, that children of different ages use different rules, and that both consistent and inconsistent use of rules occurs. A model for the development of reasoning about the balance scale task is proposed. The model is a restricted form of the overlapping waves model (Siegler, 1996) and predicts both discontinuous and gradual transitions between rules.




Brenda R. J. Jansen and Han L. J. van der Maas

University of Amsterdam, The Netherlands


Key Words: balance scale task; latent class analysis; overlapping waves model; strategy switches; transitions; proportional reasoning.

© 2002 Elsevier Science (USA)

Proportional reasoning undergoes significant development over childhood (Siegler, 1976). To succeed on a proportional reasoning task, a child must identify the relevant task dimensions and understand the multiplicative relation between them. Although many tasks have been used to assess proportional reasoning, the best known is undoubtedly the balance scale task, in which children are asked to predict the movement of a balance scale. On each of the two arms of the scale, pegs are placed at equal distances from each other and from the fulcrum, and identical weights can be put on the pegs. Varying numbers of weights are placed on each side at varying distances from the fulcrum, and children are asked to predict or explain the resulting movement of the beam. Here we focus on two questions that emerge from past research on the balance scale task. The first is whether children use rules at all. If they do, do they use the four rules that Siegler (1976, 1981) proposed, or do they also use additional rules?


Journal of Experimental Child Psychology 81, 383–416 (2002)

doi:10.1006/jecp.2002.2664, available online at http://www.idealibrary.com on

0022-0965/02 $35.00


All rights reserved.

This research was supported by the Department of Psychology of the University of Amsterdam. We thank Conor Dolan, Karen Farchaus Stein, and Maartje Raijmakers for their comments on earlier drafts of this article and Maria Bezem and Saskia Vaske for their participation in this research.

Address correspondence and reprint requests to Brenda R. J. Jansen, University of Amsterdam, Faculty of Psychology, Department of Developmental Psychology, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands. E-mail: op_jansen@macmail.psy.uva.nl.


The second question concerns the consistency with which children employ the rules. This question is related to the question of how children develop from one rule to the next. Are the transitions between rules discrete, with children "jumping" from one rule to the next, or are they gradual, with children switching back and forth among several rules? Rule use is expected to be more consistent in the first case than in the second. The importance of both questions is explained in this introduction.

Siegler (1976) hypothesized that children employ rules to solve balance scale items. The term rule has no unequivocal definition. Reese (1989) contended that behavior should meet certain criteria before it can be ascribed to the use of a rule: It should be regular, it should be consistent with expected behavior, and its development should be discontinuous. Reese further argued that an inference of rule use is more persuasive when more than one kind of behavior is expected and observed and when the inferred rule generalizes to other behaviors and tasks. A final criterion is that participants show awareness of their rule use. In this article, our focus is on whether we can infer a strategy and, from that strategy, predict further performance on a task. We do not follow Reese's criteria but rather use the terms rule and strategy interchangeably. Both refer to mental procedures that children follow to solve problems. Strategies may be learned explicitly or deduced by children from experience. Siegler (1976, 1981) organized children's use of rules when solving balance scale items into an invariant sequence, characterized by an increasing integration of the weight and distance dimensions. First, children compare only the numbers of weights on the two sides of the fulcrum (Rule I). Next, they also compare the distances at which the weights are placed, but only when the numbers of weights on both sides are equal (Rule II). Children next consider both dimensions but do not know how to combine them when the dimensions conflict, so they guess or muddle through (Rule III). Finally, children learn to multiply the dimensions and compare the products of the two sides (Rule IV).
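The four rules are precise enough to be written as prediction functions. The sketch below is illustrative only: the item encoding `(left_weights, left_distance, right_weights, right_distance)`, the response codes, and the function names are assumptions, not part of the original study. Rule III is modeled as a uniform guess among the three responses on conflict items, which matches the .33 entries in Table 1.

```python
"""Siegler's Rules I-IV as prediction functions (a minimal sketch).

Assumption, not from the article: an item is encoded as
(left_weights, left_distance, right_weights, right_distance), with
distances counted in pegs from the fulcrum. Responses are 'l' (left side
down), 'b' (in balance), and 'r' (right side down).
"""
import random

LEFT, BALANCE, RIGHT = "l", "b", "r"


def compare(left: int, right: int) -> str:
    """Return the side with the larger value, or 'b' when the values are equal."""
    if left > right:
        return LEFT
    if right > left:
        return RIGHT
    return BALANCE


def rule_i(lw, ld, rw, rd):
    # Rule I: compare only the numbers of weights.
    return compare(lw, rw)


def rule_ii(lw, ld, rw, rd):
    # Rule II: consult distance only when the numbers of weights are equal.
    return compare(ld, rd) if lw == rw else compare(lw, rw)


def rule_iii(lw, ld, rw, rd, rng=random):
    # Rule III: both dimensions are considered, but when they point to
    # opposite sides (a conflict item) the child guesses ("muddles through").
    by_weight, by_distance = compare(lw, rw), compare(ld, rd)
    if by_distance == BALANCE or by_weight == by_distance:
        return by_weight
    if by_weight == BALANCE:
        return by_distance
    return rng.choice([LEFT, BALANCE, RIGHT])


def rule_iv(lw, ld, rw, rd):
    # Rule IV: compare the products of weight and distance on the two sides.
    return compare(lw * ld, rw * rd)


# A hypothetical conflict-distance item: 3 weights on peg 1 vs. 2 weights on peg 4.
item = (3, 1, 2, 4)
print(rule_i(*item), rule_ii(*item), rule_iv(*item))  # l l r
print(rule_iii(*item, rng=random.Random(0)))  # one of l, b, r
```

On this item Rules I and II both predict that the side with more weights goes down, while Rule IV correctly predicts the opposite, which is exactly the pattern the conflict item types are designed to expose.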

To tap these different rules, six item types are used. Balance items have an equal number of weights on each side, placed at equal distances from the fulcrum; weight items have unequal numbers of weights placed at equal distances from the fulcrum; distance items have equal numbers of weights placed at different distances from the fulcrum; and conflict items have unequal numbers of weights on the two sides of the scale, placed at different distances from the fulcrum, with the side with more weights having the smaller distance. Conflict–weight items are those in which the scale tips to the side with the larger number of weights. On conflict–distance items, the scale tips to the side with the weights placed at the greater distance from the fulcrum. On conflict–balance items, the scale remains in balance. Table 1 summarizes the predicted proportions of correctly answered items, given item type, for each of the four rules. Table 1 also includes the qualitative proportionality (QP) rule and the addition rule, which are explained below.
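The item-type definitions can be applied mechanically: compare the two sides on weight, on distance, and (for conflict items) on the product that decides the actual outcome. The sketch below assumes the same hypothetical encoding `(left_weights, left_distance, right_weights, right_distance)` as before; the example items are made up and are not the items used in the study.

```python
"""Classifying an item into Siegler's six item types (a minimal sketch).

Assumption, not from the article: an item is encoded as
(left_weights, left_distance, right_weights, right_distance).
"""


def side(left: int, right: int) -> str:
    """'l' or 'r' for the side with the larger value, 'b' if the values are equal."""
    return "l" if left > right else "r" if right > left else "b"


def correct_answer(lw, ld, rw, rd) -> str:
    # The scale tips toward the side with the larger product of weight and distance.
    return side(lw * ld, rw * rd)


def item_type(lw, ld, rw, rd) -> str:
    by_weight, by_distance = side(lw, rw), side(ld, rd)
    if by_weight == "b" and by_distance == "b":
        return "B"      # balance item
    if by_distance == "b":
        return "W"      # weight item
    if by_weight == "b":
        return "D"      # distance item
    if by_weight == by_distance:
        return "other"  # more weights AND greater distance on one side: not one of the six types
    outcome = correct_answer(lw, ld, rw, rd)
    if outcome == "b":
        return "CB"
    return "CW" if outcome == by_weight else "CD"


# Hypothetical items, one per type: B, W, D, CW, CD, CB.
for item in [(2, 2, 2, 2), (3, 2, 1, 2), (2, 1, 2, 3),
             (3, 2, 1, 4), (2, 1, 1, 4), (2, 2, 1, 4)]:
    print(item, item_type(*item), correct_answer(*item))
```

Note that the type depends only on the qualitative relations between the sides, so two conflict–weight items can differ greatly in how lopsided they are.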

Many experimenters have studied children's behavior on the balance scale task (Chletsos, De Lisi, Turner, & McGillicuddy-De Lisi, 1989; Ferretti & Butterfield, 1986; Klahr & Siegler, 1978; Kliman, 1987; Marini & Case, 1994; McFadden, Dufresne, & Kobasigawa, 1987; Normandeau, Larivée, Roulin, & Longeot, 1989; Richards & Siegler, 1981; Roth, 1991; Siegler & Chen, 1998; Surber & Gzesh, 1984; van Maanen, Been, & Sijtsma, 1989; Wilkening & Anderson, 1982).

Besides providing evidence for the rules that Siegler (1976, 1981) observed, these experiments yielded evidence for additional rules. For example, Normandeau et al. (1989) proposed the addition rule, the qualitative proportionality rule, and Rule IIIA. The addition rule involves comparing the sums of weight and distance on the two arms of the scale on conflict items. Children who use this rule predict that the scale tips to the side with the larger sum when the sums are unequal and that the scale remains in balance when the sums are equal. Children who use the QP rule predict that the scale remains in balance on conflict items because they think that the larger number of weights on one side of the scale compensates for the larger distance on the other side. Children who use Rule IIIA base their judgments of conflict items on a perceptual cue: On some conflict items the distance dimension seems more important, whereas on others the weight dimension seems more important. Siegler and Chen (1998) observed rules that are more complex than Rule I but less complex than Rule II.
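The addition and QP rules differ from Rule IV only on conflict items (Table 1 credits both with correct answers on balance, weight, and distance items), so the sketch below handles conflict items only. The item encoding and the three example items are hypothetical; they were chosen so that the sums of weight and distance are equal on the conflict–weight and conflict–balance items but favor the correct side on the conflict–distance item.

```python
"""The addition rule and the qualitative proportionality (QP) rule on conflict items.

Assumptions, not from the article: items are encoded as
(left_weights, left_distance, right_weights, right_distance), and the
example items are hypothetical.
"""


def side(left: int, right: int) -> str:
    return "l" if left > right else "r" if right > left else "b"


def addition_rule(lw, ld, rw, rd) -> str:
    # Compare the sums of weight and distance on the two arms.
    return side(lw + ld, rw + rd)


def qp_rule(lw, ld, rw, rd) -> str:
    # More weights on one side are taken to compensate for more distance on
    # the other, so every conflict item is judged to be in balance.
    return "b"


def torque_rule(lw, ld, rw, rd) -> str:
    # The correct answer (Rule IV): compare products of weight and distance.
    return side(lw * ld, rw * rd)


# Hypothetical conflict items: (lw, ld, rw, rd)
conflict_items = {
    "CW": (3, 2, 1, 4),  # products 6 vs 4, sums 5 vs 5
    "CD": (2, 4, 3, 1),  # products 8 vs 3, sums 6 vs 4
    "CB": (3, 2, 2, 3),  # products 6 vs 6, sums 5 vs 5
}

for kind, item in conflict_items.items():
    # columns: item type, correct answer, addition rule, QP rule
    print(kind, torque_rule(*item), addition_rule(*item), qp_rule(*item))
# CW l b b
# CD l l b
# CB b b b
```

With items like these, the addition rule answers "in balance" on the conflict–weight item and correctly on the other two, which is the response pattern the note to Table 1 describes for this test.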

Most experimenters have assigned children to rules according to Siegler's (1976, 1981) rule assessment methodology (RAM): A child is considered to use a particular rule if the proportion of his or her responses that are consistent with that rule exceeds a certain criterion. The match between the observed response pattern and the expected response pattern does not have to be perfect; minor deviations are allowed. However, this procedure may result in the spurious detection of rules (Strauss & Levin, 1981) because the criterion is chosen arbitrarily and cannot be tested statistically.

**TABLE 1.** Predicted Proportion Correct for Rule Models by Item Type

| Item type | Rule I | Rule II | Rule III | Rule IV | Addition rule | QP rule |
|---|---|---|---|---|---|---|
| Balance | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Weight | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Distance | .00ᵃ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Conflict–weight | 1.00 | 1.00 | .33ᶜ | 1.00 | .00ᵃ | .00ᵃ |
| Conflict–distance | .00ᵇ | .00ᵇ | .33ᶜ | 1.00 | 1.00 | .00ᵃ |
| Conflict–balance | .00ᵇ | .00ᵇ | .33ᶜ | 1.00 | 1.00 | 1.00 |

*Note.* On this test, the addition rule results in the response "in balance" to all conflict–weight items and in the correct response to all other conflict items.

ᵃ Answers that the scale will stay in balance.

ᵇ Answers that the scale will tip to the side with more weights.

ᶜ Guesses or "muddles through."
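The RAM's classification step can be sketched as a simple agreement count. In the sketch below the 0.8 criterion, the response strings, and the predicted patterns for one hypothetical block (weight, distance, conflict–weight, conflict–distance, conflict–balance, with more weights always on the left) are illustrative assumptions, not values from the article.

```python
"""The RAM classification step (a minimal sketch).

Assumptions, not from the article: a child's responses and each rule's
predicted responses are strings of 'l', 'b', 'r' over the same items, and the
0.8 criterion is a hypothetical value chosen for illustration.
"""


def ram_classify(responses: str, predictions_by_rule: dict, criterion: float = 0.8) -> list:
    """Return the rules whose predictions agree with more than `criterion`
    of the child's responses; an empty list means 'unclassified'."""
    assigned = []
    for rule, predicted in predictions_by_rule.items():
        agreement = sum(r == p for r, p in zip(responses, predicted)) / len(responses)
        if agreement > criterion:
            assigned.append(rule)
    return assigned


# Predictions for one hypothetical block (W, D, CW, CD, CB) in which the left
# side always has more weights; Rule IV gives the correct answers.
predictions = {"Rule I": "lblll", "Rule IV": "lrlrb"}

print(ram_classify("lblll", predictions))                 # ['Rule I']
print(ram_classify("lbllb", predictions))                 # [] (4 of 5 = .8 does not exceed .8)
print(ram_classify("lbllb", predictions, criterion=0.7))  # ['Rule I']
```

The last two calls show the arbitrariness at issue: the same response pattern is unclassified or assigned to Rule I depending solely on where the criterion is set.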

In this article, we apply latent class analysis (LCA) to children's responses to balance scale problems. LCA is a statistical technique that divides the sample into a limited number of latent classes. Each class is characterized by a pattern of probabilities, each indicating the chance of giving a certain response to an item. The pattern of probabilities can be interpreted as stemming from the use of a certain cognitive strategy or rule. LCA offers four important advantages over the RAM. First, the technique provides statistical fit measures that indicate the suitability of a given model. Because a latent class model associated with rule use can be tested statistically, LCA can falsify the hypothesis of rule use. Second, LCA can model children's error processes. Even children who possess the ability to perform a task may still answer some items incorrectly, for instance, because of carelessness (Rindskopf, 1987). Because the probability of a correct response within a class can take any value between 0 and 1, children are allowed to make some errors during the test. Third, the deviation, and hence the criterion used to classify children, is not arbitrary but can be subjected to statistical testing. In principle, the deviation is independent of the number of items administered, whereas in the RAM the criterion changes when the number of items changes. With LCA, it is therefore easier to compare classifications based on different data sets collected with different tests. Fourth, the exact rules do not have to be known beforehand, because LCA can detect clusters of unexpected response patterns that can be interpreted as alternative rules.
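To illustrate why a fixed RAM criterion does not mean the same thing at different test lengths, the sketch below computes, exactly, how often a purely random responder would meet an 80% agreement criterion with a given rule. The random-responder model (each of the three responses chosen with probability 1/3, independently across items) and the 80% criterion are assumptions made for illustration only.

```python
"""How often would a purely random responder satisfy a fixed RAM criterion?

Assumptions, for illustration only: the child picks each of the three
responses with probability 1/3, independently across items, and a rule is
assigned when at least a proportion `criterion` of the responses match it.
"""
from fractions import Fraction
from math import ceil, comb


def min_matches(n_items: int, criterion: Fraction) -> int:
    """Smallest number of matching responses that meets the criterion."""
    return ceil(criterion * n_items)


def chance_of_meeting_criterion(n_items: int, criterion: Fraction = Fraction(4, 5),
                                p: Fraction = Fraction(1, 3)) -> Fraction:
    """Exact P(at least min_matches successes) under Binomial(n_items, p)."""
    k_min = min_matches(n_items, criterion)
    return sum(comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
               for k in range(k_min, n_items + 1))


for n in (5, 10, 25):
    allowed = n - min_matches(n, Fraction(4, 5))
    print(f"{n:2d} items: {allowed} deviations allowed, "
          f"P(random responder meets criterion) = {float(chance_of_meeting_criterion(n)):.6f}")
```

Exact fractions are used so that the cut-off `ceil(criterion * n_items)` is not disturbed by floating-point rounding. As the test grows, both the number of tolerated deviations and the chance of a spurious classification change, even though the nominal criterion stays at 80%.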

This study overcomes some of the limitations of the study of Jansen and van der Maas (1997), who also applied LCA to balance scale data. The data set in the current study spans a wider age range. Moreover, we employ a trichotomous response format ("left side down," "in balance," and "right side down") that provides more information than the incorrect/correct scores that Jansen and van der Maas (1997) employed. Finally, we relate the rules to the ages of the children.

The second question in this article concerns the development of children's behavior on the balance scale task: Are the transitions between rules discrete or gradual? The invariant sequence of rules that Siegler (1976, 1981) proposed stems from a so-called staircase model of development (Siegler, 1996). A rule corresponds to a stair, and on this stair a child has only one rule at his or her disposal. The child uses it consistently and is unlikely to switch between rules during the administration of a balance scale test. The transition to a rule of a more advanced level of ability corresponds to a sudden, spontaneous shift to a subsequent stair. Siegler (1981, 1996) found considerable evidence for this theory of development on the balance scale task. However, more contemporary theoretical models describe development as continuous and more wavelike.

Thelen and Smith (1994, p. xix) contended that although behavior and development appear to be rule driven, there are in fact no rules. Siegler (1996) himself proposed a developmental model that allows for individual variation: the overlapping waves model. It differs importantly from the staircase model. In this model, the distribution of the use of a rule is represented by a "wave." Because the waves can overlap, children can have several rules at their disposal and can switch between rules. As children develop, their preference for a rule waxes and wanes, resulting in a gradual change in the employment of rules. A rule is not abruptly replaced by another rule, as it is in the staircase model. Siegler described development that followed from the overlapping waves model as "a gradual ebbing and flowing of the frequencies of alternative ways of thinking, with new approaches being added and old ones being eliminated as well" (p. 86). The starting point for both the overlapping waves model and the staircase model is that children use rules. Whereas the staircase model implies consistency of rule use, the overlapping waves model implies switching back and forth between rules. Because several empirical results indicate that children may switch between rules on the balance scale task, the overlapping waves model may describe the development of performance on the task accurately. It should be noted, however, that Siegler did not apply the model to the balance scale task.

Jansen and van der Maas (1997) observed that the performance related to Rule III was inconsistent and that some latent classes were difficult to interpret. These complex results had not been revealed by the RAM. Jansen and van der Maas contended that children might (spontaneously) learn during the administration of the test: Merely by being presented with the test items, children may discover new features of the balance scale task and change their strategy accordingly. Ferretti and Butterfield (1986) observed inconsistencies that were related to quantitative characteristics of the items. As explained, Siegler's (1976, 1981) categorization of balance scale items into six types depends only on the qualitative relation between the weight and distance dimensions and not on quantitative variations (e.g., the absolute difference between the numbers of weights on the two sides). However, Ferretti and Butterfield (1986) observed that children were more likely to use a more complex rule on items with a large product difference, that is, the difference between the products of weight and distance on the two sides of the fulcrum. Jansen and van der Maas (1997) argued that this conclusion is based only on the responses to items with extreme product differences and that Siegler's (1981) assumption of insensitivity to quantitative variations within item types is reasonable for variations that are not too extreme. Nevertheless, the improved performance on items with extreme variations is unexpected and remains an interesting observation.
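The quantitative features discussed above are easy to compute. In the sketch below, the item encoding and both example items are hypothetical; the two items belong to the same type (conflict–weight) yet differ considerably in product difference.

```python
"""Quantitative item features that Siegler's item types ignore (a sketch).

Assumption, not from the article: items are encoded as
(left_weights, left_distance, right_weights, right_distance).
The two example items are hypothetical; both are conflict-weight items.
"""


def product_difference(lw: int, ld: int, rw: int, rd: int) -> int:
    """Absolute difference between the weight x distance products of the two sides."""
    return abs(lw * ld - rw * rd)


def weight_difference(lw: int, ld: int, rw: int, rd: int) -> int:
    """Absolute difference between the numbers of weights on the two sides."""
    return abs(lw - rw)


small = (3, 2, 1, 4)  # products 6 vs 4: tips toward the side with more weights
large = (6, 2, 1, 3)  # products 12 vs 3: also tips toward the side with more weights

for item in (small, large):
    print(item, product_difference(*item), weight_difference(*item))
# (3, 2, 1, 4) 2 2
# (6, 2, 1, 3) 9 5
```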

In this experiment, a balance scale test consisting of comparable blocks of items is administered. Each block consists of one item of each type. An LCA of each individual item type (i.e., of the items of that type from the different blocks in the test) indicates whether the items are homogeneous and whether children respond in the same way to items of the same type. Consistent rule use should give rise to a well-fitting latent class model in which each class theoretically corresponds to the use of a rule. It should then be possible to restrict the probabilities of answering items of a single type to be equal within each latent class. Inconsistent rule use probably results in a latent class model with differing probabilities of answering items of one type correctly; these probabilities cannot be subjected to equality restrictions without significantly worsening the fit of the model. If children use rules, then an LCA of each block of the test (i.e., of items of different types) should result in a limited number of latent classes that all correspond to a rule. If a staircase model accurately describes development, then children are expected to use the same rule in each block. However, if the overlapping waves model provides a better description of children's development on the balance scale task, then children are likely to be assigned to different rules in different blocks. The construction of latent class models for responses to balance scale items is explained further under Method, after a description of the design and the administration of the balance scale test.
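Equality restrictions make the consistency hypothesis testable because they reduce the number of free parameters, so the restricted model is nested in the unrestricted one. The sketch below does the bookkeeping for an LCA of the five items of one type; the counting rules are standard latent class bookkeeping rather than figures reported in the article, and they assume that no parameters are fixed at 0 or 1.

```python
"""Parameter counts for an LCA of the five items of one item type (a sketch).

Assumptions, standard latent class bookkeeping rather than figures from the
article: 5 items with 3 response categories each, T latent classes, and no
parameters fixed at 0 or 1 (which would change the counts).
"""

N_ITEMS, N_CATEGORIES = 5, 3


def n_parameters(n_classes: int, equal_within_type: bool) -> int:
    class_sizes = n_classes - 1  # class probabilities sum to 1
    per_item = N_CATEGORIES - 1  # response probabilities sum to 1 for each item
    # Under the restriction, all five items share one set of probabilities per class.
    items_estimated = 1 if equal_within_type else N_ITEMS
    return class_sizes + n_classes * items_estimated * per_item


def degrees_of_freedom(n_classes: int, equal_within_type: bool) -> int:
    independent_cells = N_CATEGORIES ** N_ITEMS - 1  # 3**5 patterns, minus 1 = 242
    return independent_cells - n_parameters(n_classes, equal_within_type)


for t in (2, 3, 4):
    print(f"T={t}: {n_parameters(t, False)} free vs. {n_parameters(t, True)} restricted parameters")
# T=2: 21 free vs. 5 restricted parameters
# T=3: 32 free vs. 8 restricted parameters
# T=4: 43 free vs. 11 restricted parameters
```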

## METHOD

### Participants

The balance scale test was administered to 805 participants. The participants were middle-class children, recruited by sending letters to the parents of students in four elementary and two secondary public schools in The Netherlands. The final sample consisted of children between 5 and 19 years of age, with sample sizes of 1, 15, 51, 65, 88, 99, 82, 77, 71, 77, 73, 56, 35, 12, and 3 for ages 5 through 19, respectively. Included were 397 male and 408 female participants. The data from 15 children who omitted one or more items were excluded from the analyses.
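The reported age distribution can be checked against the reported totals with a few lines of arithmetic; the sketch below uses only the numbers given in the paragraph above.

```python
"""Checking the reported age distribution against the reported totals."""

ages = range(5, 20)  # 5 through 19 years
counts = [1, 15, 51, 65, 88, 99, 82, 77, 71, 77, 73, 56, 35, 12, 3]
sample = dict(zip(ages, counts))

total = sum(counts)
modal_age = max(sample, key=sample.get)  # age group with the most participants

print(len(sample), total, 397 + 408, modal_age)  # 15 805 805 10
```

Both the age counts and the gender counts add up to the 805 participants reported.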

### Material and Procedure

The administered paper-and-pencil version of the balance scale test was a booklet of 30 pages. The balance scale depicted in the booklets had four pegs on each side of the fulcrum; on each side, the weights (a maximum of six) were placed on a single peg. As depicted, the scale was 15 cm wide and 4 cm high. Two arrows pointing down and an equal sign, printed below the balance scale, represented the three response possibilities. In explaining the booklet to the children, we followed the procedure of Chletsos (1986), who successfully applied it with children aged 8 years or older. The experimenters placed a wooden balance scale where everybody could see it. They explained that the pegs on the scale were placed at equal distances and that all blocks weighed the same. They showed that a blocking pin prevented the scale from tipping. They handed out the booklets and asked the children to fill in their personal data on the front page of the booklet. The next page contained a picture of a scale without weights, and the experimenters explained the equivalence of the wooden scale and this picture. The next three pages contained example items that were meant to familiarize the children with the format of the test. The examples included a balance item and two items with weights placed on only one side of the fulcrum. The experimenters and the children completed the three examples together, while the experimenters used the wooden scale to demonstrate them. They explained that the children should circle the arrow under the left (or right) arm of the balance if they thought that the scale would tip to the left (or right) and circle the equal sign under the fulcrum if they thought that the scale would remain in balance. The experimenters also explained how to correct an answer if a mistake was made. The children were asked to work quietly and by themselves on the remainder of the test. The explanation of the test and procedure took about 15 min, whereas the children needed 10 min, on average, to complete the test.

The actual test consisted of five comparable blocks of five items, one of each type. Items were arranged in the same order in each block: weight, distance, conflict–weight, conflict–distance, and conflict–balance. Balance items were not included because the expected responses to this item type do not differentiate between the rules (van Maanen et al., 1989; see also Table 1). The conflict items were constructed in such a way that the use of the addition rule resulted in the correct response to conflict–distance and conflict–balance items but in the incorrect response "in balance" to conflict–weight items (see the Appendix for details of the items).
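The block design determines how the 25 items are grouped for the two kinds of analyses described in the introduction: one LCA per item type (five items, one from each block) and one LCA per block (five items, one of each type). The sketch below builds both groupings; the labels follow the CW1, CW2 style used for item names in the latent class notation, and the remaining labels (W1, D1, and so on) are hypothetical extensions of that style.

```python
"""The test layout and the two ways of grouping its 25 items (a sketch).

Labels such as 'CW3' (the conflict-weight item of block 3) are a
hypothetical naming convention for illustration.
"""
TYPES = ["W", "D", "CW", "CD", "CB"]  # order of item types within every block
N_BLOCKS = 5
BLOCKS = range(1, N_BLOCKS + 1)

# Presentation order: all items of block 1, then block 2, and so on.
test_order = [f"{t}{b}" for b in BLOCKS for t in TYPES]

# Item sets for the LCA of each item type (one item per block).
by_type = {t: [f"{t}{b}" for b in BLOCKS] for t in TYPES}

# Item sets for the LCA of each block (one item per type).
by_block = {b: [f"{t}{b}" for t in TYPES] for b in BLOCKS}

print(test_order[:6])  # ['W1', 'D1', 'CW1', 'CD1', 'CB1', 'W2']
print(by_type["CW"])   # ['CW1', 'CW2', 'CW3', 'CW4', 'CW5']
print(by_block[2])     # ['W2', 'D2', 'CW2', 'CD2', 'CB2']
```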

### Latent Class Analysis

Children's responses to the balance scale items were classified into rules by means of latent class analysis. We give only a short introduction to LCA (for reviews, see Clogg, 1995; Heinen, 1996; McCutcheon, 1987; Rindskopf, 1983, 1987). The program we employ for applying latent class analysis is PANMARK (van de Pol, Langeheine, & De Jong, 1996). We use Goodman's (1974) notational system (see also McCutcheon, 1987) for the parameterization of the model.

*The latent class model.* The latent class model is a special case of the finite mixture distribution model (McLachlan & Basford, 1988). A distinction is made between manifest and latent variables. Manifest variables are the observed behavioral measures; in this application, the manifest variables are the balance scale items. The names of the item types are abbreviated as B (balance), W (weight), D (distance), CW (conflict–weight), CD (conflict–distance), and CB (conflict–balance). The latent variable is the unobserved or underlying variable, here the ability of proportional reasoning. We refer to it as X. In LCA, both the manifest variables and the latent variable are measured at the categorical level. On the balance scale items, the response categories are "left side down" (l), "in balance" (b), and "right side down" (r). We assume that the categories (or classes) of the latent variable represent several qualitatively distinct levels of ability (rules) of proportional reasoning and that each class (t) of the latent variable (X) corresponds to the use of a given rule. Each individual is assumed to belong to one and only one latent class (Goodman, 1974): For instance, a child who uses Rule I uses it on all items. This is the assumption of stationarity.

Within each latent class, the manifest variables are statistically independent. This so-called assumption of local independence means that the association between the manifest variables is explained by the classes of the latent variable (McCutcheon, 1987). In the case of the balance scale task, the assumption of local independence implies that the relation between the scores on balance scale items can be explained by the assumption that the children use different cognitive strategies: The relation between the balance scale items is explained solely by the rule the child uses.
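Under local independence, the probability of a complete response pattern is a class-size-weighted sum, over classes, of products of item-level response probabilities, and a child's posterior class membership follows from Bayes' rule. The sketch below makes this concrete for three items (B, W, D) and two classes. All parameter values are made up for illustration and are not estimates from the article; the hypothetical items are a balance item, a weight item with more weights on the left, and a distance item with the left weights placed farther out.

```python
"""Response-pattern and posterior class probabilities under local independence.

A minimal sketch with made-up numbers: three items (B, W, D) with responses
'l', 'b', 'r' and two latent classes. None of these values are estimates
from the article.
"""
from itertools import product

RESPONSES = ("l", "b", "r")
ITEMS = ("B", "W", "D")

class_sizes = [0.6, 0.4]  # unconditional class probabilities (sum to 1)

# conditional[t][item][response] = P(response to item | membership of class t)
conditional = [
    {"B": {"l": 0.05, "b": 0.90, "r": 0.05},   # a Rule I-like class: says
     "W": {"l": 0.90, "b": 0.05, "r": 0.05},   # "in balance" on the
     "D": {"l": 0.05, "b": 0.90, "r": 0.05}},  # distance item
    {"B": {"l": 0.05, "b": 0.90, "r": 0.05},   # a Rule II-like class:
     "W": {"l": 0.90, "b": 0.05, "r": 0.05},   # answers the distance
     "D": {"l": 0.90, "b": 0.05, "r": 0.05}},  # item correctly
]


def joint(pattern):
    """P(pattern and class t) for every class t; items are independent within a class."""
    result = []
    for size, cond in zip(class_sizes, conditional):
        p = size
        for item, response in zip(ITEMS, pattern):
            p *= cond[item][response]
        result.append(p)
    return result


def pattern_probability(pattern):
    # Mixture: sum the class-specific joint probabilities over classes.
    return sum(joint(pattern))


def posterior(pattern):
    # Bayes' rule: P(class t | pattern).
    parts = joint(pattern)
    total = sum(parts)
    return [part / total for part in parts]


all_patterns = list(product(RESPONSES, repeat=len(ITEMS)))
print(round(sum(pattern_probability(p) for p in all_patterns), 10))  # 1.0
print([round(x, 3) for x in posterior(("b", "l", "l"))])  # [0.077, 0.923]
```

A child who answers the distance item correctly is assigned mostly to the Rule II-like class, even though that class is the smaller one a priori.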

Construction of a latent class model. The responses to the balance scale items

are analyzed with a combination of exploratory and confirmatory latent class

analysis. The exploratory part determines the optimal number of latent classes

(T). Here, T refers to the number of rules comprising the latent ability of propor-

tional reasoning. Deciding on the number of latent classes involves increasing the

number of classes until the expected frequencies of the model do not deviate sig-

nificantly from the observed frequencies (see below). The estimated proportion of

children in a given class t of latent variable X is called the unconditional proba-

bility of that latent class. It is noted as

ability is estimated. In this application, it is the estimated proportion of children

who use rule t. The unconditional probabilities sum to 1.

Each latent class is characterized by a pattern of probabilities of responses to

the manifest variables. These probabilities depend on the latent class that the

subject occupies and are therefore called conditional probabilities. An estimated

conditional probability is expressed as, for instance,

ability that the response on a balance item (manifest variable B) is “the scale tips

to the right” (r), given that the child is a member of class t, for latent variable X.

The bar indicates that the probability is conditional on membership of latent

class t. The conditional probabilities sum to unity for each item within each

latent class.

In the confirmatory part of LCA, we test hypotheses on the content of the

classes by means of equality restrictions between conditional probabilities with-

in item types. For instance, the estimated probability of answering “in balance”

to conflict–weight item 1, given membership of the first latent class,

be restricted to be equal to the probabilities of answering that the scale remains

in balance to conflict–weight items 2, 3, 4, and 5, given membership of the first

latent class:

ˆˆˆ

πππ

bbb

111

===

sent the estimates by When equality is used between two items (say,

ˆ

.

π

b

1

CW1and CW2), estimated parameters are expressed as

indicates that the probability of answering “in balance” on the first conflict–

weight item is equal to giving this answer on the second conflict–weight item,

given membership of the first latent class.

The full latent class model for the manifest variables B, W, and D is defined in

Eq. (1), which gives the proportion of the possible response patterns ijk as a func-

tion of the estimated model parameters. The symbols i, j, and k refer to the

response categories of the manifest variables. They can attain the values l (“scale

tips to left”), b (“in balance”), and r (“scale tips to right”).

where the hat indicates that the prob-

where is the prob-

may

In this case, we repre-

ˆˆ

.

ππ

CW XCW XCW X

b

CW X

b

CW X

11

12345

=

ˆ

,

πb

CW X

1

1

ˆ πr

B

ˆ

,

πr t

B X

ˆ ,

πt

X

390

JANSEN AND VAN DER MAAS

CWX

15

K

The notation

ˆ

.

,

π

b

CWX

1

1 2

Page 9

(1)

The confidence intervals of the estimated parameters are determined by means

of the so-called non-naive bootstrap procedure. The estimated parameters of the

model are used to simulate a large number of data sets. For each data set, a model

(with the same number of latent classes and the same restrictions) is estimated (de

Menezes, 1999). The 5% and 95% percentiles of these bootstrapped estimates

define the limits of the confidence interval of an estimate. We report only confi-

dence intervals of which the distance from the lower to the upper limit is larger

than .1. If the confidence interval is symmetrical about the estimate, then we

report the range of the interval; otherwise, we report the bounds of the interval.

Selection of latent class model. A model is selected by considering the log–likelihood ratio (LR), which expresses the deviation of the expected frequencies from the observed frequencies. When this deviation is small, the fit index is small, indicating an acceptable fit of the model. An alpha level of .05 is used for all statistical tests. If the model is correct and the sample size is large, the LR approximately follows a chi-square distribution with degrees of freedom equal to the difference between the number of independent cells in the observed frequency table and the number of estimated parameters. Because our data set is small compared to the number of possible response patterns, we cannot use the theoretical chi-square distribution to determine the fit of a model. Instead, we use the parametric bootstrap method and report bootstrapped p values.¹
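The bootstrapped p value can be sketched in the same spirit. This is an illustration under stated assumptions, not the authors' code: it takes LR to be the usual likelihood-ratio statistic G² and, to stay short, uses an unrestricted one-class (independence) model, whose ML estimates are the item marginals. The p value is the proportion of bootstrapped LR values at least as large as the observed one (counting ties as exceeding is our choice). The data and helper names are hypothetical.

```python
from collections import Counter

import numpy as np


def fit_independence(responses, n_categories=3):
    """One-class model: marginal response proportions per item,
    shape (n_items, n_categories)."""
    return np.stack([np.bincount(col, minlength=n_categories) / len(col)
                     for col in responses.T])


def lr_statistic(responses, item_probs):
    """LR (G^2) = 2 * sum over patterns of obs * ln(obs / expected).
    Empty cells contribute zero, so only observed patterns are visited."""
    n = len(responses)
    g2 = 0.0
    for pattern, observed in Counter(map(tuple, responses)).items():
        expected = n * np.prod([item_probs[item, cat] for item, cat in enumerate(pattern)])
        g2 += 2.0 * observed * np.log(observed / expected)
    return g2


def bootstrapped_p(responses, n_boot=200, seed=0):
    """Simulate data from the fitted model, refit, and return the observed LR
    and the proportion of bootstrapped LR values >= the observed LR."""
    rng = np.random.default_rng(seed)
    n, n_items = responses.shape
    probs = fit_independence(responses)
    observed = lr_statistic(responses, probs)
    exceed = 0
    for _ in range(n_boot):
        sim = np.column_stack([rng.choice(probs.shape[1], size=n, p=probs[i])
                               for i in range(n_items)])
        if lr_statistic(sim, fit_independence(sim)) >= observed:
            exceed += 1
    return observed, exceed / n_boot


# Hypothetical data: half the children answer "l" on all 4 items, half "r",
# a strong dependency that a one-class model cannot reproduce.
data = np.array([[0, 0, 0, 0]] * 50 + [[2, 2, 2, 2]] * 50)
lr, p = bootstrapped_p(data)
print(round(lr, 2), p)
```

Here the observed LR is 200 ln 8 (about 415.9), far above anything the simulated data sets produce, so the one-class model is rejected with p = 0.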

The Bayesian information criterion (BIC) (Schwarz, 1978) is used to compare models that show an acceptable fit to the data. The BIC is a penalized log–likelihood criterion and is a function of the number of parameters, the log–likelihood ratio, and the number of participants. The BIC is calculated as $\mathrm{BIC} = -2\log L(t) + \mathit{par}\cdot\ln(N)$, where L(t) is the likelihood based on t classes, par is the number of parameters of the model, and N is the number of participants. The BIC can be used to compare both nested and non-nested models. Small values characterize models that fit well and are parsimonious. Because we prefer parsimonious models and the BIC favors more parsimonious models than the Akaike information criterion (AIC) (Akaike, 1974) does (Raftery, 1995), we use the BIC rather than the AIC.
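The selection rule can be illustrated with the BIC values that Table 2 reports for all five conflict–weight items. The sketch below (hypothetical helper names) implements the formula above and then compares only models without a significant LR (no asterisk in Table 2), selecting the one with the smallest BIC.

```python
import math


def bic(log_likelihood, n_params, n):
    """BIC = -2 log L + par * ln(N); smaller values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n)


def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2 * par, shown only for comparison."""
    return -2.0 * log_likelihood + 2.0 * n_params


# BIC values for the conflict-weight items (all items) from Table 2;
# the flag is False for models whose bootstrapped p is below .05.
models = {
    "2 classes": (4844.94, False),
    "3 classes": (4694.03, False),
    "4 classes": (4663.19, True),
    "4 classes, restricted": (4618.05, True),
}
candidates = {name: value for name, (value, acceptable) in models.items() if acceptable}
selected = min(candidates, key=candidates.get)
print(selected)
print(math.log(779))  # BIC penalty per parameter for N = 779 (AIC uses 2)
```

This selects the restricted four-class model. For N = 779 the BIC charges ln(779) ≈ 6.66 per parameter, against 2 for the AIC, which is why the BIC leans toward more parsimonious models at this sample size.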

¹ When the data set is sparse (many cells in the frequency table have a low frequency), the fit measures do not follow the theoretical chi-square distribution, so the fit of a model cannot be tested by using that distribution. The parametric bootstrap method can be used to obtain an empirical distribution of the fit measures (Langeheine, Pannekoek, & van de Pol, 1995; van der Heijden, ’t Hart, & Dessens, 1997). By resampling data using the estimated parameters of the model, bootstrapped fit measures are obtained. The proportion of bootstrapped fit measures that are larger than the original fit measure is the bootstrapped p value. This value, instead of the p value derived from the theoretical distribution, is used in this article.

Latent class memberships. For each latent class in the model, the posterior probability gives the probability that a child’s response pattern belongs to that class. Posterior probabilities sum to unity over the latent classes for each response pattern. For instance, a response pattern that consists of the response “in balance” to all distance items will show a high posterior probability associated with the latent class that corresponds to Rule I and a low posterior probability associated with the latent class that corresponds to Rule IV. Equation (2) contains the formula of posterior probabilities:

$$\hat{\pi}^{BWD\bar{X}}_{ijkt} = \frac{\hat{\pi}^{BWDX}_{ijkt}}{\sum_{t=1}^{T} \hat{\pi}^{BWDX}_{ijkt}} \tag{2}$$

Here $\hat{\pi}^{BWD\bar{X}}_{ijkt}$ is the posterior probability of latent class t given response pattern ijk, and $\hat{\pi}^{BWDX}_{ijkt} = \hat{\pi}^{X}_{t}\,\hat{\pi}^{BX}_{it}\,\hat{\pi}^{WX}_{jt}\,\hat{\pi}^{DX}_{kt}$ is the term summed in Eq. (1), so the denominator equals $\hat{\pi}_{ijk}$.
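Eq. (2) amounts to multiplying the class proportion by the conditional probabilities of the observed responses and normalizing over classes. A minimal sketch, assuming the coding 0 = l, 1 = b, 2 = r and hypothetical parameter values for a Rule I-like class and a class that answers distance items correctly:

```python
import numpy as np


def posterior(class_props, cond, pattern):
    """Eq. (2): posterior probability of each latent class given a pattern.

    class_props : shape (T,), class proportions pi^X_t
    cond        : list with one (T, 3) array per item, columns (l, b, r)
    pattern     : sequence of response codes, one per item (0=l, 1=b, 2=r)
    """
    joint = np.asarray(class_props, dtype=float).copy()
    for item_probs, response in zip(cond, pattern):
        joint *= item_probs[:, response]  # conditional probability of this answer
    return joint / joint.sum()            # normalize over the T classes


# Hypothetical two-class model for four distance items:
# class 0 answers "in balance" (Rule I-like), class 1 answers "left".
props = np.array([0.3, 0.7])
item = np.array([[0.05, 0.90, 0.05],
                 [0.90, 0.05, 0.05]])
cond = [item] * 4
post = posterior(props, cond, pattern=(1, 1, 1, 1))  # "in balance" on all items
modal_class = int(post.argmax())                      # modal assignment
print(post.round(4), modal_class)
```

As in the example in the text, a pattern of “in balance” answers to all distance items receives a posterior close to 1 for the Rule I-like class, even though that class is the smaller one.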

A child is assigned to the class associated with the largest, or modal, posterior probability. Assignment is straightforward if the modal probability of the child’s response pattern is close to 1 but becomes doubtful if the probabilities associated with different classes are similar. The percentage of correctly classified participants and the measure lambda (λ) indicate the reliability of assigning participants to latent classes. The percentage of correctly classified participants is the average value of the modal probabilities multiplied by 100. The measure λ expresses the improvement gained by using the modal probability of a response pattern instead of assigning each participant to the largest latent class. The measure λ lies between 0 and 1, and a higher value is associated with a more accurate assignment (McCutcheon, 1987, p. 37).
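Both reliability measures can be computed from the matrix of posterior probabilities. The exact formula for λ is not given in this section; the sketch assumes the usual proportional-reduction-in-error form, λ = (E1 − E2)/E1, where E1 is the expected number of misclassifications when every participant is assigned to the largest latent class and E2 is the expected number under modal assignment. The posterior values are hypothetical.

```python
import numpy as np


def classification_quality(posteriors, class_props):
    """Percentage correctly classified and lambda for modal assignment.

    posteriors  : (N, T) posterior class probabilities, one row per child
    class_props : (T,) estimated latent class proportions
    Assumed lambda formula: (E1 - E2) / E1, with E1 the expected errors when
    everyone is placed in the largest class and E2 the expected errors
    under modal assignment.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    modal = posteriors.max(axis=1)
    pct_correct = 100.0 * modal.mean()
    e1 = len(posteriors) * (1.0 - np.max(class_props))
    e2 = np.sum(1.0 - modal)
    return pct_correct, (e1 - e2) / e1


# Hypothetical posteriors for four children in a two-class model
post = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
pct, lam = classification_quality(post, class_props=[0.5, 0.5])
print(pct, lam)
```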

Identification. Identification of a latent class model is an essential condition. It implies that the minimum of the log–likelihood ratio function is associated with a unique configuration of parameter values. Stated simply, the model is identified if there are sufficient observed data (known entities) to obtain unique estimates of the parameters (unknown entities: the class proportions and conditional probabilities). A necessary (but not sufficient) condition is that the number of degrees of freedom of the model is positive. Here, we adopt the criterion that the information matrix (the matrix of second-order partial derivatives of the log–likelihood ratio function with respect to the unknown parameters) is of full rank, that is, that the eigenvalues of this matrix are positive (van de Pol et al., 1996).
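The two checks can be sketched as follows. This is a minimal sketch: the tolerance is our choice, and computing the information matrix itself (for example, by numerical differentiation of the log-likelihood) is not shown. For unrestricted models, the degrees-of-freedom count reproduces, for example, the 221 and 210 df that Table 2 lists for the two- and three-class models of all five conflict–weight items, and the 63 df of the two-class model of four conflict–weight items; restricted models need an adjusted count.

```python
import numpy as np


def degrees_of_freedom(n_items, n_classes, n_categories=3):
    """Independent cells minus free parameters of an unrestricted model."""
    independent_cells = n_categories ** n_items - 1
    class_params = n_classes - 1
    conditional_params = n_classes * n_items * (n_categories - 1)
    return independent_cells - class_params - conditional_params


def has_full_rank(information, tol=1e-8):
    """True if every eigenvalue of the symmetric information matrix is positive."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(information, dtype=float))
    return bool(np.all(eigenvalues > tol))


print(degrees_of_freedom(n_items=4, n_classes=2))  # compare with Table 2
print(has_full_rank([[2.0, 0.5], [0.5, 1.0]]))     # identified
print(has_full_rank([[1.0, 1.0], [1.0, 1.0]]))     # singular: not identified
```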

Application of latent class analysis to balance scale data. The contingency table of the current balance scale test of 25 items with three response categories comprises $3^{25}$ (about $8.5 \times 10^{11}$) cells. Such a large contingency table may generate problems concerning the fit and identification of the model and makes computation of a model complex. Although van der Heijden, ’t Hart, and Dessens (1997) and Boom, Hoijtink, and Kunnen (2001) presented latent class analyses of more than 20 dichotomous variables, most applications of LCA involve a small number of items. Here, we analyze informative combinations of sets of items.
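The size of the problem follows from simple arithmetic: with three response categories, K items give 3^K cells. The hypothetical helper below compares the number of cells with the 779 children included in the analyses, showing why the full 25-item table is hopelessly sparse while sets of four or five items remain manageable.

```python
def n_cells(n_items, n_categories=3):
    """Number of cells in the contingency table of n_items items."""
    return n_categories ** n_items


N = 779  # children in the analyses
for n_items in (4, 5, 25):
    cells = n_cells(n_items)
    print(f"{n_items:>2} items: {cells:,} cells, {N / cells:.3g} children per cell")
```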

The analyses of the five items of each item type help to decide whether items of the same type elicit equal responses. If this is the case, one item may represent all items of the same type, and a combination of items can be made in order to study children’s responses to different item types simultaneously. The hypotheses concerning the number and the content of the latent classes can be deduced from Table 1. For example, the latent class model of the weight items theoretically consists of only one class, because all children are expected to answer these items correctly. The specific hypotheses concerning the number of expected latent classes for the models of the remaining item types are specified in the Results section, preceding the results of the analyses.
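The hypotheses rest on which answer each rule predicts for each item type. Table 1 is not reproduced in this section, so the sketch below encodes our reading of Siegler’s Rules I–IV as they are commonly described, plus the addition rule (compare weight plus distance on each side). The QP rule is left out because it is not defined here. The item configuration and function names are hypothetical.

```python
from typing import Literal

Answer = Literal["left", "balance", "right", "guess"]


def compare(left: float, right: float) -> Answer:
    """Side that goes down when only this quantity is compared."""
    if left > right:
        return "left"
    if right > left:
        return "right"
    return "balance"


def predict(rule: str, w_left: int, d_left: int, w_right: int, d_right: int) -> Answer:
    """Predicted answer for w weights placed at peg distance d on each arm."""
    by_weight = compare(w_left, w_right)
    by_distance = compare(d_left, d_right)
    if rule == "I":         # weight only
        return by_weight
    if rule == "II":        # distance only when the weights are equal
        return by_distance if by_weight == "balance" else by_weight
    if rule == "III":       # both dimensions; guess when they conflict
        if by_weight == "balance":
            return by_distance
        if by_distance in ("balance", by_weight):
            return by_weight
        return "guess"
    if rule == "IV":        # compare torques (weight x distance)
        return compare(w_left * d_left, w_right * d_right)
    if rule == "addition":  # compare weight + distance
        return compare(w_left + d_left, w_right + d_right)
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical conflict-weight item: 4 weights at peg 3 (left) vs 2 at peg 5 (right)
for rule in ("I", "II", "III", "IV", "addition"):
    print(rule, predict(rule, 4, 3, 2, 5))
```

For this hypothetical item, Rules I, II, and IV predict the correct answer (left), Rule III leads to guessing, and the addition rule predicts “in balance”, which matches the three latent classes expected for the conflict–weight items.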

The analysis of the items of each type starts with an exploratory phase in which the optimal number of latent classes is determined; hence, the number of latent classes is not decided beforehand. In the confirmatory phase, all latent classes are subjected to equality restrictions to test the hypothesis that items within a type are homogeneous and elicit equal responses. Siegler’s (1976, 1981) rules and the alternative rules guide the restrictions. For example, the probability of answering “in balance,” given any distance item, should be high and equal for children who use Rule I. Moreover, for these children the probability of answering “left side down” should be low and equal to the probability of answering “right side down.” The LR test indicates whether the restrictions are acceptable. The exploratory and confirmatory phases of the latent class analysis result in a model with a satisfactory fit (indicated by p values larger than .05 for the log–likelihood ratio test) that is also parsimonious (indicated by a low BIC). Only this model is described in detail.
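Equality restrictions reduce the number of free parameters and pool information across items. As a hedged illustration, the sketch assumes estimation by an EM algorithm in which the restrictions are imposed in the M-step; this section does not describe the estimation routine. It gives weighted ML estimates for a single class restricted as described for Rule I on distance items: P(b) is equal across items and P(l) = P(r) on every item. Instead of two free probabilities per item, only one parameter, θ = P(b), remains. Names and data are hypothetical.

```python
import numpy as np


def rule_one_mstep(responses, weights):
    """Weighted ML estimates for one latent class under the Rule I
    restrictions: P(b) equal across items and P(l) = P(r) on every item.

    responses : (N, K) response codes, 0 = l, 1 = b, 2 = r
    weights   : (N,) posterior probabilities of membership in this class
    Returns a (K, 3) table of conditional probabilities (columns l, b, r).
    """
    responses = np.asarray(responses)
    weights = np.asarray(weights, dtype=float)
    n_items = responses.shape[1]
    # Pool the "in balance" answers over all items: the single free parameter
    theta = (weights[:, None] * (responses == 1)).sum() / (weights.sum() * n_items)
    off = (1.0 - theta) / 2.0  # P(l) = P(r) share the remaining probability
    return np.tile([off, theta, off], (n_items, 1))


# Hypothetical responses of two children to four distance items
table = rule_one_mstep([[1, 1, 1, 1], [1, 1, 0, 2]], weights=[1.0, 1.0])
print(table[0])  # the same row applies to every item
```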

The analyses of individual item types do not distinguish among all rules. For example, all rules result in the correct response to weight items. Only a latent class analysis of a combination of item types can describe the rules that children apply. The choice of items in the combination is based on the results of the analyses of the individual item types, and the combined items come from the same part of the test. Comparing one block of items to a subsequent block of items indicates whether children switch back and forth between rules or use a rule consistently. Consistency is compatible with the staircase model, whereas the overlapping waves model can explain switches between rules.

Strategy switches violate the stationarity assumption of the latent class model. Nevertheless, we think that the latent class model is appropriate for the analysis of responses to a balance scale test, because the latent class models of Jansen and van der Maas (1997) mainly included clear, consistent, and well-interpretable response patterns. The classes associated with inconsistent response patterns may have resulted from the violation of the stationarity assumption; strategy switches or learning may explain these deviant classes.

## RESULTS

### Psychometric Properties of the Test

The internal consistency of the complete test, expressed by Cronbach’s alpha, is .80, and the average interitem correlation is .14. For the weight, distance, conflict–weight, conflict–distance, and conflict–balance items, the alphas are .81, .93, .88, .90, and .89, and the average interitem correlations are .45, .74, .60, .64, and .63, respectively. The negative relation between some item types, such as the weight items and the conflict–weight items, causes the low interitem correlation for the complete test. These measures indicate that the items of one type tap the same ability. To further examine the homogeneity of items within an item type, we performed latent class analyses of the separate item types. These analyses identify how children actually respond to the items and divide the children into clusters that differ qualitatively. The average number of correct items is 15.55 (SD = 4.62) for the complete test; the average numbers are 4.92 (SD = 0.46), 3.19 (SD = 0.21), 3.12 (SD = 1.99), 2.41 (SD = 2.08), and 1.91 (SD = 2.02) for the weight, distance, conflict–weight, conflict–distance, and conflict–balance items, respectively.
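The reported alphas are consistent with the reported average interitem correlations. The standardized alpha formula, k·r̄ / (1 + (k − 1)·r̄), reproduces each reported alpha to within .01. The sketch below uses only the numbers given above and does not recompute anything from raw data. It also includes the usual raw-alpha formula for an item-score matrix; standardized and raw alpha agree only when item variances are similar, so treat the match as a consistency check rather than a derivation.

```python
import numpy as np


def cronbach_alpha(scores):
    """Raw Cronbach's alpha for an (N, k) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)


def standardized_alpha(k, mean_r):
    """Alpha implied by k items with average interitem correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# (number of items, average interitem correlation, reported alpha), from the text
reported = {
    "complete test": (25, 0.14, 0.80),
    "weight": (5, 0.45, 0.81),
    "distance": (5, 0.74, 0.93),
    "conflict-weight": (5, 0.60, 0.88),
    "conflict-distance": (5, 0.64, 0.90),
    "conflict-balance": (5, 0.63, 0.89),
}
for name, (k, r, alpha) in reported.items():
    print(f"{name:18s} implied {standardized_alpha(k, r):.2f}  reported {alpha:.2f}")
```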

### Latent Class Analyses of Each Item Type

Weight. All children were expected to answer weight items correctly, so the latent class model was expected to consist of one class. However, three latent classes were needed to explain the data. The largest latent class ($\hat{\pi}^{X}_{1} = .99$) demonstrated the expected high estimated probabilities of answering weight items correctly. The children in the other two, very small classes responded inconsistently to the weight items as well as to the other item types, suggesting that they were not able to perform the test or did not understand it. These children were excluded from the analyses, resulting in a sample of 779 children. Reanalysis of the responses of these 779 children to the weight items resulted, as expected, in a model of one latent class, LR(240, N = 779) = 13.81, bootstrapped p = .34, with high estimated probabilities ($\hat{\pi}^{W_{1\ldots5}X}_{r1} = .99$) of answering weight items correctly.

Conflict–weight. The latent class model for conflict–weight items was expected to comprise three latent classes. The first class should consist of children who employ a rule that results in the correct answer: Rule I, Rule II, or Rule IV. The second latent class should consist of children who use Rule III and who resort to guessing on conflict items. The third latent class should consist of children who use a rule that results in the response “in balance”: the addition rule or the QP rule.

Initial analyses of the item types showed that the responses to the first set of items differed from the responses to the remainder of the items. To illustrate this, we present the selected latent class model of all conflict–weight items and compare it to that of the four conflict–weight items in sets 2 to 5. Table 2 shows the goodness-of-fit statistics of the models of all conflict–weight items and of the models of the last four conflict–weight items, whereas Table 3 shows the parameters of the selected models.

Four latent classes were needed to describe the responses to all conflict–weight items, but only three latent classes were needed when the item of the first set was excluded. We describe the four-class model of all items. The first class (with an estimated proportion of $\hat{\pi}^{X}_{1} = .55$) showed high estimated probabilities of answering that the scale tips to the left side, which is the correct answer ($\hat{\pi}^{CW_{1\ldots5}X}_{l1} = .95$). This class corresponded to the first expected latent class because the response pattern matched the pattern expected from the use of Rule I, Rule II, or Rule IV. The first class of the model of four items was similar. In the second class, the estimated probabilities of answering “in balance” were high ($\hat{\pi}^{CW_{1\ldots5}X}_{b2} = .96$). This class corresponded to the third expected latent class because the response pattern in this class agreed with the pattern expected from the use of the addition rule or the QP rule. The second class of the model of four items had similar conditional probabilities, but the estimated proportions of the latent classes differed between the two models ($\hat{\pi}^{X}_{2} = .16$ in the model of all items, $\hat{\pi}^{X}_{2} = .39$ in the model of four items). The estimated unconditional probability of the third latent class was rather small ($\hat{\pi}^{X}_{3} = .04$). The estimated conditional probabilities of answering that the scale tilts to the side with the largest distance were rather high ($\hat{\pi}^{CW_{1\ldots5}X}_{r3} = .65$). Possibly, the children in this latent class perceived distance as the dominant dimension. However, the confidence interval of the estimated conditional probability was, on average, large (ranging from .53 to .73), indicating that this interpretation is unreliable. The third class resembled the third class in the model of four items.

TABLE 2

Goodness-of-Fit Indices for Latent Class Models of Responses to the Individual Item Types

| Number of classes | df | LR | BIC |
|---|---|---|---|
| *Conflict–weight items, all items* | | | |
| 2 | 221 | 479.78\* | 4844.94 |
| 3 | 210 | 255.64\* | 4694.03 |
| 4 | 202 | 171.53 | 4663.19 |
| Four-class model, with restrictions | 215 | 212.93 | 4618.05 |
| *Conflict–weight items, first item excluded* | | | |
| 2 | 63 | 242.07\* | 4012.56 |
| 3 | 55 | 68.95 | 3892.71 |
| 4 | 48 | 44.28 | 3914.64 |
| Three-class model, with restrictions | 64 | 95.24 | 3859.07 |
| *Distance items, first item excluded* | | | |
| 2 | 63 | 251.40\* | 2766.23 |
| 3 | 56 | 86.26\* | 2647.69 |
| 4 | 53 | 38.49 | 2619.91 |
| 5 | 49 | 24.23 | 2632.28 |
| Four-class model, with restrictions | 70 | 68.10 | 2536.32 |
| *Conflict–distance items, first item excluded* | | | |
| 2 | 64 | 356.07\* | 4050.40 |
| 3 | 56 | 80.93 | 3821.87 |
| 4 | 49 | 45.87 | 3833.42 |
| Three-class model, with restrictions | 67 | 105.81 | 3773.51 |
| *Conflict–balance items, first item excluded* | | | |
| 2 | 63 | 369.83\* | 4433.36 |
| 3 | 54 | 160.57\* | 4284.03 |
| 4 | 47 | 72.99 | 4243.05 |
| 5 | 43 | 54.73 | 4251.43 |
| Four-class model, with restrictions | 66 | 109.01 | 4152.57 |

Note. N = 779. All of the analyses of one item type concern the last four items of each type. LR, log–likelihood ratio; df, degrees of freedom. The probability of the log–likelihood ratio is computed by means of the parametric bootstrap procedure.

\*p < .05.

TABLE 3

Parameter Estimates for the Selected Latent Class Models of Responses to the Individual Item Types

| t | p(t) | Item 1 (L / B / R) | Item 2 (L / B / R) | Item 3 (L / B / R) | Item 4 (L / B / R) | Item 5 (L / B / R) |
|---|---|---|---|---|---|---|
| *Conflict–weight items, all items* | | | | | | |
| 1 | .55 | .95¹ / .03 / .02 | .95¹ / .03 / .02 | .95¹ / .03 / .02 | .95¹ / .03 / .02 | .95¹ / .01 / .04 |
| 2 | .16 | .01 / .96² / .04 | .02 / .96² / .02 | .04 / .96² / .00 | .01 / .96² / .03 | .01 / .96² / .03 |
| 3 | .04 | .33 / .02 / .65³ | .27 / .08 / .65³ | .31 / .04 / .65³ | .21 / .14 / .65³ | .22 / .13 / .65³ |
| 4 | .25 | .75 / .23 / .02 | .27 / .67⁴ / .06 | .27 / .67⁴ / .06 | .24 / .67⁴ / .09 | .23 / .67⁴ / .09 |
| *Conflict–weight items, first item excluded* | | | | | | |
| 1 | .57 | – | .94¹ / .04 / .02 | .94¹ / .05 / .01 | .94¹ / .04 / .03 | .94¹ / .02 / .05 |
| 2 | .39 | – | .14 / .81² / .04 | .15 / .81² / .03 | .12 / .81² / .06 | .12 / .81² / .06 |
| 3 | .04 | – | .20 / .10 / .70³ | .30 / .00 / .70³ | .21 / .09 / .70³ | .24 / .07 / .70³ |
| *Distance items, first item excluded* | | | | | | |
| 1 | .25 | – | .01² / .98¹ / .01² | .01² / .98¹ / .01² | .01² / .98¹ / .01² | .01² / .98¹ / .01² |
| 2 | .60 | – | .98¹ / .01² / .01² | .98¹ / .01² / .01² | .98¹ / .01² / .01² | .98¹ / .01² / .01² |
| 3 | .13 | – | .49³ / .48 / .03 | .70⁴ / .28 / .02 | .49³ / .41 / .10 | .70⁴ / .30 / .00 |
| 4 | .01 | – | .09⁵ / .09⁵ / .82⁶ | .09⁵ / .09⁵ / .82⁶ | .09⁵ / .09⁵ / .82⁶ | .09⁵ / .09⁵ / .82⁶ |
| *Conflict–distance items, first item excluded* | | | | | | |
| 1 | .35 | – | .01² / .01² / .98¹ | .01² / .01² / .98¹ | .01² / .01² / .98¹ | .01² / .01² / .98¹ |
| 2 | .20 | – | .51 / .25 / .24 | .10 / .34 / .56 | .32 / .49 / .20 | .58 / .28 / .14 |
| 3 | .46 | – | .98¹ / .01² / .01² | .75 / .16 / .09 | .98¹ / .01² / .01² | .98¹ / .01² / .01² |
| *Conflict–balance items, first item excluded* | | | | | | |
| 1 | .42 | – | .03¹ / .02² / .96³ | .03¹ / .02² / .96³ | .03¹ / .02² / .96³ | .03¹ / .02² / .96³ |
| 2 | .26 | – | .20 / .36 / .44 | .04 / .73 / .22 | .10 / .34 / .55 | .06 / .58 / .36 |
| 3 | .27 | – | .03¹ / .96³ / .02² | .03¹ / .96³ / .02² | .03¹ / .96³ / .02² | .03¹ / .96³ / .02² |
| 4 | .05 | – | .72⁴ / .12⁵ / .15⁶ | .72⁴ / .12⁵ / .15⁶ | .72⁴ / .12⁵ / .15⁶ | .72⁴ / .12⁵ / .15⁶ |

Note. t, latent class; p(t), proportion of latent class t; L, scale tips to left; B, balance; R, scale tips to right. In the published table, the probabilities of giving the correct answer are underlined. Superscripts refer to equality restrictions within a latent class model: parameters with the same superscript are restricted to be equal, and the numbering of superscripts restarts for every latent class model. For the conflict–balance items, the side with the larger number of weights is the right side. The items conflict–weight 2 and 3, distance 3 and 5, and conflict–distance 5 are mirrored from the original configuration.

In the first three latent classes in both models, equality restrictions between the conditional probabilities of giving the most probable response were acceptable. This was not the case in the additional class of the model of the responses to all conflict–weight items (with an estimated proportion of $\hat{\pi}^{X}_{4} = .25$). Restricting the conditional probabilities of answering “in balance” to be equal between all items resulted in a significant deterioration in the goodness-of-fit of the model. The model fitted the data when the equality restriction on the first item was removed. The estimated conditional probabilities of answering “in balance” to the remaining conflict–weight items were reasonably high ($\hat{\pi}^{CW_{2\ldots5}X}_{b4} = .67$). We suggest that the children in this latent class required practice in solving the items and that, after practicing, they used the addition rule to solve balance scale items (but still with a considerable error rate).

The latent class models of the other item types showed similar deviant responses to the first item. Clearly, some of the inconsistencies in the data were caused by the deviant behavior on the first set of items. Taking this into account, we repeated the analyses of each item type without these items. Table 2 shows the goodness-of-fit measures obtained by fitting the models, whereas Table 3 shows the values of the parameters of the selected models.

Distance. We expected the latent class model for distance items to consist of two classes. The first latent class should consist of children who use Rule I and who predict that the scale remains in balance. The second latent class should consist of children who use Rule II or more complex rules and who conclude that the scale tilts to the side with the largest distance.

Contrary to the predictions, a restricted four-class model fitted the data best (see Table 2). The first class (with an estimated proportion of $\hat{\pi}^{X}_{1} = .25$) showed a high estimated probability of answering “in balance” to the distance items. Restricting these estimates to be equal was acceptable ($\hat{\pi}^{D_{2\ldots5}X}_{b1} = .98$). This class matched the expectations for the first expected latent class; these children were expected to employ Rule I. The second class (with an estimated proportion of $\hat{\pi}^{X}_{2} = .60$) showed high and consistent estimated probabilities of answering that the scale tips to the left side ($\hat{\pi}^{D_{2\ldots5}X}_{l2} = .98$), which is the correct answer on the distance items (underlined in Table 3). Equality restrictions on these estimates were acceptable. The class matched the expected second class because these children might employ Rule II, Rule III, Rule IV, the QP rule, or the addition rule. The latent class model contained two additional latent classes. The children in the third class (with an estimated proportion of $\hat{\pi}^{X}_{3} = .13$) vacillated between the answer “in balance” and the correct answer. The conditional probabilities of answering the distance items correctly were restricted to be equal between items with the same distance difference, which did not significantly deteriorate the goodness-of-fit of the model. The distance difference is the difference between the distances at which the weights are placed. The estimated probabilities of answering items with a distance difference of 2 were larger ($\hat{\pi}^{D_{3,5}X}_{l3} = .70$) than
