405-438 (1988)
The Role of Understanding in Solving Word Problems
Yale University
University of Colorado
Word problems are notoriously difficult to solve. We suggest that much of the
difficulty children experience with word problems can be attributed to difficulty in
comprehending abstract or ambiguous language. We tested this hypothesis by (1)
requiring children to recall problems either before or after solving them, (2) re-
quiring them to generate final questions to incomplete word problems, and (3)
modeling performance patterns using a computer simulation. Solution perfor-
mance was found to be systematically related to recall and question generation
performance. Correct solutions were associated with accurate recall of the prob-
lem structure and with appropriate question generation. Solution “errors” were
found to be correct solutions to miscomprehended problems. Word problems that
contained abstract or ambiguous language tended to be miscomprehended more
often than those using simpler language, and there was a great deal of system-
aticity in the way these problems were miscomprehended. Solution error patterns
were successfully simulated by manipulating a computer model’s language com-
prehension strategies, as opposed to its knowledge of logical set relations. © 1988
Academic Press, Inc.
Word problems are notoriously difficult to solve. In the study presented
here, one type of arithmetic problem was solved by all first-grade children
when it was presented in numeric format, but by only 29% of the children
when it was presented as a word problem. Nationally, children perform 10
to 30% worse on arithmetic word problems than on comparable problems
presented in numeric format (Carpenter, Corbitt, Kepner, Lindquist, &
Reys, 1980). More importantly, as students advance to more sophisti-
cated domains, they continue to find word problems in those domains
more difficult to solve than problems presented in symbolic format (e.g.,
algebraic equations).
This work was supported by National Science Foundation Grant BNS-8309075 to Walter
Kintsch and James G. Greeno. We thank Arthur Samuel, Kurt Van Lehn, and an anony-
mous reviewer for helpful comments on this manuscript. Requests for reprints should be
sent to Denise D. Cummins, Psychology Department, University of Arizona, Tucson, AZ
85721.
This discrepancy in performance on verbal and numeric format prob-
lems strongly suggests that factors other than mathematical skill contrib-
ute to problem solving success. In this paper, we explore the contribution
of a factor we believe to be heavily involved in word problem solving: text
comprehension processes.
We argue that problem texts should be taken
seriously as valid discourse entities. Like narratives, word problems re-
quire skillful mapping of text input onto the reader’s knowledge base if
proper comprehension is to be achieved. In the case of narratives, the
reader must map linguistic input onto world knowledge concerning (e.g.)
actors and their motives. In the case of word problems, the solver must
map linguistic input onto knowledge about the problem domain. Nowhere
are the ramifications of a breakdown in these mappings more strongly felt
than in the domain of children’s problem solving, where developing lin-
guistic skills can play havoc with problem solving strategies. Accordingly,
we, like many other researchers, have chosen this domain as a starting
point for understanding how the nature of problem solving is shaped and
colored by a solver’s verbal comprehension skills.
Solving Arithmetic Word Problems
Not all word problems are alike. Some problems are much easier to
solve than others. For example, even very young children rarely make
errors on Combine 1 problems (See Table l), but frequently make errors
on Compare 1 problems. This differential performance changes with age,
with performance on these problems becoming nearly equivalent over
time. Because problem difficulty patterns change with age, many re-
searchers have adopted a Piagetian view of solution performance charac-
teristics. According to this view, a problem proves troublesome for a
child only insofar as the capacities required to process the problem are
not yet possessed by the child. While this general view is fairly uncon-
troversial, researchers disagree as to which capacities develop over time
to improve solution performance. Explanations generally fall into two
camps: those that attribute improved solution performance to the devel-
opment of logico-mathematical knowledge and those that attribute such
improvement to the development of language comprehension skills. We
discuss each of these in turn.
The logico-mathematical development view.
According to the logico-
mathematical explanation of solution difficulty, children fail to solve cer-
tain problems because they do not possess the conceptual knowledge
required to solve them correctly. Support for the logico-mathematical
development explanation was offered by Riley, Greeno, and Heller (1983)
and Briars and Larkin (1984).
Riley et al. argue that problem difficulty depends in part on the prob-
lem’s semantic structure. Nonetheless, they attribute developmental
trends in problem solving skill to the acquisition of knowledge concerning
logical set relations. To explicate this view, they proposed models of
good, medium, and poor problem solving using a schema type formalism.
Set knowledge is represented in these models as schemata that specify
relations among sets of objects. Their model of good problem solving
possesses elaborate schemata that specify high level set relations, such as
part-whole, or subset-superset relations. In contrast, their model of poor
problem solving ability possesses impoverished schemata that are capable
of representing the integrity of individual sets but not their part-whole
relations.
Briars and Larkin also proposed a model of problem solving ability that
simulates solution performance characteristics. Although somewhat tem-
pered with “set language” and memory resource constraints, the primary
mechanisms contributing to solution performance in this model are defi-
ciencies in conceptual knowledge. Unlike Riley et al., however, this con-
ceptual knowledge includes such things as the ability to understand subset
equivalences and the ability to understand that things can be undone in
time.
The linguistic development view. The linguistic development view
holds that certain word problems are difficult to solve because they em-
ploy linguistic forms that do not readily map onto children’s existing
conceptual knowledge structures. For example, a child may understand
part-whole set relations and yet be uncertain as to how the comparative
verbal form (e.g., How many more X’s than Y’s?) maps onto them. If this
were the case, we would say that the child had not yet acquired an
interpretation for such verbal forms.
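To make the part-whole notion concrete, the schema can be sketched in a few lines of code. This is our own illustrative sketch, not a model from the literature; the function and parameter names are assumptions made for the example.

```python
# A minimal part-whole schema, sketched for illustration (not the authors'
# model): the whole set's cardinality equals the sum of its two parts',
# so whichever of the three quantities is unknown can be computed from
# the other two.
def solve_part_whole(part1=None, part2=None, whole=None):
    if whole is None:
        return part1 + part2   # unknown superset, as in Combine 1
    if part1 is None:
        return whole - part2   # unknown first subset
    return whole - part1       # unknown second subset

# A comparative question ("How many more X's than Y's?") maps onto the
# same schema: the larger set is the whole, the smaller set one part,
# and the difference set the other part.
print(solve_part_whole(part1=3, whole=5))  # difference set -> 2
```

On this view, a child can possess the schema yet still fail a Compare problem, simply because the comparative wording does not get mapped onto the schema's slots.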
Importantly, the linguistic development view implies that word prob-
lems that contain certain verbal forms constitute tests of verbal sophisti-
cation as well as logico-mathematical knowledge. Accordingly, solution
errors on these problems may reflect deficiencies in semantic knowledge,
logico-mathematical knowledge, or both. To test the contributions of
each, several researchers have manipulated problem wording and ob-
served its effects on solution performance.
For example, consider the following problem:
There are 5 birds and 3 worms.
How many more birds are there than worms?
This is a relatively difficult problem for children, with correct perfor-
mance ranging from 17% for nursery school children to 64% for first
graders. The logico-mathematical view holds that this problem is difficult
because it requires sophisticated understanding of part-whole relations,
which nursery school children presumably do not yet possess. Hudson
(1983), however, reported dramatic improvements in solution perfor-
mance on this type of problem when the final line was changed to the
following:
How many birds won’t get a worm?
Correct performance on this version of the problem ranged from 83% for
nursery school children to 100% for first graders. Importantly, even nurs-
ery school children exhibited sophisticated set knowledge when solving
this problem. They did not, for example, simply line up the birds and
worms (on an accompanying picture) and count the singletons. Instead,
they solved the problem by counting the smaller set (worms) to determine
its cardinality, counting out a subset of the larger set (birds) to the same
cardinality, and then counting the number of birds remaining and return-
ing that number as the answer. By using this “match-separate” strategy,
even nursery school children evidenced a tacit understanding of one-
to-one correspondence among sets that possess equivalent cardinality
as well as a sophisticated grasp of part-whole
relations. Similar results were found by De Corte, Verschaffel, and De
Win (1985), who improved solution performance by manipulating lin-
guistic aspects of problem texts in such a way as to make the semantic
relations among sentences clearer. In fact, the influence of problem word-
ing was apparent in the Riley et al. data. For example, mean solution
accuracy on Compare 4 (see Table 1) was 25% higher than that on Com-
pare 5 (for second graders), even though these two problems describe
identical part-whole set structures, albeit with different words. The same
discrepancy was noted for Compare 3 (80%) and Compare 6 (35%), both
of which describe the same problem structure with different wordings.
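The match-separate strategy described above can be sketched in code. The function name and the list representation of the sets are our illustrative assumptions, not part of Hudson's procedure.

```python
# Sketch of the "match-separate" strategy described in the text:
# count the smaller set, count out an equal-sized subset of the larger
# set, then count and return what remains of the larger set.
def match_separate(larger, smaller):
    small_count = len(smaller)        # count the smaller set
    remaining = larger[small_count:]  # separate off the matched subset
    return len(remaining)             # count the leftover singletons

birds = ["bird"] * 5
worms = ["worm"] * 3
print(match_separate(birds, worms))  # -> 2 birds won't get a worm
```

The point of the sketch is that the strategy presupposes one-to-one correspondence between equal-cardinality sets, which is exactly the set knowledge the nursery school children were credited with.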
Empirical results such as these are damaging to the logico-
mathematical explanation of solution difficulties. If children fail to solve
certain problems because they do not possess the conceptual knowledge
required to solve them, one would not expect minor wording changes to
improve solution performance. Yet this is precisely what is observed.
Instead, these results are entirely consistent with the linguistic develop-
ment view of problem solving development, since they suggest that chil-
dren find certain problems difficult because they cannot interpret key
words and phrases in the problem text.
An unanswered question in this work, however, is just how children do
interpret the problems they are asked to solve, particularly those that
employ troublesome language. This is of some importance because the
errors that children make are often counter-intuitive. For example, the
most commonly committed error on the birds/worms problem is to return
the large number “5” as the answer to the problem. In fact, these “given
number errors” constitute a significant proportion of errors committed on
word problems (Riley et al., 1983; De Corte et al., 1985). It is not clear
why children believe that the solution to a problem could consist of a
number already given in the problem.
In the work to be described here, we attempted to obtain evidence
concerning the interpretations that children apply to standard word prob-
lems. By doing so, we also tested the two opposing viewpoints concerning
solution errors. We accomplished these goals in the following way. In
Experiment 1, we required children to recall problem texts either before
or after solving them, thereby providing us with information concerning
their problem representations. We then compared these recall protocols
to solution errors. We predicted that recall errors and solution errors
would vary systematically in that solution “errors” would constitute cor-
rect solutions to miscomprehended problems. Second, we compared ob-
served error patterns with error patterns obtained when manipulating a
computer simulation program’s linguistic and logico-mathematical knowl-
edge. We predicted that the best match between the two sets of patterns
would be obtained when linguistic knowledge was altered rather than
when logico-mathematical knowledge was altered.
In Experiment 2, we tested children’s interpretations of word problems
by requiring them to generate final questions to incomplete word prob-
lems. We reasoned that in order to complete a problem with an appro-
priate question, it is necessary to understand it properly. We therefore
predicted that solution accuracy would vary systematically with question
generation performance.
In Experiment 1, first grade children were required to recall word prob-
lems either before or after solving them. The word problems that were
used in this study were the same ones used by Riley et al. (1983). In
addition, these children were required to solve the same problems in
numeric format. Our hypotheses were as follows: We predicted that so-
lution performance would vary systematically with recall performance,
that is, that most of the variance in solution performance would be at-
tributable to comprehension success. Moreover, we predicted that solu-
tion errors would in fact be correct solutions to miscomprehended prob-
lems, and that miscomprehensions would be primarily in the direction of
transformations of difficult problems to easier ones. That is, we expected
that when faced with particularly difficult linguistic forms, children would
try to simplify them to bring them more in line with linguistic forms with
which they were more familiar. Finally, we predicted that the most com-
monly committed solution errors could be simulated by manipulating ver-
bal comprehension.
Subjects. Thirty-eight first grade children from the Boulder Valley School District served
as participants in the study. The children were tested late in the school year (during May).
Apparatus and materials. The 18 story problems used by Riley et al. (1983) served as
stimulus materials in the current study. These 18 problems are presented in Table 1. They
consist of six instances within each of three major problem types. The problem types are as
follows: Combine problems, in which a subset or superset must be computed given information
about two other sets; Change problems, in which a starting set undergoes a transfer-in or
transfer-out of items, and the cardinality of the start set, transfer set, or result set must be
computed given information about two of the sets; and Compare problems, in which the
cardinality of one set must be computed from the information given about the relative sizes of
the other sets. The instances within each problem type differ in terms of which set cardinality
must be computed and in the wording of the problems. The story problems used in the present
study all contained “Mary and John” as actors and “marbles” as objects. This was done to
reduce the memory load required to comprehend the problem. The child needed only to attend
to the relationships among the sets and to remember the numbers stated in the problems.

TABLE 1
Problems Used in Experiment 1 (Adapted from Riley, Greeno, & Heller, 1983)

Combine problems
1. Mary has 3 marbles. John has 5 marbles. How many marbles do they have altogether?
2. Mary and John have some marbles altogether. Mary has 2 marbles. John has 4 marbles. How many marbles do they have altogether?
3. Mary has 4 marbles. John has some marbles. They have 7 marbles altogether. How many marbles does John have?
4. Mary has some marbles. John has 6 marbles. They have 9 marbles altogether. How many marbles does Mary have?
5. Mary and John have 8 marbles altogether. Mary has 7 marbles. How many marbles does John have?
6. Mary and John have 4 marbles altogether. Mary has some marbles. John has 3 marbles. How many does Mary have?

Change problems
1. Mary had 3 marbles. Then John gave her 5 marbles. How many marbles does Mary have now?
2. Mary had 6 marbles. Then she gave 4 marbles to John. How many marbles does Mary have now?
3. Mary had 2 marbles. Then John gave her some marbles. Now Mary has 9 marbles. How many marbles did John give to her?
4. Mary had 8 marbles. Then she gave some marbles to John. Now Mary has 3 marbles. How many marbles did she give to John?
5. Mary had some marbles. Then John gave her 3 marbles. Now Mary has 5 marbles. How many marbles did Mary have in the beginning?
6. Mary had some marbles. Then she gave 2 marbles to John. Now Mary has 6 marbles. How many marbles did she have in the beginning?

Compare problems
1. Mary has 5 marbles. John has 8 marbles. How many marbles does John have more than Mary?
2. Mary has 6 marbles. John has 2 marbles. How many marbles does John have less than Mary?
3. Mary has 3 marbles. John has 4 marbles more than Mary. How many marbles does John have?
4. Mary has 5 marbles. John has 3 marbles less than Mary. How many marbles does John have?
5. Mary has 9 marbles. She has 4 marbles more than John. How many marbles does John have?
6. Mary has 4 marbles. She has 3 marbles less than John. How many marbles does John have?
Each child solved 18 problems. Half of the 18 problems were first solved and then re-
called; the remaining half were first recalled and then solved. Two versions of problem
presentation were used to ensure that all 18 problems were tested in each solve-recall
condition. In the first version, one half of the problems in each problem type was assigned
to the Solve-Recall condition, and the remaining halves were assigned to the Recall-Solve
condition. In the second version, these assignments were reversed so that version 1 Solve-
Recall problems became Recall-Solve problems, and version 1 Recall-Solve problems be-
came Solve-Recall problems. The presentation version served as a between-subjects factor.
The number triples used in the story problems included only the numbers 1 through 9 and
were chosen such that correct answers were (a) less than 10 and (b) not the same as a
number used in the story. Nine triples that satisfied these constraints were chosen for use
in the problems: 3-2-5, 4-2-6, 5-2-7, 6-2-8, 7-2-9, 4-3-7, 5-3-8, 6-3-9, and 5-4-9. Half of these
triples were tested as addition problems and half as subtraction. Across subjects, these
triples were assigned to problems such that each triple was tested in each of the eighteen
problems.
In addition to the story problems, these number triples were tested as numeric format
problems. Each child received the same number assignment condition for both the story
problem and numeric format tests. For example, a given child received 3 + 2 = ? as both
a story problem (Combine 1) and as a numeric format problem. The child’s performance on
that equation could therefore be observed under both the story and numeric formats. The
numeric formats mirrored the story problem structures to which they corresponded. Note
that in certain cases (e.g., Change 5) this meant that the equation to be solved contained an
unknown on the left side of the equation (e.g., ? + 2 = 5). All numeric format problems
were presented in vertical sentence form; equations such as ? + 2 = 5 were written as an
open box with “+2” underneath it, a line underneath “+2,” and “5” underneath the line.
Procedure. Children were tested individually in a quiet room in their schools during
school hours. In keeping with the methodology of Riley et al. (and others), all problems were
presented orally, and the child was required to solve them without benefit of paper and
pencil. The sessions were recorded on a small, unobtrusive tape recorder. The child was
informed of the presence of the tape recorder, but was assured that only the experimenter
would hear the tape (i.e., parents and teachers would not). No child seemed uncomfortable
having the session taped.
Problem presentation was randomized for each child. The session began with instructions,
followed by practice problems. The practice problems consisted of two solve-recall and two
recall-solve problems. Children were assisted in solving and recalling these if required.
Once the experimenter was satisfied that the child understood the procedure, the experi-
mental session was begun. Children were not assisted in solving or recalling experimental
problems. They also were not told whether a problem was to be solved first or recalled first
until after the problem had been read. This was done to ensure that the strategies used to
solve and recall the problems would be the same in both conditions. Following the oral story
problem session, the child was given a sheet with the numeric problems on it and was
required to solve these.
FIG. 1. Proportion of correct solutions for the problems shown in Table 1 when presented
as word problems (W) and when presented in numerical form (N).
Protocols were scored for the following: (a) solution accuracy on verbal
format problems, (b) solution accuracy on numeric format problems, and
(c) structural recall of each problem. The data are pooled over the recall-
before and recall-after conditions since initial analyses showed this factor
to be nonsignificant. Unless otherwise stated, rejection probability was
.05. Significant interactions from ANOVAs were followed by simple ef-
fects tests (Keppel, 1973). Significant main effects involving more than
one mean were tested using Tukey’s test of pairwise comparisons.
Recall and Solution Accuracy
Figure 1 depicts the proportion of subjects who correctly solved each of
the 18 word- and numeric-format problems. These results are quite similar
to the results of Riley (1981) and Riley et al. (1983), with the possible
exception that our students performed slightly higher on the more difficult
Change and Compare problems. As expected, performance on numeric
problems was consistently higher than that on verbal problems. Some
problems were solved correctly more than three times as often in numeric
format than in verbal form. Some numeric formats, however, also proved
troublesome for the children. These were the number sentences that con-
tained variables, as in “? + 5 = 8” (i.e., Change 3, 4, 5, and 6). First
grade children in the Boulder Valley School District are not routinely
exposed to number sentences of these forms. Given this fact, it is not
surprising that children performed less well on these than on the more
typical “3 + 2 = ?.” What is surprising is that these number sentences
were solved correctly nearly two-thirds of the time, despite their relative unfamiliarity.
As stated earlier, subjects’ verbal recall protocols were scored for ac-
curacy of structural recall. A correct structural recall was any recall that
preserved the logical relations among sets in the original problem. For
example, consider Compare problems 4 and 5. These two problems de-
scribe the same problem structure using different wording. In both cases,
the small set must be derived given information about the large and dif-
ference sets. “Recalling” Compare 5 as Compare 4, therefore, constitutes
accurate structural recall because the original problem’s logical structure
is preserved. Structure-preserving recall transformations such as these
were observed on 12% of the trials; along with veridical reproductions
(45%), they constitute our measure of correct structural recall. Together,
they constituted 57% of all recall instances.
Figure 2 illustrates proportion correct structural recall. Like the solu-
tion results, the recall data are also in agreement with those of Riley
(1981), although the greater sensitivity of our recall measure provides a bit
more information than the repetition measure used in the Riley study. As
predicted, the overall pattern of recall accuracy closely resembled that of
word problem solution accuracy, suggesting a strong relationship be-
tween the two.
We predicted that word problem performance would vary systemati-
cally with recall performance but not with numeric format performance.
To test this hypothesis, each subject’s protocol was scored for proportion
of correct word problem solutions, numeric format solutions, and struc-
tural recall across the 18 problems. A regression model was then con-
structed to predict each subject’s overall word problem solution perfor-
mance as a function of his or her overall performance on the structural
recall and numeric format tasks.

FIG. 2. Proportion correct structural recall for the problems shown in Table 1.

A forward selection procedure was used
to select candidates for entry in the model. Only one variable met the 5%
significance level for entry into the model, that of structural recall. This
simple model accounted for 72% of the variance in solution accuracy
(F(1, 36) = 93.62, MSe = .01, p < .0001), supporting our prediction that
performance on word problems depends primarily on successful comprehension.
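The regression analysis described above can be illustrated with a toy sketch on simulated per-subject scores. The numbers below are randomly generated for illustration only; they are not the study's data, and the variable names are our own.

```python
import numpy as np

# Illustrative only: simulate per-subject proportion-correct scores and
# regress word-problem accuracy on structural recall and numeric-format
# accuracy, in the spirit of the analysis described above.
rng = np.random.default_rng(0)
n = 38  # number of subjects in Experiment 1
recall = rng.uniform(0.2, 1.0, n)                  # structural recall accuracy
numeric = rng.uniform(0.6, 1.0, n)                 # numeric-format accuracy
solution = 0.9 * recall + rng.normal(0, 0.05, n)   # word-problem accuracy

# Ordinary least squares with an intercept term.
X = np.column_stack([np.ones(n), recall, numeric])
beta, *_ = np.linalg.lstsq(X, solution, rcond=None)
pred = X @ beta
r2 = 1 - ((solution - pred) ** 2).sum() / ((solution - solution.mean()) ** 2).sum()
```

Because the simulated solution scores are driven by recall and not by numeric skill, the fitted coefficient on recall dominates, mirroring the pattern the study reports.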
Finally, to ensure that our observed relationship was not simply re-
flecting subject variation (i.e., talented subjects performing well on all
tasks, less-talented subjects performing poorly on all tasks), we calcu-
lated a 2 x 2 contingency table for each subject indicating the number of
times problem recall and solution were equivalent in accuracy (i.e., both
right or both wrong) or were different in accuracy (i.e., one right, the
other wrong). The expected frequency was then computed for the right-
right case, and the deviation between observed and expected frequency in
that case calculated. These deviations were found to be significantly
greater than zero, t(37) = 6.75, p < .001, indicating that a dependency
between solutions and structural recall existed for individual subjects,
regardless of talent.
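The per-subject dependency check can be sketched as follows; the function and variable names are our own illustrative choices.

```python
# Sketch of the dependency check described above: for one subject,
# compare the observed number of trials on which both recall and
# solution were correct against the count expected if recall accuracy
# and solution accuracy were independent.
def right_right_deviation(recall_correct, solution_correct):
    n = len(recall_correct)
    observed = sum(r and s for r, s in zip(recall_correct, solution_correct))
    expected = sum(recall_correct) * sum(solution_correct) / n
    return observed - expected

# A subject whose recall and solution succeed on the same trials shows
# a positive deviation:
trials_recall = [1, 1, 1, 0, 0, 0]
trials_solve = [1, 1, 1, 0, 0, 0]
print(right_right_deviation(trials_recall, trials_solve))  # -> 1.5
```

A consistently positive deviation across subjects, as the t test above indicates, means the recall-solution link holds within individual children rather than merely across ability levels.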
Miscomprehensions and Error Types
While the quantitative relationship between recall and solution perfor-
mance strongly suggests that solution difficulties are driven by break-
downs in story comprehension, we can offer more direct evidence by way
of qualitative error analyses. We assume that when a child recalls a prob-
lem, he or she describes the problem representation he or she constructed
during a solution attempt. The nature of a miscomprehension therefore
should be related to the type of solution error made. In the following
discussion, we will describe the relationships we noted between compre-
hension and solution errors.
Types of miscomprehensions. Aside from verbatim recall, subjects’
recall protocols could be classified into six categories. The first, termed
structure-preserving transformations (SP), was mentioned in the struc-
tural recall analysis and comprised 12% of all recall trials. These were
occasions on which the wording of the problem was changed during re-
call, but the all-important mathematical relations among sets were main-
tained (e.g., a subtraction Compare 5 problem became a subtraction Compare 4 problem).†

† Separate regressions were also performed on the three problem types. The results were
not appreciably different from the overall regression, with the exception that the regression
coefficient for numeric accuracy was marginally significant for Change problems (b = .22,
p < .11). This was not surprising since, as noted earlier, four of these six problems
contain unknowns in their number sentences, and first grade children are not familiar with
such forms.

One interpretation of these transformations is that chil-
dren are reconstructing the text base from their internal representation of
the problem structure, that is, given their comprehension of the logical set
relations. If difficult problems are difficult because their text bases are
unnecessarily complex, then we would expect children to show evidence
of simplifying the text base during recall, that is, transforming a complex
subtraction story such as Change 5 into a simpler subtraction story such
as Change 2. To test this prediction, we divided each of the three problem
types into easy and hard problems based on solution accuracy levels
reported by Riley et al. Easy problems included Combine 1, 2 and 3,
Change 1,2 and 3, and Compare 1,3, and 4. The remaining problems were
classified as hard problems. As predicted, more transformations of hard
problems into easy problems were observed than vice versa. Of the 25
subjects who produced these recall transformations, 4 exhibited tied
scores and 18 exhibited the predicted bias, p < .01 via the sign test.
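The sign test used here can be reproduced with a short calculation; the helper name is ours, not from the paper. With ties excluded, 18 of the 21 remaining subjects showed the predicted direction.

```python
from math import comb

# One-tailed sign test: probability of observing k or more successes
# out of n non-tied cases if each direction were equally likely.
def sign_test_p(k, n):
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 25 subjects minus 4 ties leaves 21 non-tied subjects; 18 of them
# transformed hard problems into easy ones rather than the reverse.
p = sign_test_p(18, 21)
print(p < .01)  # -> True
```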
Moreover, when transforming a problem during recall, children seemed to
be sensitive to problem type. On 77% of these misrecall occasions, the
problem was transformed into simpler problems of the same
type, e.g.,
difficult Compare problems were transformed into easier Compare prob-
lems, difficult Change into easier Change. The only exception was a slight
tendency to transform difficult problems of all types into Combine 1, the
problem that occurs most often in arithmetic textbooks (Stigler, Fuson,
Ham, & Kim, 1986). Combine 1 seems to be a “default” problem, and
when all else fails children resort to what is most familiar to them.
The second type of misrecall category involved structure-violating
transformations (SV). This category comprised 12% of all recall trials. It
included occasions in which problems were transformed into other legit-
imate problems, but the transformation violated the important mathemat-
ical relations of the original problem (e.g., a subtraction Compare 5 be-
came an addition Compare 3). In these cases, both wording and structure
changed. As in SP transformations, these misrecalls showed a bias toward
oversimplification: Of the 21 subjects who produced these transforma-
tions, 1 exhibited a tied score and 20 exhibited the predicted bias, p <
.001 via the sign test.
The third type of misrecall (NP) comprised 8% of all trials and involved
recalling the problem as the following nonsense problem:
(a) Mary has 5 marbles.
John has 4 marbles.
How many does (Mary, John) have?
Mary had some marbles.
John gave her 3 more.
Now Mary has 7 marbles.
How many does Mary have now?
These are nonsense problems since they require no computation and
instead simply ask for one of the numbers given in the problem.
The fourth category (2S) included recall instances such as the following:
(b) Mary and John have 5 marbles altogether.
Mary has 3 marbles.
How many do they have altogether?
This misrecall contains two references to the superset in the problem,
once in the body of the problem, and once as a request for the superset
cardinality. This is similar to problem category NP in that the question
requests a number already given in the problem. Misrecalls such as these
were observed on 4% of all trials.
The fifth category (0S) contrasts sharply with category 2S. Instances in
this category were observed on 6% of all trials. They consisted (e.g.) of
the following:
(c) Mary had some marbles.
John gave her 3 more.
How many did Mary have in the beginning?
Here, the superset specification line (i.e., “Now Mary has 7 marbles.”) is
simply left out of the problem altogether. In contrast to the other trans-
formations, this category does not seem to be a “transformation” at all,
but rather a legitimate “misrecall,” or memory error. A line from the
problem was simply left out or forgotten. Compare this to category 2S,
which seems to suggest a true misconceptualization of the problem structure.
The sixth and final category simply included all misrecall instances that
did not fit into the above categories either because the child could re-
member nothing of the problem, or recall was so confused it could not be
classified. This category comprised 13% of all trials.
In summary, subjects’ miscomprehensions appeared to be systematic
in that they could be classified into five meaningful categories. It is also
interesting to note the distributions of problem types across these cate-
gories. When Compare problems were miscomprehended in a classifiable
way, they tended to fall into two categories, SV (38%) and NP (31%).
Change problems also tended to fall into the SV category (40%) and the
NP category (20%). Combine problems, on the other hand, tended to be
miscomprehended as double-superset problems (33%). Clearly, some as-
pect of these problems tends to invite certain interpretations from chil-
dren. We return to this question below; here, we turn to the more impor-
tant question of how these recall miscomprehensions affected solution performance.

Miscomprehensions and solution performance. Aside from correct solutions (55%), subjects in this study produced wrong operation errors (8%), given number errors (18%), arithmetic errors (11%), and unclassifiable errors (8%).
Wrong operation errors are
errors in which the child
used an incorrect arithmetic operation in an attempt to solve the problem,
i.e., added when subtraction was required or subtracted when addition
was required.
Given number errors
are errors in which the child returned
one of the numbers in the problem as the answer to the problem. These
types of errors are particularly important to our analysis because they
betray faulty comprehension of the problem structure. Moreover, they
are frequently reported in the literature (Riley et al., 1983; DeCorte et al.,
1985) and comprised the majority of errors committed in this study (58%).
We predicted that there would be a systematic relationship between
story miscomprehension and solution errors. In particular, we predicted
that “errors” would often be “correct answers” to miscomprehended
stories. Such relationships were observed; they are presented in Table 2.
We discuss this table in some detail below.
Beginning at the top of the table, notice that structure-preserving transformations (SP) were most often associated with correct answers (64%) and arithmetic errors (25%).
Structure-violating transformations
(SV), on
the other hand, were most often associated with wrong operation errors
(42%). In fact, only 14% of these SV transformations were solved cor-
rectly. Looked at from the other side, 64% of ALL wrong operation
errors tended to be associated with this type of transformation. Intuitively, these results make sense.

TABLE 2
Miscomprehensions and Conceptual Solution Errors

                              Response type
Recall type                  CO   WO  SPN  SBN   AR  OTH  Total
Structure Preserving (SP)    51    1    5    2   20    1     80
Structure Violating (SV)     11   34   16    4   12    3     80
Nonsense Problem (NP)        10    2   14   24    4    1     55
Double Superset (2S)          4    5   13    0    5    3     30
Partial Recall (OS)          25    1    2    4    5    4     41
Correct (CO)                251    3   11   13   20   13    311
Other (OTH)                  25    7   12    5    6   32     87
Total                       377   53   73   52   72   57    684

Note. Frequencies are based on 18 observations from each of 38 subjects. CO, correct solution; WO, wrong operation errors; SPN, superset given number errors; SBN, subset given number errors; AR, arithmetic error; OTH, unclassifiable error. See the text for an explanation of recall types.

Transforming a problem in a structure-
violating way meant turning an addition problem into a subtraction prob-
lem, and vice versa. It follows, then, that a wrong operation error would
occur on these trials.
Nonsense-problem transformations were associated overwhelmingly
with given number errors (69%), and in virtually all cases, the number
returned was the cardinality of the set specified in the final line of the
transformed problem. Correct solutions were observed on only 18% of
these trials. Again, these results make sense intuitively. If a problem were
transformed into the nonsense problems described above, the transfor-
mation would produce a “problem” that requests a given number. Ac-
cordingly, a given number is returned as the answer.
Double-superset transformations (2s) were associated with given num-
ber errors (43%) and wrong operation errors (16%). All but one of the
given number errors involved returning the superset quantity as the an-
swer. Virtually all of the wrong operation errors consisted of incorrectly
adding rather than subtracting the numbers in the problem. Correct so-
lutions occurred on 17% of these trials, and the majority of these were on
trials in which the original problem was an addition problem as well. The
addition of a superset line to the transformed problem therefore seems to
be interpreted by children in one of two ways, namely: (1) Add the num-
bers or (2) return the large number (i.e., return how many there are altogether).
Interestingly, partial-recall (OS) transformations were associated most
often with correct solutions. On 61% of the trials in which this misrecall
occurred, the problem was solved correctly anyway. This type of trans-
formation therefore seems to be a genuine “misrecall,” or memory error,
as suggested earlier.
In summary, structural recall, both correct and erroneous, provided
clear evidence that children’s problem solving strategies are determined
by their comprehension of the problem stories. Moreover, frequently
observed conceptual errors were related to story miscomprehension in
systematic ways. These conceptual errors were found to be correct answers to miscomprehended stories. Subtraction tended to be used to
solve addition problems that were miscomprehended as subtraction prob-
lems, and vice versa. Given number errors tended to occur on trials where
problems were miscomprehended as nonsense problems or double-
superset problems, that is, as problems that simply request one of the
numbers in the problems. Children, therefore, correctly perform opera-
tions that they believe are requested by the problem statements.
The more important question, however, is why children tend to mis-
comprehend problems in the ways that they do. As is apparent in our
data, these miscomprehensions were not idiosyncratic; recall protocols
could be classified in meaningful ways. This suggests that certain aspects
of these problems tend to invite interpretations that are similar across
subjects. The alternative views we contrast here to explain these misin-
terpretations are (1) that comprehension failures reflect deficiencies in
conceptual knowledge, and (2) that comprehension failures reflect inad-
equate mappings from English phrases onto existing conceptual knowl-
edge. To test these two views, we employed a computer simulation.
Model Predictions of Solution Performance
In order to evaluate the relative contributions of logico-mathematical
knowledge and linguistic knowledge to solution accuracy, we required a
computer simulation model to solve these word problems under four
knowledge conditions: full knowledge, impaired logico-mathematical
knowledge, and two versions of impaired linguistic knowledge. The sim-
ulation is based on a model of children’s problem solving proposed by
Kintsch and Greeno (1985). A full description of the computer model is
given by Dellarosa (1986) and Fletcher (1985). We offer here a rather
detailed summary of the model due to its importance in interpreting our results.
The most fundamental aspect of the model is that it solves problems
through an interaction of text-comprehension processes and arithmetic
problem solving strategies. Its ability to solve them correctly, therefore,
depends on the integrity of both types of knowledge.
In solving problems, the computer model first comprehends the story
by building proposition frames which represent the story’s text base (van
Dijk & Kintsch, 1983). Numeric information, however, is given special
treatment. Such information is used to build representations of sets,
called set frames. In a sense, the simulation “understands” numbers as
sets of objects. The local relations among these sets are captured in su-
perschemata, which are larger set frames that have whole sets as their
components. A superschema in this model is essentially a blended repre-
sentation of story and problem structure.
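The representational distinction between proposition frames, set frames, and superschemata can be sketched as follows. This is an illustration only: the simulation was not written in Python, and the class and slot names here (Proposition, SetFrame, SuperSchema, STARTSET, etc.) are our shorthand for the structures described above.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    """A proposition frame from the text base, e.g. (GAVE MARY JOHN (2 MARBLES))."""
    predicate: str
    args: tuple

@dataclass
class SetFrame:
    """Numeric information is treated specially: a number is 'understood' as a set of objects."""
    owner: str
    object_type: str
    cardinality: object  # an int, or a token like "SOME" for an unknown quantity

@dataclass
class SuperSchema:
    """A larger frame whose slots are whole sets: a blend of story and problem structure."""
    kind: str                      # "SUPERSET", "TRANSFER-IN/OUT", or "COMPARE"
    slots: dict = field(default_factory=dict)

# "Mary had 5 marbles. John gave her 2 more. Now Mary has 7 marbles."
prop = Proposition("GAVE", ("MARY", "JOHN", (2, "MARBLES")))
start    = SetFrame("Mary", "marbles", 5)
transfer = SetFrame("John->Mary", "marbles", 2)
result   = SetFrame("Mary", "marbles", 7)
schema = SuperSchema("TRANSFER-IN",
                     {"STARTSET": start, "TRANSFERSET": transfer, "RESULTSET": result})
print(schema.kind)  # TRANSFER-IN
```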
There are three basic types of superschemata in the model, SUPER-
SET, TRANSFER, and COMPARE. They represent text-driven map-
pings between part-whole set relations and the story situations described
in COMBINE, CHANGE, and COMPARE problems, respectively. For
example, SUPERSET is a superschema that contains three sets, two of
which are subsets of the third (e.g., 3 dolls and 2 teddy bears = 5 toys). This
is the most transparent mapping. TRANSFER-IN/OUT are supersche-
mata that identify an original set, a set of objects transferred into or out
of that set, and a set representing the results of the transfer (e.g., Mary’s
5 toys, John’s gift to Mary of 2 toys, Mary’s resulting 7 toys). This is
essentially a part-part-whole structure that has been contextualized to
map smoothly onto CHANGE type problems. COMPARE is a super-
schema that identifies two sets and the set representing the difference
between their cardinalities (e.g., John’s 5 toys, Mary’s 2 toys, and the 3
toys Mary has more than John). Again, this is essentially a whole-part-part structure that has been contextualized to map smoothly onto
COMPARE type problems.
Sets are assigned to the slots in these contextualized schemata based on
the relations among propositions in the text base and the sets already
existing in short-term memory. For example, if the current line of the text
base contains a transfer proposition (i.e., GAVE MARY JOHN (5
MARBLES)), and a set belonging to MARY already exists, then the set
already in existence is assigned to the role of the STARTSET and the set
referred to in the proposition (i.e., 5 MARBLES) is created and assigned
the role of TRANSFERSET. The existence of these two sets and the transfer proposition then triggers the creation of a global TRANSFER-IN superschema.
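The triggering step just described might be sketched roughly as follows. The function and field names are hypothetical, not the simulation's own; the point is only that a transfer proposition plus a pre-existing set owned by the recipient is enough to assemble a TRANSFER-IN superschema.

```python
def process_transfer(prop, memory):
    """Sketch of slot assignment: a GAVE proposition, together with an
    existing set owned by the recipient, triggers a TRANSFER-IN superschema.
    (Illustrative structures, not the simulation's code.)"""
    pred, recipient, giver, (n, objects) = prop
    if pred != "GAVE":
        return None
    start = next((s for s in memory if s["owner"] == recipient), None)
    if start is None:
        return None
    transfer = {"owner": giver, "objects": objects, "cardinality": n}
    memory.append(transfer)
    return {"kind": "TRANSFER-IN",
            "STARTSET": start, "TRANSFERSET": transfer, "RESULTSET": None}

# "Mary had some marbles. John gave her 5 marbles."
memory = [{"owner": "MARY", "objects": "MARBLES", "cardinality": "SOME"}]
schema = process_transfer(("GAVE", "MARY", "JOHN", (5, "MARBLES")), memory)
print(schema["kind"])  # TRANSFER-IN
```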
It is important to note that these schemata are constrained to direct
mappings of propositional structures onto logical structures. Essentially,
that means that they are constrained to directly mirroring the actions or
descriptions in the text base. Certain problems, however, do not lend
themselves readily to this proposition-schema mapping; instead, they re-
quire that inferences be made concerning abstract part-whole relations.
Take, for example, Changes 5 and 6. These problems start out with an
unknown quantity. Transfers cannot be made into or out of an unknown
quantity. As a result, the simulation (and presumably children) must infer
superset and result-set role assignments given the story situation. The simulation does this by way of conversion rules. These rules are
simply mappings from complete TRANSFER-IN/OUT superschemata (in
which the start set is unknown) onto abstract part-whole schemata. Sim-
ilar conversions are done for Compares 3 through 6 since these problems
do not specify direct comparisons between two set quantities but instead
require inferences about which set among the three is the superset.
Consequently, the simulation also contains other decontextualized
knowledge concerning part-whole relations, in addition to these contex-
tualized schemata. The most important is a decontextualized part-whole
superschema. Assignment of sets to this schema is done either through
the contextualized conversion rules mentioned above or through decon-
textualized reasoning strategies. For example, the model assumes that the
least specified of three sets of like objects can be assigned the role of
WHOLE in a part-whole schema (e.g., windows = big windows and
small windows). It can also make assignments based on class membership
(e.g., 5 toys = 3 dolls and 2 teddy bears) and on conjunction information
(e.g., John & Mary’s 7 marbles = John’s 5 marbles and Mary’s 2 mar-
bles.) In each case a part-whole (or SUPERSET) superschema is created
to capture the logical relations among the sets specified in the problems.
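The decontextualized least-specified-set strategy can be sketched as below. Again, the structures and the qualifier-counting heuristic are our illustration of the strategy, under the assumption that "least specified" means fewest qualifying modifiers:

```python
def assign_whole(sets):
    """Decontextualized reasoning sketch: among three sets of like objects,
    the least specified set (fewest qualifiers) is assigned the WHOLE role;
    the remaining sets become its PARTS. (Illustrative only.)"""
    whole = min(sets, key=lambda s: len(s["qualifiers"]))
    parts = [s for s in sets if s is not whole]
    return {"kind": "SUPERSET", "WHOLE": whole, "PARTS": parts}

# "windows = big windows and small windows"
sets = [
    {"objects": "windows", "qualifiers": []},         # "windows"
    {"objects": "windows", "qualifiers": ["big"]},    # "big windows"
    {"objects": "windows", "qualifiers": ["small"]},  # "small windows"
]
schema = assign_whole(sets)
print(schema["WHOLE"]["qualifiers"])  # []
```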
Finally, the presence of superschemata triggers arithmetic counting
procedures which produce answers to the problems. Failure to produce
an adequate superschema (i.e., misunderstanding the problem) causes the
program to use default strategies to produce a “best-guess” answer. An
example of such a default strategy is to search memory to determine
whether the answer is already known, that is, if a set that matches the
requested specifications has already been created. Another default strat-
egy is to mine the text base for key words (e.g., "altogether," "in the
beginning") that might cue a solution procedure. The effect of these de-
fault strategies will become apparent shortly.
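A rough sketch of these two default strategies follows; the function and field names are ours, not the simulation's, and the keyword table is deliberately minimal:

```python
def default_answer(requested, memory, text_words):
    """Sketch of the model's default ("best-guess") strategies, used when no
    superschema was built. Strategy 1: if a set matching the requested
    specification already exists with a known cardinality, return it (this
    yields given-number errors). Strategy 2: mine the text for key words
    that cue an operation, e.g. "altogether" -> add. (Illustrative only.)"""
    for s in memory:
        if s["owner"] == requested["owner"] and isinstance(s["cardinality"], int):
            return s["cardinality"]          # given-number answer
    if "altogether" in text_words:
        return sum(s["cardinality"] for s in memory
                   if isinstance(s["cardinality"], int))
    return None                              # no answer can be derived

memory = [{"owner": "JOHN", "cardinality": 4}, {"owner": "MARY", "cardinality": 3}]
print(default_answer({"owner": "JOHN"}, memory, ["how", "many"]))  # 4
```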
When given the 18 problems to solve, the Dellarosa (1986) simulation
model solved all 18 without error, indicating that it and the Kintsch and
Greeno (1985) model upon which it is based are sufficient models of
children’s problem solving. More germane to our discussion here, how-
ever, is its usefulness in explaining children’s errors. Specifically, we
required the model to attempt these same problems under conditions of
impaired knowledge and compared its performance to that of children.
Presented in Table 3 are the answers produced by the simulation under
each of three knowledge impairment conditions, along with children’s
errors observed in this study.
Deficient conceptual knowledge. To test the contribution of conceptual
knowledge, we removed the simulation's decontextualized knowledge
concerning part-whole relations and required it to solve the 18 problems.
The first thing to notice is that without conceptual knowledge, the
simulation’s performance matches that of children on four problems. Its
solution protocols can be described as attempts to model the actions in
the story, relying on linguistic knowledge and default strategies to obtain
answers. Relying on its “altogether means add” key word strategy, it
solves the addition Combine problems correctly (Combines 1 and 2), but
produces wrong operation errors on the subtraction Combine problems
(Combines 3 through 6). Children, however, produced given number er-
rors most frequently on these problems, with the exception of Combine 5.
On this problem, both simulation and children produced wrong operation
errors. Also, like children, the simulation had little difficulty solving
Change 1 and 2 problems because these two describe only simple transfer
operations. It cannot solve Changes 3 through 6, however, because the
story actions describe transfers involving unknown quantities, and it has
no way of mapping these onto part-whole structures. It cannot resort to
its default strategy of simply returning the quantity of the set specified in
the last line of the problem because that quantity is "SOME," and that is
not an acceptable answer. As a result, it produces no answer at all, unlike
children who tended to produce given number errors on these problems.

TABLE 3
Characteristic Errors: Observed and Simulated (Experiment 1)

[The table lists each of the 18 problems (Combines 1-6, Changes 1-6, Compares 1-6), giving the children's most frequent error alongside the simulation's error under each knowledge condition; the individual cell entries are not recoverable here.]

Note. Children's errors constitute the most frequently observed error based on 38 subjects' observations per problem. Simulation conditions: (-C)SL, no conceptual knowledge; C(-S)L, no problem-situation knowledge; CS(-L), degraded linguistic knowledge concerning key words and phrases. Error types: WO, wrong operation errors; SPN, superset given number errors; SBN, subset given number errors; -, no error; U, unable to derive any answer.
* Problems for which the simulation matched children's error patterns.
Finally, it has no difficulty with Compares 1 and 2 because it can perform
the simple comparison of quantities required by those problems. Note,
however, that these are two of the most difficult problems for children to
solve. It cannot solve Compares 4 through 6 because they require map-
pings onto abstract part-whole structures in order to determine whether
addition or subtraction is in order.
Deficient story-situation knowledge. To test the contribution of story-
comprehension components, we removed the schemata that correspond
to Combine, Change, and Compare problems and restored its decontex-
tualized conceptual knowledge. In other words, we removed the contex-
tualized mappings from story situations to part-whole structures. This
means that the simulation could not understand whole story situations,
but instead could only search for key words and the like in order to trigger
its conceptual knowledge concerning part-whole relations. Under these
conditions, the simulation matched children’s response patterns on 7 out
of 18 problems. Its solution protocols revealed the following:
Using its “altogether” default key word strategy, the simulation suc-
cessfully solved Combine 1 and 2 problems. Its performance on the other
Combine problems, however, depended on the ordering of rules/
strategies. If the rules were ordered such that the simulation accessed its
default rules rather than its thinking rules, then it committed wrong op-
eration errors on Combines 3 through 6. If the rules were ordered such
that the simulation “thought” before “defaulting,” then it used its con-
junction-superset strategy to assign the role of SUPERSET to the set
owned by both Mary and John. As a result it solved these problems
correctly. Giving priority to defaulting produced a pattern that matched
children’s performance on three out of six Combine problems; giving
priority to thinking produced a pattern that matched children’s on two out
of six.
In contrast, without its Change schemata, the simulation could not
solve any of the Change problems because it could not understand the
transfers described in them. In other words, it had no way to map these
transfers onto its conceptual knowledge concerning part-whole relations.
This is because Change problems describe story situations, not simple
comparisons or combining of set quantities. Without its story-situation
knowledge, it could process the text base but not produce a coherent
representation of the story.
Turning finally to Compare problems, the simulation was found to pro-
duce wrong operation errors on Compares 1 through 3 and on Compare 6,
while correctly solving Compares 4 and 5, all using the same strategy.
This strategy is one that assigns the role of SUPERSET to the set that is
least specified in the problem. The reason it did this is rather interesting.
Since it no longer understood comparison scenarios, i.e., had no direct
mappings from comparisons to part-whole structures, it ignored the
phrases containing the comparative form. The remaining parts of these
lines therefore specified sets owned by no one and were hence considered
specifications of the superset. For Compares 1 and 2, this translates into
“Mary has 5 marbles. John has 3 marbles. How many marbles (are there
altogether)?” Under these circumstances, the simulation assigned the
role of superset to the unknown quantity referenced in the last line of its
representation and added the other two set quantities. Most importantly,
when children performed wrong operation errors on these problems, they
misrecalled the problem in just this way 77% of the time. For Compares
3 and 6, ignoring the comparative phrase in the last line translated into,
e.g., “Mary has 3 marbles. (There are) 5 marbles (altogether). How many
does John have?” Here, it subtracted 3 from 5, instead of adding them as
it should have. Children were also observed to make wrong operation
errors on these problems, but there was no evidence of this type of mis-
recall in their protocols. Finally, on Compares 4 and 5, ignoring the com-
parative form produced a representation whose interpretation was “Mary
has 8 marbles. (There are) 5 marbles (altogether). How many does John
have?” In this case, the simulation assigned the role of superset to the
quantity “5” and produced a negative number as its solution. Children
were not observed to do this. On Compare problems, then, the simulation
matched children’s errors on only 4 problems. Across all 18 problems,
altering its story understanding knowledge produced a match with chil-
dren’s performance on only 7 problems.
Deficient linguistic knowledge. To test the contribution of knowledge
concerning words and word phrases to solution success, we restored the
simulation’s story schemata and altered instead its understanding of cer-
tain key words and phrases. The alterations were of three types: First, we
altered its understanding of the word “SOME.” As noted by Riley et al.,
among others, children often seem confused about the interpretation of
this term, choosing often to completely ignore it when modeling solution
performance with blocks. Accordingly, we removed the word “SOME”
from the simulation’s quantity word category and placed it instead in its
modifier category. Essentially, this means that, to the simulation,
“SOME” was no longer a word that specified an unknown quantity, but
was instead an adjective. Second, as noted by DeCorte and Verschaffel
(1986), children tend to ignore comparative linguistic forms when reading.
This tendency was also noted in our recall data. Accordingly, the simu-
lation was made to misinterpret “HAVE-MORE-THAN” simply as
"HAVE". For example, Compare 3 was misparsed² as follows:
Mary has X marbles.
John has Y marbles.
X marbles are more than Y marbles.
How many marbles does John have?
Essentially this means that the comparative form is interpreted as a statement primarily about set ownership and tangentially about the relative sizes of two cardinals; nowhere is there an understanding that the original statement refers to a difference set. The third line in this misparsing was not recalled by children, but it was included in our simulation to allow the MORE-THAN proposition to enter short-term memory, as it presumably does when children hear it, and hence affect the processing load.

² The simulation does not parse natural language, but uses a propositionalized text base as its input. In an earlier version (Dellarosa, 1986), the misparsings described here were simply used as input. In the current version, production rules produce the misparsing effect by constructing different types of proposition frames based on the expertise level of the simulation run. The examples in the text constitute transcriptions of the proposition frames constructed during a low expertise run.
Note the difference between this treatment of the comparative and the
treatment received when the story schemata are missing. Here, the com-
parative form is completely misunderstood as a statement of ownership;
in the former case, it was ignored entirely because it was not taken as a
statement of ownership nor could it be understood as a comparison sce-
nario since no knowledge of such a scenario was present in the simula-
tion’s knowledge base.
Finally, there was some indication in our data that children have difficulty interpreting the term "ALTOGETHER," as in "Mary and John
have X marbles altogether.” There was some suggestion that children
interpret this as meaning that Mary and John EACH have X marbles.
Accordingly, the simulation’s proposition processing rules were made to
interpret these linguistic forms as follows: “Mary has X marbles and John
has X marbles.”
To summarize, in its present state, the simulation interpreted SOME as
an adjective, HAVE-MORE-THAN as simply HAVE, and ALTO-
GETHER as EACH. With these changes in its linguistic structures, the
simulation was again required to solve the 18 problems.
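The three impairments amount to rewriting the proposition-building rules. A hypothetical sketch (the real simulation operates on proposition frames, not tuples, and these predicate names are our own):

```python
def impair(proposition):
    """Sketch of the three linguistic impairments: SOME demoted from quantity
    word to adjective (here: simply dropped), HAVE-MORE-THAN read as plain
    HAVE, and ALTOGETHER read as EACH. (Illustrative only.)"""
    pred, *args = proposition
    if pred == "HAVE-MORE-THAN":
        pred = "HAVE"                        # comparative read as plain ownership
    if pred == "HAVE-ALTOGETHER":
        # "Mary and John have X altogether" -> Mary has X and John has X
        owners, n = args
        return [("HAVE", owner, n) for owner in owners]
    args = [a for a in args if a != "SOME"]  # SOME treated as a modifier, ignored
    return [(pred, *args)]

print(impair(("HAVE-MORE-THAN", "MARY", 3)))            # [('HAVE', 'MARY', 3)]
print(impair(("HAVE-ALTOGETHER", ["MARY", "JOHN"], 12)))
```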
When the simulation’s linguistic knowledge was impaired as described,
it produced a response pattern that matched that of children on 15 out of
18 problems. These results indicate that the characteristic errors reported
here and elsewhere in the literature primarily reflect difficulties children
have in assigning interpretations to certain words and phrases used in
standard word problems.
Let us begin with the Compare type problems, since these are the most
straightforward cases. Recall that these problems are misinterpreted as
John has 4 marbles.
Mary has 3 marbles.
How many does (John, Mary) have?
In such a case, the simulation simply builds three unrelated sets, one
corresponding to John’s marbles, one corresponding to Mary’s, and one
corresponding to the set whose cardinality is requested. No superschema
is built since there is no information about how these sets are logically
related. As a result, none of the standard arithmetic operation rules apply,
and the simulation resorts to its default rules to produce an answer. In this
case, it searches memory to see if it already created a set that matches the
specifications of the requested set. Finding that it does, it returns the
cardinality of the requested set, i.e., John’s or Mary’s marbles, as appro-
priate. As a result, it matched children’s performance on five of the six
Compare problems.
A similar situation arises when the term “ALTOGETHER” is mapped
onto EACH. In this case problems that contain sentences such as “John
and Mary have 12 altogether” end up being represented as follows:
John has 12 marbles.
Mary has 12 marbles.
Mary has 6 marbles.
How many does John have?
Again, the simulation ends up with four unrelated sets in memory, and no
information about how they are logically related. As a result, it performs
a search for a set corresponding to John’s marbles, and returns “12,” or
the superset cardinal. Accordingly, it matched children’s performance on
five out of six Combine problems, the exception being Combine 5, on
which our subjects committed wrong operation errors instead of given number errors.³
The case of Change problems is a bit more complex. In order for the
simulation to solve a Change problem, it must build a coherent TRANS-
FER-IN or TRANSFER-OUT schema. A TRANSFER-IN schema is
built if the problem describes a starting set into which objects are trans-
ferred. A TRANSFER-OUT schema is built if objects are transferred out
of the starting set. A difficulty arises when the simulation does not un-
derstand “SOME” to be a quantity word. In such a case, it does not
create a set when it encounters a proposition containing “Some.” In
Change 5 and Change 6 problems this is particularly disastrous, because
“Some” describes the starting set. Without this all important set, there is
not enough information to determine whether the problem describes a
TRANSFER-IN situation or a TRANSFER-OUT situation. As a result
the simulation again ends up with three unrelated sets (corresponding to
lines 2,3, and 4 in the problems) instead of a coherent superschema under
which these sets are subsumed. In order to solve the problem, it resorts
to its default rules. In this case, it can either (1) return the cardinality of
the set specified in the final line of the problem (e.g., Mary’s marbles) or
it can (2) use the term “BEGINNING” as a cue to return the cardinal of
the first set it created (i.e., the cardinal of the transfer set, line 2 of the
problem). Accordingly, the simulation matched the children’s perfor-
mance on four of the six Change problems.
To summarize, the best match between the children’s performance and
the simulation's was obtained when the

³ It should be noted that although our subjects committed wrong operation errors on Combine 5, DeCorte, Verschaffel, and De Win (1985) reported that their subjects committed superset given number errors on this problem, just as our simulation did.