
Abstract

In Experiment 1, students who studied example word problems that were grouped by t test, correlation, and chi-square were more likely to sort subsequent problems on the basis of structure and less likely to sort on the basis of surface characteristics than students who received no examples. In Experiment 2, this pattern was strongest when students studied structure-emphasizing rather than surface-emphasizing examples. In Experiment 3, students who studied and practiced 4 structure-emphasizing worked-out examples of t test and correlation problems were more likely to apply the appropriate statistical test correctly to subsequently presented statistics word problems than students who had studied surface-emphasizing examples. This pattern was strong for lower but not for higher ability students. Implications of a schema construction theory are discussed. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Journal of Educational Psychology
1996, Vol. 88, No. 1, 144-161
Copyright 1996 by the American Psychological Association, Inc.
0022-0663/96/$3.00
Role of Examples in How Students Learn to Categorize
Statistics Word Problems
Jill L. Quilici and Richard E. Mayer
University of California, Santa Barbara
Students in introductory statistics courses are expected to
solve a variety of word problems that require using proce-
dures such as t test, chi-square, or correlation. Although
students may learn how to use these kinds of statistical
procedures, a major challenge is to learn when to use them.
According to theories of analogical reasoning, when people
are confronted with a new problem, they search their mem-
ory for a similar problem to guide their solution of the new
problem (Gentner, 1983; Mayer, 1992; Novick & Holyoak, 1991;
Reed, 1987; Vosniadou & Ortony, 1989). The search
for a similar problem may be based on surface features,
such as the specific story line in the problems, or on struc-
tural features, such as the underlying mathematical proce-
dures needed to solve the problems.
Although surface features generally are more salient than
structural features for inexperienced problem solvers, suc-
cessful analogical transfer (Holyoak & Koh, 1987; Novick,
1988) and schema formation (Mayer, 1981, 1982) depend
on recognizing structural similarities among problems. For
example, successful problem solvers are more likely to sort
problems on the basis of their structural problem type than
are unsuccessful problem solvers (Chi, Feltovich, & Glaser, 1981;
Silver, 1981). People often fail to recognize spontaneously
structural similarities between a problem they know
and a new problem that could be solved using the same
procedure (Holyoak & Koh, 1987; Reed, Ernst, & Banerji, 1974).
Furthermore, to construct the structural features of a
type of story problem, people may need experience with
several examples couched in different surface stories (Gick
& Holyoak, 1980, 1983).
Correspondence concerning this article should be addressed to
either Jill L. Quilici or Richard E. Mayer, Department of Psychology,
University of California, Santa Barbara, California 93106.
Electronic mail may be sent via Internet to quilici@psych.ucsb.edu
or mayer@psych.ucsb.edu.
The present study examines the role of examples in students'
categorization of statistics word problems. For example, when
students in an introductory statistics course are confronted with
a statistics problem such as the following, they must determine
which statistical test is appropriate (e.g., t test) and they must
apply the procedure for this test.
A personnel expert wishes to determine whether experienced
typists are able to type faster than inexperienced typists. Six
experienced typists (i.e., with 5 or more years of experience)
and 6 inexperienced typists (i.e., with less than 5 years of
experience) are given a typing test. The average number of
words typed per minute for each of the experienced typists
were 70, 60, 55, 59, 68, and 66 while the average number of
words typed per minute for each of the inexperienced typists
were 50, 43, 59, 47, 60, and 65.
In this case, the student may determine how to solve this
problem by trying to remember a related problem.
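To make the "apply the procedure" step concrete, the following minimal Python sketch computes a two-sample t statistic for the typing data quoted above. It assumes the pooled-variance (equal-variance) form of the independent-groups t test, which the article does not specify, and the function name `pooled_t` is ours. Under that assumption the statistic comes out at roughly 2.12 on 10 degrees of freedom.

```python
import math
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Independent two-sample t statistic using a pooled variance estimate.

    Assumption: equal population variances (the article does not say
    which variant of the t test students were expected to use).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df

# Words typed per minute, taken from the example problem above.
experienced = [70, 60, 55, 59, 68, 66]
inexperienced = [50, 43, 59, 47, 60, 65]

t_stat, df = pooled_t(experienced, inexperienced)
print(f"t({df}) = {t_stat:.2f}")
```

The computation itself is routine; the point of the present study is the earlier step of recognizing that this problem calls for a t test at all.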
An important instructional device for teaching students
how to solve mathematical problems is the use of examples
(Anderson, 1993; Catrambone, 1994; Cooper & Sweller, 1987;
Reed, Dempster, & Ettinger, 1985; Zhu & Simon, 1987).
Chi, Bassok, Lewis, Reimann, and Glaser (1989)
have shown how successful problem solvers use examples
in learning and solving word problems. Although relatively
little research has focused on the role of examples in learn-
ing to solve statistics word problems, research on how to
help students learn to solve statistics problems is becoming
recognized as an important area (Catrambone, 1995;
Catrambone & Holyoak, 1990; Paas, 1992). This study is
concerned with using examples to help students recognize
which problems require which tests, such as tests of means
and correlations, rather than using examples to help students
learn how to compute statistical tests.
The Role of Examples in Thinking by Analogy
When a student is confronted with a statistics word prob-
lem that he or she cannot immediately solve (which can be
called the target problem), a popular strategy is to think of
a related problem that the student knows how to solve
(which can be called the source problem). Then, the student
must map a solution method from the source problem to the
target problem, sometimes using a solution method ab-
stracted from the source problem. This scenario involves
analogical reasoning because the student must build an
analogy between the source and target problems (Gentner, 1989;
Holyoak, 1985; Mayer, 1992; Ross, 1987; Vosniadou
& Ortony, 1989).
Thinking by analogy has long been recognized by edu-
cators as an effective technique for facilitating mathematical
problem solving. The famous mathematics educator George
Polya (1945/1957) observed that "analogy pervades all our
thinking" (p. 37). Consequently, his treatise on mathemati-
cal problem-solving techniques calls on students to ask, "Do
you know a related problem?" (Polya, 1945/1957, p. 9).
According to Polya, the process of devising a solution plan
depends on remembering and using an appropriate related
problem. Similarly, Schoenfeld (1985) suggests the follow-
ing problem-solving strategy: "If you cannot solve the orig-
inal problem, try first to solve an easier, related problem.
Look for known solutions to related problems" (p. 86).
According to Ross (1987, 1989), when students are confronted
with a statistics problem to solve (i.e., target problem),
they are reminded of an earlier problem from the
domain (i.e., source problem). In some cases students may
use similarities in surface features both to access and to use
a source problem in solving the target problem, whereas in
other cases they may rely on surface similarities in access-
ing but not in using a source problem. Our study differs
from Ross's work because (a) we examine only the issue of
how similarities between problems affect the accessing of a
source problem (as measured by categorization tasks),
whereas Ross examined both accessing and use (as mea-
sured in problem-solving tasks), and (b) we focus on struc-
ture-emphasizing techniques, whereas Ross focused on the
role of surface features.
Recently, the study of analogical problem solving has
become an exciting venue for research in cognitive science
(Catrambone & Holyoak, 1989; Cummins, 1992; Gentner, 1983, 1989;
Holyoak, 1985; Mayer, 1992; Reed, 1987, 1989;
Reed, Ackinclose, & Voss, 1990; Ross, 1987, 1989;
Ross & Kennedy, 1990; Vosniadou & Ortony, 1989). The
picture that is emerging from cognitive studies of analogical
problem solving is that thinking by analogy involves three
processes: recognition, in which a problem solver finds a
source problem that is similar to a target problem; mapping,
in which a problem solver applies the solution method or
principle to the target problem; and abstraction, in which a
problem solver abstracts a solution method or principle
from the source problem (although Ross, 1987, 1989, has
shown that analogical reasoning can occur without abstrac-
tion taking place).
Impediments to successful problem solving can occur at
any of the three processing stages in analogical problem
solving, but we have chosen to focus on the recognition
process for the purposes of this article. The recognition
process depends on the problem solver recognizing a similarity
between the problem that he or she is working on
(i.e., the target problem) and a related problem that he or she
is able to solve (i.e., the source problem). Two techniques
for recognizing similarities between problems are to focus
on surface similarity and to focus on structural similarity
(Vosniadou & Ortony, 1989). Surface similarity depends on
shared attributes of objects in the problems and is derived
from aspects of the cover story, whereas structural similarity
depends on shared relations among objects and determines
aspects of the required solution procedure.
The 12 statistics word problems in Table 1 help to exem-
plify our conceptions of surface and structural similarity in
the domain of introductory statistics. On the basis of structural
features, (a) the problems labeled t-1, t-2, t-3, and t-4
share a common structure in that each involves two inde-
pendent groups as the independent variable and a quantita-
tive measure as the dependent variable, and thus each re-
quires a two-sample t test of means; (b) the problems
labeled x-1, x-2, x-3, and x-4 share a common structure in
that each involves two independent groups as the indepen-
dent variable and a categorical measure as the dependent
variable, and thus each requires a two-sample chi-square
test of proportions; and (c) the problems labeled r-1, r-2, r-3,
and r-4 share a common structure in that each involves one
group with two quantitative variables, and thus each requires
a one-sample test of correlation. The last two columns of
Table 2 summarize the major structural features of
the t test, chi-square, and correlation problems listed in
Table 1. We use the terms t test, chi-square, and correlation,
respectively, to refer to a two-sample test of means, a
two-sample test of proportions, and a one-sample test of
correlation. It also should be noted that the structural
features of the correlation problems in Table 2 could be
interpreted as characteristics of one-sample test of means
problems. If both one- and two-sample test of means problems
had been included in this study, the students would have
needed to consider additional structural features.
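The structural analysis above amounts to a small decision rule: the number of groups and the type of dependent variable jointly determine the test. A minimal Python sketch of that rule follows. The function name `choose_test` and the string labels are ours, and the rule is valid only within the three problem types used in this study; as noted above, "1 group, quantitative" would be ambiguous if one-sample tests of means were also in play.

```python
def choose_test(n_groups, dependent_variable):
    """Map the two structural features summarized in Table 2 to a test.

    Hypothetical helper for illustration; it covers only the three
    problem types used in this study.
    """
    if n_groups == 2 and dependent_variable == "quantitative":
        return "t test"        # two-sample test of means
    if n_groups == 2 and dependent_variable == "categorical":
        return "chi-square"    # two-sample test of proportions
    if n_groups == 1 and dependent_variable == "quantitative":
        return "correlation"   # one-sample test of correlation
    raise ValueError("combination not covered by the three problem types")

# Structural features of problems t-1, x-1, and r-1 as listed in Table 2.
print(choose_test(2, "quantitative"))
print(choose_test(2, "categorical"))
print(choose_test(1, "quantitative"))
```

Note that the cover story never enters the rule, which is what makes these features structural rather than surface features.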
In contrast, an analysis of the surface features of the 12
problems in Table 1 reveals that (a) problems t-1, x-1, and
r-1 share a common set of surface features in that each
involves the experience of typists as the independent vari-
able and typing speed as the dependent variable and is
derived from a cover story about experience and typing
speed; (b) problems t-2, x-2, and r-2 share a common set of
surface features in that each involves average annual tem-
perature as the independent variable and annual rainfall as
the dependent variable and is derived from a cover story
about temperature and precipitation; (c) problems t-3, x-3,
and r-3 share a common set of surface features in that each
involves amount of sleep as the independent variable and
number of errors on a test as the dependent variable and is
derived from a cover story about fatigue and mental alertness;
and (d) problems t-4, x-4, and r-4 share a common
set of surface features in that each involves reading score as
the independent variable and grade point average as the
dependent variable and is derived from a cover story
about reading skill and school grades. The first two columns
of Table 2 summarize the major surface features of the
typing, rainfall, sleep, and reading ability problems listed in
Table 1.
Table 1
Twelve Sorting Problems Used in Experiments 1 and 2
Item Problem
t-1 A personnel expert wishes to determine whether experienced typists are able to type
faster than inexperienced typists. Twenty experienced typists (i.e., with 5 or more
years of experience) and 20 inexperienced typists (i.e., with less than 5 years of
experience) are given a typing test. Each typist's average number of words typed
per minute is recorded.
t-2 After examining weather data for the last 50 years, a meteorologist claims that the
annual precipitation is greater in years with below average temperature than in
years with above average temperature. She notes the annual rainfall for each of 25
years that had above average temperatures as well as 25 years that had below
average temperatures.
t-3 A psychologist tests the hypothesis that fatigue affects mental alertness. An attention
test is prepared which requires subjects to sit in front of a blank TV screen and
press a response button each time a dot appears on the screen. A total of 110 dots
are presented during a 90-minute period, and the psychologist records the number
of errors for each subject. Two groups of subjects are selected. The first group of
10 subjects is tested after they have been kept awake for 24 hours. The second
group of 10 subjects is tested in the morning after a full night's sleep.
t-4 A college dean claims that good readers earn better grades than poor readers. The
grade point averages (GPA) are recorded for 50 first-year students who scored
high on a reading comprehension test and for 50 first-year students who scored
low on a reading comprehension test.
x-1 A personnel expert wishes to determine whether or not experienced typists are likely
to be fast typists and inexperienced typists are more likely to be slow typists.
Twenty experienced typists (i.e., with 5 or more years of experience) and 20
inexperienced typists (i.e., with less than 5 years of experience) are given a typing
test. Based on the test, each typist is classified as fast or slow.
x-2 After examining weather data for the last 50 years, a meteorologist claims that the
annual precipitation is more likely to be above average in years when the
temperature is above average than when temperature is below average. For each of
50 years, she notes whether the annual rainfall is above or below average and
whether the temperature is above or below average.
x-3 A psychologist tests the hypothesis that people who are fatigued also lack mental
alertness. An attention test is prepared which requires subjects to sit in front of a
blank TV screen and press a response button each time a dot appears on the
screen. A total of 110 dots are presented during a 90-minute period, and the
psychologist records the number of errors for each subject. Twenty subjects are
selected; half are tested after being kept awake for 24 hours, and half are tested in
the morning after a full night's sleep. Based on their number of errors on the test,
each subject is also labeled as high or low in mental alertness.
x-4 A college dean claims that a group of good readers contains more honors students
than a group of poor readers. For each of 100 first-year college students, a reading
comprehension test was used to determine whether the student was a good or poor
reader and grade point average (GPA) was used to determine whether or not the
student was an honors student.
r-1 A personnel expert wishes to determine whether typing experience goes with faster
typing speeds. Forty typists are asked to report how many years they have worked
as typists and are given a typing test to determine their average number of words
typed per minute.
r-2 After examining weather data for the last 50 years, a meteorologist claims that the
annual precipitation varies with average temperature. For each of 50 years, she
notes the annual rainfall and average temperature.
r-3 A psychologist tests the hypothesis that fatigue is related to mental alertness. An
attention test is prepared which requires subjects to sit in front of a blank TV
screen and press a response button each time a dot appears on the screen. A total
of 110 dots are presented during a 90-minute period, and the psychologist records
the number of errors for each subject. Twenty subjects are given this test, each
with a different number of hours since they woke up (ranging from 1 to 20).
r-4 A college dean claims that scores on a reading comprehension test predict college
grades. The grade point averages (GPA) and reading comprehension scores are
recorded for 100 first-year students.
Note. Each letter-number pair represents a sorting problem: t = t test, x = chi-square, r =
correlation; 1 = cover story about experience and typing speed; 2 = cover story about temperature
and precipitation; 3 = cover story about fatigue and mental alertness; 4 = cover story about reading
skill and grade point average.
Table 2
Surface and Structural Features of 12 Problems
          Surface features                                        Structural features
Problem   Independent variable         Dependent variable         Independent variable   Dependent variable
t-1       Experience of typists        Words typed per minute     2 independent groups   Quantitative
t-2       Average yearly temperature   Annual rainfall            2 independent groups   Quantitative
t-3       Amount of sleep              Number of errors on test   2 independent groups   Quantitative
t-4       Reading score                Grade point average        2 independent groups   Quantitative
x-1       Experience of typists        Words typed per minute     2 independent groups   Categorical
x-2       Average yearly temperature   Annual rainfall            2 independent groups   Categorical
x-3       Amount of sleep              Number of errors on test   2 independent groups   Categorical
x-4       Reading score                Grade point average        2 independent groups   Categorical
r-1       Experience of typists        Words typed per minute     1 group                Quantitative
r-2       Average yearly temperature   Annual rainfall            1 group                Quantitative
r-3       Amount of sleep              Number of errors on test   1 group                Quantitative
r-4       Reading score                Grade point average        1 group                Quantitative
Schema Construction Theory
Our research is motivated by schema construction theory,
in which schematic knowledge is a fundamental component
of mathematical problem-solving expertise (Chi, Feltovich,
& Glaser, 1981; Mayer, 1992; Silver, 1981). Schematic
knowledge refers to knowledge of problem types and can be
evaluated through sorting tasks (Chi, Feltovich, & Glaser, 1981;
Cummins, 1992; Hinsley, Hayes, & Simon, 1977;
Mayer, 1981, 1982; Riley, Greeno, & Heller, 1983; Schoenfeld
& Hermann, 1982; Silver, 1981). For example, expertise
in statistics problem solving includes knowledge that
allows a person to distinguish among t test, chi-square, and
correlation problems, as would be indicated by sorting prob-
lems into categories. Research on expertise suggests that
people construct increasingly more accurate problem sche-
mas as they gain more experience in a domain. In particular,
experts are more likely to sort problems on the basis of
structural features and less likely to sort on the basis of
surface features compared to novices.
If the construction of structure-based problem schemas is
a component of successful problem solving in a domain, one
might reasonably ask how the process of schema construc-
tion can be promoted. Research on analogical reasoning and
learning from examples suggests that experience with ex-
ample problems can promote the construction of structure-
based problem schemas (Reed & Bolstad, 1991; Sweller &
Cooper, 1985; Ward & Sweller, 1990; Zhu & Simon, 1987)
and that experience with more than one example of a
problem type may be necessary for the spontaneous con-
struction of structure-based problem schemas (Gick &
Holyoak, 1980, 1983). Sorting tasks have been used as a
way of assessing the degree to which students possess
structure-based schemas for problems (Chi, Feltovich, &
Glaser, 1981; Cummins, 1992; Hinsley, Hayes, & Simon, 1977;
Silver, 1981).
In our research, we tested the prediction that students who
are exposed to examples of statistics word problems (e.g.,
t test, chi-square, and correlation problems) are more likely
to sort subsequently presented problems on the basis of
structure and less likely to sort on the basis of surface
features compared to students who receive no example
problems. In addition, we tested a secondary prediction that
students who are exposed to several examples of each type
of statistics word problem are more likely to sort subse-
quently presented problems on the basis of structural fea-
tures and less likely to sort on the basis of surface features
compared to students who receive only one example of each type.
These predictions are addressed in Experiment 1.
Another important goal of the present project was to
identify conditions under which exposure to examples is
most likely to promote the construction of problem schemas,
as measured by sorting or categorization performance.
Some example sets (which we call "surface-emphasizing")
confound surface and structural features of problems by
using the same surface story for all problems of a given type.
Some example sets (which we call "structure-emphasizing")
control for surface and structural features by using
a different surface story for each problem of a given type. If
surface features are more salient to inexperienced problem
solvers than are structural features, then inexperienced prob-
lem solvers should focus primarily on surface features in a
situation in which surface and structural features are con-
founded, that is, in the surface-emphasizing condition. In
contrast, in the structure-emphasizing condition, the influ-
ence of surface features is reduced by using the same
collection of surface stories in the problems of each type. In
this situation, problem solvers must recognize that surface
characteristics are not good predictors of group membership
and will therefore focus on other characteristics distinguish-
ing the problem types, namely, structural features.
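The distinction between the two kinds of example sets can be stated as a check on whether the cover story predicts the problem type. The Python sketch below is our own illustration, not the procedure used to build the experimental materials. It represents each example as a (problem type, cover story) pair; the function names and the cover-story assignments in the two sample sets are hypothetical.

```python
from collections import defaultdict

def types_per_story(examples):
    """Collect the set of problem types that appear with each cover story."""
    by_story = defaultdict(set)
    for problem_type, story in examples:
        by_story[story].add(problem_type)
    return dict(by_story)

def classify_example_set(examples):
    """Label an example set by how well cover story predicts problem type."""
    all_types = {problem_type for problem_type, _ in examples}
    seen = list(types_per_story(examples).values())
    if all(len(types) == 1 for types in seen):
        return "surface-emphasizing"    # story and type are confounded
    if all(types == all_types for types in seen):
        return "structure-emphasizing"  # story says nothing about type
    return "mixed"

# Hypothetical sets: two problem types, two examples of each.
confounded = [("t test", "typing"), ("t test", "typing"),
              ("correlation", "weather"), ("correlation", "weather")]
crossed = [("t test", "typing"), ("t test", "weather"),
           ("correlation", "typing"), ("correlation", "weather")]

print(classify_example_set(confounded))
print(classify_example_set(crossed))
```

In the crossed set a learner who groups by cover story ends up mixing problem types, which is the pressure toward structural features described above.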
We propose that structure-emphasizing example sets are
more likely to promote schema construction on the basis of
problem type than are surface-emphasizing example sets.
Thus, we predicted that students who were exposed to
structure-emphasizing examples would be more likely to
sort subsequently presented problems on the basis of struc-
tural features and less likely to sort on the basis of surface
features compared to students who were exposed to surface-
emphasizing examples. This prediction is addressed in Ex-
periment 2 and, in an extended form, in Experiment 3.
Finally, we considered the role of individual differences
in problem schema construction. We hypothesized that
higher ability students would be more likely to focus on
structural features of problems, whereas lower ability stu-
dents would be more likely to focus on surface features of
problems. Thus, we predicted that higher ability students
would be more likely to construct structure-based schemas
from examples regardless of whether or not structural and
surface features are confounded, whereas lower ability stu-
dents would be more likely to construct structure-based
schemas from structure-emphasizing examples than from
surface-emphasizing examples. This prediction was exam-
ined in Experiment 3.
Experiment 1
Experiment 1 explored the effect of examples on how
students sort statistics word problems. When asked to sort a
set of statistics word problems, students may use either
surface characteristics, such as placing problems with the
same cover story in the same category, or structural char-
acteristics, such as placing problems requiring the same
statistical test in the same category. Research on expertise in
solving mathematics and physics word problems has shown
that experts are more likely to focus on structure and less
likely to focus on surface characteristics of problems than
are novices (Chi, Feltovich, & Glaser, 1981; Silver, 1981).
According to schema construction theory, exposure to
multiple examples of each problem type (e.g., t test,
chi-square, and correlation problems) can foster the development
of structure-based schemas. In contrast, statistics-naive
students who have had no exposure to example problems
may be more likely to use surface rather than structural
characteristics in categorizing a set of statistics word problems.
Novices who receive only one example of each problem
type may perform at a level intermediate between these groups,
being more likely to use structural features than students
receiving no examples and less likely to use structural
features than students receiving multiple examples of each
problem type. Experiment 1 explored the hypothesis that
statistics-naive students who are given examples of three
structurally different types of statistics word problems will
be more likely than equivalent students receiving no exam-
ples to abstract the underlying structure, as indicated in a
subsequent sorting task. In particular, we predicted that
students who were exposed to example problems for each of
three problem types would be more likely to sort subse-
quently presented problems on the basis of structure and
less likely to sort on the basis of surface features compared
to students who were not exposed to example problems. We
also predicted that students who received three example
problems for each of three problem types would be more
likely to sort subsequently presented problems on the basis
of structural features and less likely to sort on the basis of
surface features compared to students who were exposed to
only one example problem for each of the three problem types.
Method
Participants and design. The participants were 81 college stu-
dents who had little or no knowledge about statistics (as indicated
by their reporting on a questionnaire that they had never taken a
statistics course or that they had taken an introductory level sta-
tistics course). Twenty-seven students served in the no-example
group, 27 students served in the one-example group, and 27
students served in the three-example group. In addition to these 81
participants, we tested 5 other participants. However, data for these
participants were eliminated because the participants failed to
follow instructions.
Materials. The materials consisted of a questionnaire, a set of
three example sheets each containing three examples, a set of three
example sheets each containing one example, and a deck of 12
problem cards.1
The questionnaire was an 8.5- × 11-in. sheet containing questions
requesting the age, gender, academic major, year in school,
and SAT scores of the participant, and a list of any mathematics
and statistics courses taken.
The one-example set contained three 8.5- × 11-in. sheets of
paper with one example problem on each sheet, namely, one t test,
one chi-square, and one correlation problem, respectively. The
three-example set contained three 8.5- × 11-in. sheets of paper
with three example problems of the same type on each sheet,
namely, three t test, three correlation, and three chi-square
problems, respectively. The surface stories for the example problems
were different from one another and from the problem cards used
on the sorting test. Examples are listed in Table 3.
The problem cards consisted of twelve 3- × 5-in. index cards
with a statistics problem typed on each card. The problems were
generated by completing each cell of a 3 × 4 design, with the first
factor being structural characteristics (t test, chi-square, and
correlation) and the second factor being surface characteristics
(typing, weather, mental fatigue, and reading). The 12 problems are
listed in Table 1.
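The 3 × 4 design can be sketched in a few lines of Python: crossing the three structures with the four cover stories yields the 12 card labels of Table 1, and sorting those labels by either factor gives the two idealized sorting outcomes. The helper name `sort_cards` is ours, and the sketch illustrates only the design of the card deck, not the scoring procedure used in the experiments.

```python
from itertools import product

# Labels follow the note to Table 1.
STRUCTURES = {"t": "t test", "x": "chi-square", "r": "correlation"}
STORIES = {1: "typing", 2: "weather", 3: "mental fatigue", 4: "reading"}

# One card per cell of the 3 x 4 design, e.g. "t-1", "x-3", "r-4".
cards = [f"{structure}-{story}" for structure, story in product(STRUCTURES, STORIES)]

def sort_cards(cards, by):
    """Group card labels by structure ('t', 'x', 'r') or by cover story ('1'-'4')."""
    piles = {}
    for card in cards:
        structure, story = card.split("-")
        key = structure if by == "structure" else story
        piles.setdefault(key, []).append(card)
    return piles

print(sort_cards(cards, by="structure"))  # three piles of four cards
print(sort_cards(cards, by="surface"))    # four piles of three cards
```

A purely structure-based sort therefore produces three piles of four cards, whereas a purely surface-based sort produces four piles of three.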
Procedure. Students were tested in groups of 5 or fewer per
1 In Experiment 1, nine versions of the one-example and nine
versions of the three-example materials were constructed to coun-
terbalance the effects of surface story, and three participants in the
one-example and three-example groups received each version.
Table 3
Sample Sets of Example Sheets for the One-Example and Three-Example
Groups in Experiment 1
One-example t-test sheet
A school board would like to test the assertion that employees who have more years of
experience in the district are absent more days than those who have worked fewer years.
Two hundred school employees are asked whether they have worked for the district less
than 10 years or more than 10 years and are asked to report how many days they were
absent last year.
One-example chi-square sheet
A drug company wishes to determine whether a high dosage of pain reliever is related to
fast relief of pain. Fifteen headache sufferers are given a low dosage and 15 are given a
high dosage of pain reliever. For each person, a record is made of whether the pain
reliever worked fast (i.e., within 30 minutes) or not.
One-example correlation sheet
An instructor believes that the more time that students study, the better their grade will
be. He asks 30 of his students to record how many hours they spend studying in a
month. He also records the course grade for each student on a scale from 0 (for F) to 4
(for A).
Three-example t-test sheet
A drug company wishes to determine whether extra-strength pain reliever ends headaches
faster than regular-strength pain reliever. Fifteen headache sufferers are given extra-
strength dosage and 15 other headache sufferers are given the regular-strength dosage.
For each person, a record is made of the length of time from administration of the pain
reliever to the first indication of relief from pain.
A health researcher wants to determine whether extroverts smoke fewer cigarettes than
introverts. Fifty people who score high in extroversion and 50 people who score low in
extroversion on a standard personality inventory are asked to report how many
cigarettes they smoke per day.
A researcher proposes that college graduates have more free time than high school
graduates. Twenty-five high school graduates and 25 college graduates are asked to
complete a survey detailing the number of hours of free time each person has in a
week.
Three-example chi-square sheet
An instructor believes that successful students are more likely to study a lot and
unsuccessful students are more likely to study a little. Based on their course grades, she
identifies 15 of her students who are successful and 15 who are unsuccessful. Of these,
she identifies which students reported studying less than one hour per day and which
students reported studying one or more hours per day.
A marriage counselor wishes to test her idea that couples who have been married for a
short time (i.e., less than 5 years) are more likely to spend their free time together than
those who have been married a long time (i.e., 10 years or more). Twenty-five couples
who have been married a short time and 25 couples who have been married a long time
are sorted as spending much free time together (i.e., more than 50%) or as spending
little free time together (i.e., 50% or less).
A principal wishes to determine whether tall children are over-represented among above-
average intelligence children and short children are over-represented among below-
average intelligence children. Twenty children who scored below average and 20
children who scored above average on the Wechsler Intelligence Scale for Children
(WISC) were categorized as below average or above average in height.
Three-example correlation sheet
A school board would like to test the assertion that the number of days school employees
are absent varies with the number of years of employment in the school district. Two
hundred employees are asked to report how many years they have worked for the
school district and how many days they were absent last year.
A marketing researcher suspects that the more people read newsmagazines the more likely
they are to earn a higher income. Forty people are asked to report their annual income
for last year as well as the number of issues of Time and/or Newsweek they read each
month.
A restaurant owner wants to determine whether fast service is related to tips. Time is
measured beginning when the customers are seated and ending when the customers'
meals are brought to the table. Each server's average time and each server's total
amount of tips earned that week are recorded.
session and were randomly assigned to a treatment group within
each session. Each student was seated in a separate cubicle and had
no contact with any other student during the session. First, students
completed the questionnaire at their own rates. Second, students in
the three-example group received three sheets containing three
t-test, three correlation, and three chi-square example problems,
respectively; students in the one-example group received three
example sheets containing one t-test, one correlation, and one
chi-square problem, respectively; and students in the no-example
group received no example sheets. Then, with the example sheets
still present, students in the one-example and three-example
groups were asked to sort a deck of 12 problem cards into groups
by placing each card on the example sheet it "best goes with."
Students in the no-example group were asked to sort the same 12
problem cards into groups that "best went together." The problem
cards were always given to participants in the same fixed order,
subject to the constraint that there were never two cards in a row
with either the same surface story or the same structural problem
type.
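The ordering constraint on the problem cards can be illustrated with a short sketch. The article does not report the actual fixed order or how it was produced, so the backtracking search below is only one way, assumed here for illustration, to obtain an order in which no two consecutive cards share a surface story or a structural type. The card labels follow the t/x/r and 1 to 4 notation of the figure legends; the function names are hypothetical.

```python
from itertools import product

STRUCTURES = ["t", "x", "r"]   # t test, chi-square, correlation
SURFACES = [1, 2, 3, 4]        # four cover stories


def adjacent_ok(a, b):
    """True when two cards differ in both structure and surface story."""
    return a[0] != b[0] and a[1] != b[1]


def find_order(cards, order=None):
    """Depth-first search for an ordering that satisfies the adjacency rule."""
    order = order or []
    if len(order) == len(cards):
        return order
    for card in cards:
        if card in order:
            continue
        if order and not adjacent_ok(order[-1], card):
            continue
        result = find_order(cards, order + [card])
        if result is not None:
            return result
    return None  # no valid continuation from this partial order


cards = list(product(STRUCTURES, SURFACES))   # 12 cards, e.g. ("t", 1)
order = find_order(cards)
print(["%s-%d" % card for card in order])
```

Any order returned by this search satisfies the stated constraint, but it need not match the order the participants actually received.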
Results and Discussion
Are the groups equivalent in ability? An analysis of
variance (ANOVA) conducted on the SAT scores of each
participant indicated that the three groups did not differ
significantly on SAT-mathematics (Ms = 573, 582, and 564
for one-example, three-example, and no-example groups,
respectively), F < 1.
Scoring. First, we computed a structure score for each
participant by tallying the number of pairs of problems in
each sorted group (with possible scores of 0 to 6 for each of
three groups) that came from the same structure category
(i.e., t test, correlation, or chi-square) and dividing that
number by 18 (i.e., the highest possible number). Second,
we computed a surface score for each participant by tallying
the number of pairs of problems in each sorted group (with
possible scores of 0 to 3 for each of four groups) that came
from the same surface category (i.e., typing, precipitation,
fatigue, or reading) and dividing that number by 12 (i.e., the
highest possible number). Third, we classified each partic-
ipant as structure-using if the participant's structure score
was greater than or equal to the participant's surface score
or as surface-using if the participant's surface score was
greater than the participant's structure score. Fourth, for
each group we tallied the number of times each pair of
problems had been placed into the same category (i.e.,
ranging from 0 to 27 per pair); these data were used in
cluster analyses performed for each group.
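The first three scoring steps can be sketched in a few lines. This is a minimal illustration rather than the authors' scoring procedure: it assumes each card is represented as a (structure, surface) pair and each participant's sort as a list of groups, and the function names are hypothetical. The denominators 18 and 12 come from the text (three structure categories with 6 possible pairs each, four surface categories with 3 possible pairs each).

```python
from itertools import combinations


def pair_score(groups, key, max_pairs):
    """Proportion of within-group card pairs that share the given feature."""
    same = sum(
        1
        for group in groups
        for a, b in combinations(group, 2)
        if key(a) == key(b)
    )
    return same / max_pairs


def structure_score(groups):
    # 3 structure categories x C(4, 2) = 18 possible same-structure pairs
    return pair_score(groups, key=lambda card: card[0], max_pairs=18)


def surface_score(groups):
    # 4 surface categories x C(3, 2) = 12 possible same-surface pairs
    return pair_score(groups, key=lambda card: card[1], max_pairs=12)


def classify(groups):
    """Structure-using when the structure score is at least the surface score."""
    if structure_score(groups) >= surface_score(groups):
        return "structure-using"
    return "surface-using"


# Two hypothetical sorts: one purely by structure, one purely by cover story.
by_structure = [[(s, n) for n in range(1, 5)] for s in "txr"]
by_surface = [[(s, n) for s in "txr"] for n in range(1, 5)]

print(structure_score(by_structure), surface_score(by_structure), classify(by_structure))
print(structure_score(by_surface), surface_score(by_surface), classify(by_surface))
```

A sort made purely by structure scores 1 on structure and 0 on surface, and a sort made purely by cover story scores the reverse, which is why the two scores can be compared directly to classify a participant.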
Does experience with examples foster structural schema
construction? Experiment 1 was designed to examine the
hypothesis that experience with example problems fosters
structural schema construction better than no experience
and that experience with three examples fosters structural
schema construction better than experience with one exam-
ple.
In particular, we predicted that students who studied
grouped examples of each problem type (i.e., three-example
and one-example groups) would be more likely to sort new
problems on the basis of structural features than students
who lacked this experience (no-example group) and that
students who studied multiple examples of each problem
type (i.e., three-example group) would be more likely to sort
new problems on the basis of structural features than stu-
dents who studied a single example of each problem type
(i.e., one-example group).
We found support for the hypothesis that giving students
grouped examples of a problem type increases the likeli-
hood that they would subsequently focus on structural rather
than surface features in sorting problems. First, an ANOVA
performed on the structure scores revealed significant dif-
ferences among the mean structure scores of the three
groups, F(2, 78) = 35.929, MSE = .019, p < .001; furthermore,
supplemental Tukey tests (with alpha equal to .05)
indicated that the mean structure scores of the one-example
(M = .323, SD = .162) and three-example (M = .327, SD =
.135) groups were each significantly greater than the mean
structure score of the no-example group (M = .049, SD =
.113) but did not differ significantly from each other.
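As a rough check, the F ratio for the structure scores can be approximately recovered from the reported means and standard deviations. The sketch below assumes 27 participants per group (inferred from the error degrees of freedom of 78 and the group counts given in the text, not stated directly here) and uses the rounded summary values, so it only approximates the reported F of 35.929 and MSE of .019. The function name is hypothetical.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA for k equal-sized groups, computed from means and SDs."""
    k = len(means)
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_error = sum(sd ** 2 for sd in sds) / k   # pooled variance when n is equal
    df = (k - 1, k * (n - 1))
    return ms_between / ms_error, ms_error, df


# One-example, three-example, and no-example groups (structure scores).
f_ratio, mse, df = anova_from_summary(
    means=[0.323, 0.327, 0.049],
    sds=[0.162, 0.135, 0.113],
    n=27,  # assumed equal group size
)
print(round(f_ratio, 1), round(mse, 3), df)
```

The small remaining difference from the published F value is what one would expect from working with means and standard deviations rounded to three decimals.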
Second, an ANOVA performed on the surface scores
revealed significant differences among the mean surface
scores of the three groups, F(2, 78) = 17.815, MSE = .085,
p < .001; furthermore, supplemental Tukey tests (with
alpha equal to .05) indicated that the mean surface scores of
the one-example (M = .488, SD = .319) and three-example
groups (M = .441, SD = .282) were each significantly less
than the mean surface score of the no-example group (M =
.873, SD = .27) but did not differ significantly from each
other.
Third, the proportion of participants in the one-example
and three-example groups who predominantly used struc-
ture in sorting (25 out of 54) was significantly higher than
in the no-example group (3 out of 27) on the basis of a
chi-square test with Yates correction, χ²(1) = 8.358, p < .004.
We collapsed the one-example and three-example
groups together in this analysis because our main focus in
Experiment 1 was on the role of examples and because no
significant differences between the two groups were found
in the foregoing two analyses.
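The chi-square statistic in this comparison can be reproduced from the counts given in the text. The sketch below applies the standard Yates continuity correction to a 2 x 2 table; the cell layout (structure-using versus surface-using, by example versus no-example) is inferred from the reported proportions, and the function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Example groups: 25 structure-using, 29 surface-using (25 out of 54).
# No-example group: 3 structure-using, 24 surface-using (3 out of 27).
chi2 = yates_chi_square(25, 29, 3, 24)
print(round(chi2, 3))
```

With these counts the function returns a value that rounds to the reported 8.358.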
Fourth, hierarchical cluster analyses (Williams, Hill, &
Vang, 1992) were performed on the tallies of the number of
times each pair of test problems had been placed into the
same category for each group. Figure 1 shows the resulting
clusters for each group, indicating that the no-example
group sorted on the basis of surface features, whereas the
three-example and one-example groups showed some clus-
tering on the basis of structure and some clustering on the
basis of surface features. A useful way to view the cluster-
ing structures of each group is to focus on the first four or
five clusters created with the constraint that no cluster
contain more than four items. For example, in examining
the first four clusters established by the no-example group,
we find that each cluster contains three problems with the
same surface feature, indicating that there is no clustering
on the basis of problem structure. In contrast, in examining
the first four clusters established by the three-example
group, we find some clustering on the basis of structure such
as r-2 and r-3; some clustering on the basis of both structure
and surface, such as t-2, t-3, x-2, and x-3; and some clus-
tering on the basis of surface features, such as x-1 and t-1
and r-4, x-4, and t-4. It is interesting that when students
show signs of clustering on the basis of structure, they seem
to be better able to distinguish correlation problems as a
separate category (as indicated by the cluster containing r-2
and r-3) and more likely to confuse chi-square and t-test
problems (as indicated by the cluster containing t-2, t-3, x-2,
and x-3). Finally, the first four clusters established by the
one-example group are mainly based on surface features,
with t-3, r-3, and x-3 clustered, x-4 and t-4 clustered, x-1, t-1,
and r-1 clustered, and t-2, r-2, and x-2 clustered (along
with r-4), whereas two problems of the same structural type
are clustered in one category, namely, r-2 and r-4 (along
with t-2 and x-2).
Figure 1. Hierarchical cluster analysis results for three groups in Experiment 1
(panels: Three-Example, One-Example, No-Example). One centimeter
equals 1 unit of Euclidean distance. Each letter-number pair represents a sorting
problem: t = t test, x = chi-square, r = correlation;
1 = cover story about experience and typing speed, 2 = cover story
about temperature and precipitation, 3 = cover story about fatigue and mental alertness, 4 = cover
story about reading skill and grade point average.
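The pairwise tallies that feed the cluster analyses can be sketched as follows. The tally step follows the scoring description in the text; the distance function is an assumption added for illustration (Euclidean distance between two cards' tally profiles), since the article does not specify how the clustering software derived its Euclidean distances. All names and the three sample sorts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

CARDS = ["%s-%d" % (s, n) for s in "txr" for n in range(1, 5)]


def cooccurrence_tallies(all_sorts):
    """Count, for each pair of cards, how many participants grouped them together."""
    tallies = Counter()
    for groups in all_sorts:                  # one participant's sort
        for group in groups:
            for pair in combinations(sorted(group), 2):
                tallies[pair] += 1
    return tallies


def tally(tallies, a, b):
    """Look up the tally for an unordered pair of cards."""
    return tallies[tuple(sorted((a, b)))]


def profile_distance(tallies, a, b):
    """Assumed measure: Euclidean distance between two cards' tally profiles."""
    others = [c for c in CARDS if c not in (a, b)]
    return math.sqrt(
        sum((tally(tallies, a, c) - tally(tallies, b, c)) ** 2 for c in others)
    )


# Hypothetical data: two participants sort by cover story, one by structure.
by_surface = [["%s-%d" % (s, n) for s in "txr"] for n in range(1, 5)]
by_structure = [["%s-%d" % (s, n) for n in range(1, 5)] for s in "txr"]
tallies = cooccurrence_tallies([by_surface, by_surface, by_structure])

print(tally(tallies, "t-1", "x-1"), tally(tallies, "t-1", "t-2"))
print(profile_distance(tallies, "t-1", "x-1") < profile_distance(tallies, "t-1", "x-2"))
```

In this toy data set, cards that share a cover story end up closer together than cards that share neither feature, which mirrors the surface-based clusters described for the no-example group.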
Taken together, these results are consistent with our main
hypothesis that statistics-naive students who are exposed to
examples of various types of statistics problems are more
likely to construct structure-based schemas of problem
types than equivalent students who are not exposed to
examples.
However, these results are contrary to our secondary
prediction that students receiving three examples of each
problem type are more likely to sort subsequent problems
on the basis of structural features and less likely to sort
problems on the basis of surface features than students
receiving only one example of each problem type. Simi-
larly, these results are not consistent with research by Gick
and Holyoak (1983), who found that students receiving two
examples of a problem type showed more evidence of
structural schema construction than did students who re-
ceived only one example. An important difference between
our study and Gick and Holyoak's study is that students in
our study were asked to categorize three types of problems,
whereas students in Gick and Holyoak's study were tested
on only one type of problem. In our study, when students
sought to categorize three types of problems, they may have
been encouraged to look for differences among the three
problem types, which leads to structural schema construc-
tion in both the one-example and three-example groups. On
the other hand, in Gick and Holyoak's study, participants in
the two-example group were asked to write a description of
how the two examples were similar, which may have en-
couraged additional structural schema construction in the
two-example group compared to the one-example group.
We did not ask the three-example group to perform this type
of comparison task in our study. As in our study, Ahn,
Brewer, and Mooney (1992) found evidence for schema
acquisition from a single example.
Experiment 2
The results of Experiment 1 provide evidence that expo-
sure to examples of each of three types of statistics word
problems (i.e., t test, chi-square, and correlation) can encourage
students to focus more on structure and less on
surface characteristics when they are confronted with new
statistics word problems. However, not all examples may be
equally effective in promoting structure-based schema con-
struction, as indicated by an increased likelihood for stu-
dents to focus on structural features of new statistics word
problems. Experiment 2 explores the question of whether
some examples are more likely to prime structure-based
schema induction whereas others are more likely to prime
surface-based schema construction. In particular, Experi-
ment 2 was designed to pinpoint the conditions under which
examples promote structure-based or surface-based schema
construction.
For the purposes of this experiment, we developed struc-
ture-emphasizing and surface-emphasizing example sets for
t-test, chi-square, and correlation problems. In both example
sets, there were three examples of t-test problems on one
sheet, three examples of chi-square problems on a second
sheet, and three examples of correlation on a third sheet. In
the structure-emphasizing set, three different surface stories
were used for the t-test problems (i.e., years of experience
and number of days of absence by employees, personal
income of subscribers to different magazines, and serving
time in a business and amount of money earned), the same
three surface stories were used for the chi-square problems,
and the same three surface stories were used for the corre-
lation problems. In short, the structure-emphasizing exam-
ple set presented the same surface stories between problem
types. We labeled this treatment structure-emphasizing because
students could see that the common features of the
problems on each sheet were not based on the surface story
(e.g., the same stories are on different sheets) but rather
were based on the structure of the problem (e.g., all the
problems on a sheet required the same statistical test).
In the surface-emphasizing set, the three t-test problems
used the same surface story, but it was different from those
used on the other sheets (e.g., years of work experience and
number of days of absence by employees); the three chi-
square problems used the same surface story, but it was
different from those used on the other sheets (e.g., personal
income of readers of different magazines); and the three
correlation problems used the same surface story, but it was
different from those used on the other sheets (e.g., serving
time in business and amount of money earned). In short, the
surface-emphasizing example set presented the same sur-
face stories within each problem type. We defined this
treatment as surface-emphasizing because it confounds sur-
face and structural features as the basis for categorizing
problems, and we assumed that surface features would be
more salient to statistics-naive students.
We predicted that students who were exposed to struc-
ture-emphasizing example sets would be more likely to
categorize subsequent statistics word problems on the basis
of structural features, whereas students who were exposed
to surface-emphasizing example sets or no examples would
be more likely to categorize subsequent statistics word
problems on the basis of surface features.
Method
Participants and design. The participants were 54 college stu-
dents from the same participant pool as in Experiment 1. Eighteen
students apiece served in the no-example, surface-emphasizing,
and structure-emphasizing groups.
Materials. The materials consisted of the same questionnaire
and deck of 12 problem cards as in Experiment 1. In addition, there
were two sets of example sheets: a surface-emphasizing set and a
structure-emphasizing set.² Both sets consisted of three 8.5 × 11 in.
sheets, with each sheet containing three example problems;
one sheet contained three t-test problems, one sheet contained
three correlation problems, and one sheet contained three chi-
square problems. For the surface-emphasizing set, the surface
story was the same for all problems on a sheet but different from
problems on other sheets. For the structure-emphasizing set, three
different surface stories were represented on each sheet (e.g., three
different surface stories for three t-test examples), but the same
three surface stories were used on each sheet (e.g., the three
surface stories used for t tests were also used for chi-square and for
correlation). Thus, the surface-emphasizing set confounded struc-
tural and surface characteristics, whereas the structure-emphasiz-
ing set controlled for structural and surface characteristics.
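The difference between the two example sets can be made concrete with a small data-structure sketch. It is illustrative only: the story labels are shorthand for the three surface stories named in the text, the assignment of a particular story to a particular sheet in the surface-emphasizing set is one of the counterbalanced possibilities, and the function names are hypothetical.

```python
TYPES = ["t test", "chi-square", "correlation"]
STORIES = ["absence", "magazines", "serving time"]   # shorthand labels


def structure_emphasizing():
    """Each sheet holds one problem type told through all three stories."""
    return {ptype: list(STORIES) for ptype in TYPES}


def surface_emphasizing():
    """Each sheet holds one problem type told through a single story, three times."""
    return {ptype: [story] * 3 for ptype, story in zip(TYPES, STORIES)}


def types_per_story(sheets):
    """Map each story to the set of problem types it appears with."""
    mapping = {}
    for ptype, stories in sheets.items():
        for story in stories:
            mapping.setdefault(story, set()).add(ptype)
    return mapping


for name, sheets in [("structure-emphasizing", structure_emphasizing()),
                     ("surface-emphasizing", surface_emphasizing())]:
    counts = {story: len(types) for story, types in types_per_story(sheets).items()}
    print(name, counts)
```

In the surface-emphasizing set each story occurs with exactly one problem type, so story and type are confounded; in the structure-emphasizing set each story occurs with all three types, so only the statistical structure distinguishes the sheets.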
Procedure. The procedure corresponded to that used in Exper-
iment 1 except that the surface-emphasizing group received the
surface-emphasizing example set, the structure-emphasizing group
received the structure-emphasizing example set, and the no-exam-
ple group received no examples. The structure-emphasizing and
surface-emphasizing groups were given the same sorting
instructions as the one-example and three-example groups in
Experiment 1, and the no-example group was given the same sorting
instructions as the no-example group in Experiment 1. In addition, a short
questionnaire was administered after the sorting task was com-
pleted, but the responses were so idiosyncratic that the data were
not analyzed further.
Results and Discussion
The groups were equivalent. An ANOVA conducted on
the SAT scores of each participant indicated that the three
groups did not differ significantly on SAT-mathematics
(Ms = 595, 586, and 595 for the structure-emphasizing,
surface-emphasizing, and no-example groups, respectively),
F < 1.
² In Experiment 2, three versions of the surface-emphasizing and
three versions of the structure-emphasizing materials were
constructed to counterbalance the effects of surface story, and 6
participants in the surface-emphasizing and structure-emphasizing
groups received each version.
Scoring. Scoring was identical to Experiment 1.
Does experience with structure-emphasizing examples
foster structural schema construction? Experiment 2 was
designed to examine the hypothesis that experience with
structure-emphasizing examples fosters structural schema
construction better than experience with surface-emphasiz-
ing examples or no examples. In particular, we predicted
that students in the structure-emphasizing group would be
more likely to categorize test problems on the basis of
structural characteristics than students in the surface-em-
phasizing or no-example group. We found support for this
prediction. First, an ANOVA performed on the structure
scores revealed significant differences among the mean
structure scores of the three groups, F(2, 51) = 11.179,
MSE = .046, p < .001; furthermore, supplemental Tukey
tests (with alpha equal to .05) indicated that the structure-
emphasizing group (M = .497, SD = .174) sorted by
structure significantly more than the surface-emphasizing
(M = .318, SD = .149) and the no-example (M = .157,
SD = .295) groups, which did not differ from each other.
Second, an ANOVA performed on the surface scores
revealed significant differences among the mean surface
scores of the three groups, F(2, 51) = 16.980, MSE = .097,
p < .001; furthermore, supplemental Tukey tests (with
alpha equal to .05) indicated that the structure-emphasizing
group (M = .176, SD = .166) sorted by surface significantly
less than the surface-emphasizing (M = .560, SD = .336)
and no-example (M = .773, SD = .389) groups, which did
not differ significantly from each other.
Third, the proportion of participants in the structure-
emphasizing group who predominantly used structure in
sorting (16 out of 18) was significantly higher than in the
surface-emphasizing and no-example groups (8 out of 36)
based on a chi-square test with Yates correction, χ²(1) =
18.984, p < .001. We collapsed the no-example and sur-
face-emphasizing groups together in this analysis because
we were particularly interested in determining whether the
structure-emphasizing treatment was more effective than
other treatments in encouraging students to categorize prob-
lems on the basis of structure.
Fourth, hierarchical cluster analyses were performed on
the tallies of the number of times each pair of problems had
been placed into the same category for each treatment
group. Figure 2 shows the clusters that resulted from these
analyses for each of the three treatment groups. As in our
interpretation of Figure 1, we focused on the first few
clusters created with the constraint that no cluster contain
more than four problems. As predicted, the structure-
emphasizing group was more likely to sort on the basis of
structure and less likely to sort on the basis of surface
compared to the surface-emphasizing and no-example
groups. For example, as in Experiment 1, the first four
clusters formed by the no-example group each contained
three problems with the same surface features, indicating
that students in this group did not sort on the basis of
structure. In contrast, using the criterion of clusters contain-
ing no more than four problems, the structure-emphasizing
group created only one cluster consisting of two chi-square
(x-1 and x-2) and two t-test (t-2 and t-4) problems, indicating
at least some clustering on the basis of structure.
Eventually, after several more cycles, there were two clusters:
one cluster containing all of the correlation problems and
one cluster containing all of the chi-square and all of the
t-test problems. Consistent with Experiment 1, there is some
evidence that when students are sensitive to structure, they
are better able to distinguish correlation problems as a
separate category and are more likely to confuse chi-square
and t-test problems. Finally, the surface-emphasizing group
created only one somewhat random cluster containing r-1, t-4,
r-2, and x-4. On each additional cycle more problems
were added to the cluster, but other clusters did not emerge.
Taken together, these results are consistent with the hy-
pothesis that statistics-naive students who are exposed to
multiple examples of various types of statistics problems are
more likely to construct structure-based schemas of problem
types than are equivalent students who are not exposed to
examples, especially when the surface stories in the exam-
ples are the same across problem types rather than the same
within problem types. These findings are also consistent
with Paas and Van Merrienboer's (1994) finding that stu-
dents learn more from highly variable examples than from
examples that share highly similar surface features.
Experiment 3
A limitation of the foregoing experiments is that the
experimental task is limited to sorting; therefore, Experi-
ment 3 extended the investigation to a situation in which
students learned to solve statistics word problems (e.g., t test
and correlation) on the basis of structure-emphasizing or
surface-emphasizing worked-out examples. In Experiment 3,
the degree to which students focused on the structural
features of problems was assessed by giving students a
problem-solving test and noting which type of statistic they
attempted to calculate for each problem. For example, the
degree of structural schema construction would be indicated
by the degree to which students attempted to calculate a
t-test statistic for t-test problems and a correlation coefficient
for correlation problems. Thus, in Experiment 3 we
evaluate students' categorization of test problems indirectly
by noting which type of statistical test was computed for
each problem.
We eliminated the chi-square test and reduced the number
of example problems in Experiment 3 to keep the length and
complexity of the study to a manageable level. Because
Experiment 3 included an instructional component, it was
much longer than Experiments 1 and 2. If we had retained
the chi-square test and included three examples of each test
in Experiment 3, students would have had to devote several
more hours to the study. Given the brief nature of Experi-
ment 3, we focused on two problem types—correlation and
t test—that we suspected could be learned at least partially
in a short time. In addition, the results of Experiments 1 and
2 and other studies suggested that two examples would be
sufficient to replicate the structure-emphasizing treatment in
Experiment 3.
Figure 2. Hierarchical cluster analysis results for three groups in Experiment 2
(panels: Structure-Emphasizing, Surface-Emphasizing, No-Example). Two centimeters
equals 1 unit of Euclidean distance. Each letter-number pair represents a sorting
problem: t = t test, x = chi-square, r = correlation;
1 = cover story about experience and typing speed, 2 = cover story
about temperature and precipitation, 3 = cover story about fatigue and mental alertness, 4 = cover
story about reading skill and grade point average.
In Experiment 3, we tested three hypotheses concerning
how experience with worked-out examples affects the strat-
egies students use to categorize statistics word problems.
Our first hypothesis concerned the quality of examples:
Experience with structure-emphasizing worked-out exam-
ples fosters structural schema construction more than expe-
rience with surface-emphasizing worked-out examples, so
that students who are given structure-emphasizing examples
of each problem type (i.e., structure-emphasizing group)
will be more likely to correctly categorize new problems
than students who receive examples that emphasize surface
features (i.e., surface-emphasizing group). To create
structure-emphasizing examples of t-test and correlation problems,
we used the same two surface stories (e.g., work
experience and days absent, and serving speed and tips) to
construct worked-out examples of each of the two problem
types. To create surface-emphasizing examples of t test and
correlation, we used two versions of one surface story for
one problem type and two versions of another surface story
for the other problem type.
Our second hypothesis concerned the manner in which
the examples are presented: Simultaneously presenting an
example of each type of problem fosters structural schema
construction better than presenting examples of the two
problem types separately, so that students who experience
two training sessions in which both problem types are
presented simultaneously (i.e., mixed group) will be more
likely to categorize new problems correctly than students
who experience two training sessions, one for each problem
type (i.e., blocked group). Our rationale for this prediction
was that participants in the mixed group would have the
opportunity to compare and contrast the two types of ex-
ample problems, whereas the blocked group would not.
Our third hypothesis concerned the mathematical ability
of each subject: High mathematical ability as measured by
the Scholastic Aptitude Test (SAT) enables structural
schema construction better than average or low mathemat-
ical ability, so that students with high SAT scores in math-
ematics (i.e., higher ability group) will be more likely to
categorize new problems correctly than students with aver-
age or low SAT scores in mathematics (i.e., lower-ability
group). In short, we expected higher ability students to be
more likely than lower ability students to possess the math-
ematical skill of paying attention to structurally important
problem features. Therefore, we expected higher ability
students to be able to recognize structural features regard-
less of instructional treatment, so that the predicted patterns
(e.g., concerning structure-emphasizing vs. surface-empha-
sizing examples) would be strong for lower ability but not
higher ability students. This pattern is consistent with re-
search on attribute-by-treatment interactions (ATIs), in
which less able students benefit more from well-structured
instruction than do more able students (Mayer, 1987). Rel-
evant ability data were not available for conducting a sim-
ilar analysis in Experiments 1 and 2.
lation problems using Surface Story 2 in Part 2. The four practice
problems had surface stories that were different from each other
and from the example problems. The Appendix gives a worked-out
example for a t-test problem and a correlation problem.
The posttest set contained four sheets showing the step-by-step
computations needed for t-test and correlation problems (without
giving a surface story or sample question) and contained four
problem sheets containing two t-test and two correlation problems.
Each problem had a surface story that was different from the other
test problems and from worked-out example and practice prob-
lems.
Procedure. Each student worked individually in a separate
cubicle, with up to 5 students being tested at a time. There was no
contact among students, and treatments were randomly represented
in each session. Each student was first given a questionnaire
requesting information about his or her background in mathemat-
ics and statistics. On the basis of each student's reported SAT
score in mathematics, students were then classified as higher if the
score was above 575 and lower if the score was below 575.
Students were then randomly assigned to a specific level of ex-
ample quality and manner of presentation. Every student experi-
enced the same basic procedure, consisting of two training sessions
followed by a test session. During each training session, students
first were given two examples of worked-out statistics problems to
read for 8 min and then were given two practice problems to
complete in 20 min with the aid of the examples. After the practice
period, students were given the correctly worked-out solutions to
the practice problems they had just completed, and they had 4 min
to check their work. After a short break, students were given four
test problems to complete, one at a time, with 8 min for each
problem.
Method
Participants and design. The participants were 128 college
students from the same participant pool as in Experiments 1 and 2.
Sixteen students apiece served in the eight groups formed by
crossing the three independent variables: example quality (struc-
ture emphasizing or surface emphasizing), manner of presentation
(mixed or blocked), and mathematical ability (higher or lower). In
addition to these 128 participants, 32 others were tested. However,
the data from these participants were not analyzed either because
the students did not give their math SAT score or because they did
not attempt any of the practice problems.
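The 2 x 2 x 2 design and the ability split can be laid out in a brief sketch. The factor levels, the cell size of 16, and the 575 cutoff come from the text; the variable and function names are hypothetical, and because the text does not say how a score of exactly 575 was handled, the sketch leaves that case unclassified.

```python
from itertools import product

QUALITY = ["structure-emphasizing", "surface-emphasizing"]
PRESENTATION = ["mixed", "blocked"]
ABILITY = ["higher", "lower"]
PER_CELL = 16


def ability_level(sat_math, cutoff=575):
    """Classify a reported SAT-mathematics score relative to the cutoff."""
    if sat_math > cutoff:
        return "higher"
    if sat_math < cutoff:
        return "lower"
    return None  # a score equal to the cutoff is not covered by the stated rule


cells = list(product(QUALITY, PRESENTATION, ABILITY))
total = len(cells) * PER_CELL
print(len(cells), total)
```

Crossing the three two-level factors yields eight cells, and 16 students per cell accounts for the 128 participants whose data were analyzed.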
Materials. The materials consisted of a questionnaire, four
instructional sets, and a posttest set.³ The questionnaire was
identical to the one used in Experiments 1 and 2.
The four instructional sets each contained the following sheets
for Part 1: two worked-out examples involving t test, correlation,
or both; two corresponding practice problems to solve; and
worked-out solutions to the two practice problems. The same was
true for Part 2. The structure-emphasizing/mixed set contained
worked-out examples of t test and correlation using Surface Story