Journal of Experimental Psychology: Learning, Memory, and Cognition
1994, Vol. 20, No. 5, 1063-1087
Copyright 1994 by the American Psychological Association, Inc.
0278-7393/94/13.00
Remembering Can Cause Forgetting: Retrieval Dynamics in Long-Term Memory
Michael C. Anderson, Robert A. Bjork, and Elizabeth L. Bjork
Three studies show that the retrieval process itself causes long-lasting forgetting. Ss studied 8 categories (e.g., Fruit). Half the members of half the categories were then repeatedly practiced through retrieval tests (e.g., Fruit Or_____). Category-cued recall of unpracticed members of practiced categories was impaired on a delayed test. Experiments 2 and 3 identified 2 significant features of this retrieval-induced forgetting: The impairment remains when output interference is controlled, suggesting a retrieval-based suppression that endures for 20 min or more, and the impairment appears restricted to high-frequency members. Low-frequency members show little impairment, even in the presence of strong, practiced competitors that might be expected to block access to those items. These findings suggest a critical role for suppression in models of retrieval inhibition and implicate the retrieval process itself in everyday forgetting.
A striking implication of current memory theory is that the very act of remembering may cause forgetting. It is not that the remembered item itself becomes more susceptible to forgetting; in fact, recalling an item increases the likelihood that it will be recallable again at a later time. Rather, it is other items (items that are associated to the same cue or cues guiding retrieval) that may be put in greater jeopardy of being forgotten. Impaired recall of such related items may arise if access to them is blocked by the newly acquired strength of their successfully retrieved competitors (Blaxton & Neely, 1983; Brown, 1981; Brown, Whiteman, Cattoi, & Bradley, 1985; Roediger, 1974, 1978; Roediger & Schmidt, 1980; Rundus, 1973).
This implication follows from three assumptions underlying what we herein refer to as strength-dependent competition models of interference: (a) the competition assumption, that memories associated to a common cue compete for access to conscious recall when that cue is presented; (b) the strength-dependence assumption, that the cued recall of an item will decrease as a function of increases in the strengths of its competitors' associations to the cue; and (c) the retrieval-based learning assumption, that the act of retrieval is a learning event in the sense that it enhances subsequent recall of the retrieved item. Taken together, these assumptions imply that repeated retrieval of a given item will strengthen that item, causing loss of retrieval access to other related items. We refer to this possibility as retrieval-induced forgetting. In this article, we explore two questions regarding retrieval-induced forgetting, one empirical and the other theoretical: (a) Is retrieval-induced forgetting a significant factor producing fluctuations in the long-term accessibility of knowledge? and (b) To what extent do such effects support the strength-dependence assumption? We believe that exploring these questions may help solve the puzzle of why so little of the knowledge available in long-term memory remains consistently accessible.

Michael C. Anderson, Robert A. Bjork, and Elizabeth L. Bjork, Department of Psychology, University of California, Los Angeles.

The research reported herein was supported in part by Grant 4-564040-RB-19900 to Robert A. Bjork and Grant 4-564040-EB-19900 to Elizabeth L. Bjork from the Committee on Research, University of California, Los Angeles, and by Grant MDA 903-89-K-0179 to Keith Holyoak from the Army Research Institute. The article appears on University Microfilms as part of a dissertation submitted to the University of California, Los Angeles, in fulfillment of the degree of PhD for Michael C. Anderson.

We gratefully acknowledge the assistance of Myra Jimenez, Steven Machado, and Shirley Yu in the collection of data and of Catherine Fritz, Dina Ghodsian, Keith Holyoak, Keith Horton, John Shaw, Bobbie Spellman, and Tom Wickens for comments on drafts of this article. We also thank Todd Gross, Steven Machado, Anthony Wagner, and especially Bobbie Spellman for many thoughtful conversations on the topic of retrieval inhibition.

Correspondence concerning this article should be addressed to Michael C. Anderson, Department of Psychology, University of California, 405 Hilgard Avenue, Los Angeles, California 90024-1563.
Many studies illustrate that prior retrievals can make subsequent retrieval of related information more difficult, at least within the context of a single testing session. For example, in the domain of episodic memory, the study of output interference has shown that an item's recall probability declines linearly as a function of its serial position in a testing sequence. This decline has been demonstrated with recall of paired associates (Arbuckle, 1966; Roediger & Schmidt, 1980; Tulving & Arbuckle, 1963, 1966) and categorized word lists (Dong, 1972; Roediger, 1973; Roediger & Schmidt, 1980; Smith, 1971, 1973; Smith, D'Agostino, & Reid, 1970); it occurs regardless of a category's serial position in the learning list (Smith, 1973), and it does not result from the loss of items from primary memory over time (Smith, 1971). In semantic memory, speeded generation of several category exemplars on the basis of letter cues (e.g., Fruit A_____) slows generation of later exemplars and increases the number of generation failures (Blaxton & Neely, 1983; Brown, 1981; Brown et al., 1985). These effects of output interference in both episodic and semantic memory violate expectations derived on the basis of semantic priming and spreading activation, according to which retrieval should facilitate recall of related knowledge, not impair it (Loftus, 1973; Loftus & Loftus, 1974; Neely, 1976; Warren, 1977).
These effects show that retrieval-induced forgetting does occur, at least within a single testing session, which some authors have taken as evidence that retrieval is a basic process underlying forgetting from long-term memory (Roediger, 1974).
Although these initial forays into retrieval-induced forgetting are suggestive, little work has been done to justify the assertion that retrieval plays a significant role in producing long-term fluctuations in accessibility. All studies of retrieval-induced forgetting have emphasized the decline in recall arising from retrievals occurring within a single test session. The extrapolation from these findings to long-lasting impairment hinges crucially on a theoretical interpretation of output interference in terms of strength-dependent competition, which is an interpretation that may not be warranted. For example, no evidence suggests that these effects reflect anything other than temporary suppression occurring within the brief span of an episodic or semantic recall task. However, if the strength-dependence interpretation is correct, such effects should not be restricted to a single output session: A single, effortful recall buried within the context of other thoughts and processes should cause forgetting of related memories on even remote occasions provided that retrieval-based learning endures. When we consider the ubiquity of retrieval in our daily cognitive experiences, retrieval-induced forgetting might be a pervasive source of long-lasting retrieval failures in long-term memory, an implication that starkly contrasts with the cursory weight given to retrieval processes in recent theoretical treatments of interference (e.g., Mensink & Raaijmakers, 1988). Thus, a major goal of the present work is to seek evidence for retrieval-induced forgetting that endures beyond the retrieval event during which it is induced.
The strength-dependence interpretation of retrieval-induced forgetting depends, of course, on the assumptions underlying strength-dependent competition. Although strength-dependent competition has a long history in interference theory (Anderson, 1976; McGeoch, 1936; Melton & Irwin, 1940; Mensink & Raaijmakers, 1988) and remains popular as a means of explaining a variety of phenomena (e.g., the increase in part-set cuing inhibition with the number of cues: Roediger, 1974; Rundus, 1973; the increase in retroactive interference with the degree of interpolated learning: Mensink & Raaijmakers, 1988; list-strength effects in free recall: Ratcliff, Clark, & Shiffrin, 1990; the exacerbation of the tip-of-the-tongue experience with recent presentation of similar words: Baddeley, 1982; Jones, 1989; Reason & Lucas, 1984; Woodworth, 1938), the empirical case for the strength-dependence assumption is not as clearly established as those for the retrieval-based learning assumption (e.g., Allen, Mahler, & Estes, 1969; Bjork, 1975; Gardiner, Craik, & Bleasdale, 1973; Hogan & Kintsch, 1971) and the competition assumption (see Watkins, 1978, for a review). When studies show that strengthening some information in memory impairs recall of other information, there is substantial disagreement on the theoretical interpretation of the impairment (regarding part-set cuing, see Basden, Basden, & Galloway, 1977; Sloman, Bower, & Rohrer, 1991; regarding retroactive interference, see Greeno, James, DaPolito, & Polson, 1978; Martin, 1971; Postman, Stark, & Fraser, 1968; Riefer & Batchelder, 1988; regarding the tip-of-the-tongue state, see Brown, 1991; Burke, MacKay, Worthley, & Wade, 1991).
More troubling, however, than any such theoretical disagreements are the various findings that strengthening can fail to produce impairment. These failures are illustrated vividly in studies by DaPolito (1966) and Blaxton and Neely (1983). DaPolito explored the amount of proactive interference suffered by a later studied associate to a cue (an A-C item) as a function of the number of presentations of an earlier studied associate to that cue (an A-B item). Although increasing the presentations of the A-B items from one to three increased recall for those items from 49% to 82%, recall of once-presented A-C items went from 30% to 32% (see Riefer & Batchelder, 1988, for detailed analysis of this study). In a different but related theoretical context, Blaxton and Neely (1983) demonstrated that prior presentation of several category exemplars for speeded naming actually facilitated generation of target exemplars from semantic memory. In both studies, strengthening of prior responses should have significantly impaired subsequent retrieval of related items but did not. If strengthening is not sufficient to cause impairment, retrieval-based learning may not cause long-lasting retrieval-induced forgetting.
Given the uncertain empirical status of the strength-dependence assumption, we thought it useful to treat the present work not only as an exploration of retrieval-induced forgetting but also as a test of the strength-dependence assumption itself. In the next section, we introduce a new paradigm for examining the impact of retrieval on the long-term accessibility of related information, and we contrast this method with previous procedures used to investigate strength-dependent competition. The new procedure improves on previous paradigms by unconfounding the strengthening operation from other logical phases of the experiment, a problem that has arguably generated many of the interpretational difficulties surrounding strength-dependent competition. Next, we develop predictions concerning the relative impairment expected for different stimulus materials on the basis of a general class of strength-dependent competition models: ratio-rule models. If impaired recall is observed with the new procedure, then retrieval-induced forgetting will be implicated as a significant factor in producing long-term retrieval failures. Furthermore, if the impairment follows the pattern expected on the basis of the ratio rule, then we will have obtained evidence for strength-dependent competition.
A Paradigm for Examining Retrieval-Induced Forgetting
In constructing a paradigm to explore retrieval-induced forgetting, we thought it important to consider both the logic of strength-dependent competition and the conditions under which retrieval-induced forgetting might be expected to occur naturally. Because strength-dependent competition among items is thought to occur with respect to a shared retrieval cue, we placed special emphasis on cue-target relationships in all phases of the paradigm. We also sought to minimize opportunities for the formation of item-to-item (as opposed to cue-to-item) associations, the presence of which could provide subjects with retrieval routes for circumventing strength-dependent competition. Because retrieval-induced forgetting may arise from retrieval-based learning that occurs long after initial learning, we separated initial study and retrieval-based learning into distinct phases; we also included a substantial retention interval between retrieval-based learning and the final test to examine the long-term effects of retrieval.
These considerations led to our designing a retrieval-practice paradigm that consists of three phases: a study phase, a retrieval-practice phase, and a final test phase. In the study phase, subjects study a series of category-exemplar pairs, such as Fruit Orange, with a typical series consisting of six members of each of eight different categories. Because the exemplars of a given category share the category label as a retrieval cue, they should compete for access to conscious recall on later presentation of the category cue. After the study phase, subjects engage in directed retrieval practice on half of the items from half of the categories (e.g., three items from each of four categories). The retrieval practice of a given item is induced by presenting a category name together with an exemplar stem (e.g., Fruit Or_____). Each exemplar test appears several times throughout the practice phase, interleaved with practice trials on other items to maximize the facilitatory effects of retrieval practice. After a substantial retention interval (e.g., 20 min), a final, surprise category cued-recall test is administered: Subjects are cued with each category name and asked to free recall any exemplars of that category that they remember having seen at any point in the experiment. If strengthening due to retrieval practice endures throughout the retention interval, the practiced exemplars in a given category should still create substantial competition for the unpracticed exemplars in that category on the delayed category cued-recall test. The impact of this competition can be assessed by contrasting the final recall of the unpracticed items from the practiced categories with the final recall of items from the unpracticed categories (i.e., those categories for which none of their exemplars had been given retrieval practice). If impairment is observed, we have evidence that retrieval-induced forgetting may contribute to long-lasting retrieval failures and that these failures may result from strength-dependent competition.
The separation of the retrieval-practice paradigm into three phases appears to have several advantages over other well-known procedures thought to provide evidence for strength-dependent competition. These features are highlighted in Figure 1, which contrasts the retrieval-practice paradigm with the retroactive-interference and part-set cuing procedures. These paradigms are represented according to their temporal organization into learning (L), strengthening (S), and final test (T) phases. (Distinct phases are depicted by boxes; contiguous boxes indicate logically distinct, but co-occurring, phases.) In the retroactive-interference paradigm, subjects learn a second list of associates to the same stimuli (L2), and these associates are strengthened by repeated study-test trials (S); this strengthening of second-list associates is thought to impair recall of earlier responses from the first list (L1) on a subsequent test (T) relative to a baseline condition in which subjects never learned the second list (L2). In the part-set cuing paradigm, several exemplars from an earlier studied categorized word list (containing exemplars L1 ... LN) are presented as cues at test (T), presumably strengthening (S) those cues; this strengthening of the cue exemplars is thought to impair recall of the remaining noncue exemplars relative to a baseline condition in which subjects receive no cues. The retrieval-practice paradigm, as described above, is depicted in the right column of Figure 1.

[Figure 1 panels: Retroactive Interference, Part-set Cuing, Retrieval Practice]
Figure 1. The temporal organization of retroactive interference, part-set cuing, and retrieval-practice paradigms into discrete phases. Boxes denote distinct experimental phases; contiguous boxes denote logically distinct but simultaneous phases; arrows indicate the flow of time. The letters L, S, and T designate learning, strengthening, and testing of items, respectively. Note that the strengthening operation is confounded with different phases for all paradigms except the retrieval-practice paradigm. Note also that the retroactive interference paradigm divides the learning of the two competitors (L1, L2) per stimulus into distinct contexts, whereas all items are learned in the same context for other paradigms.
That strengthening does not occur in a distinct phase in the retroactive-interference and part-set cuing paradigms complicates interpreting the effects of that strengthening. The retroactive-interference procedure confounds strengthening of L2 competitors with the acquisition of the new temporal context (List 2) in which those competitors are learned, confusing the relative contributions of strength-dependent competition and response-set suppression to the impaired recall of L1 associates (Postman et al., 1968); in the retrieval-practice procedure, on the other hand, any response-set suppression on the learning list caused by the retrieval-practice phase should be equated across practiced categories and the within-subjects baseline (i.e., those categories that remain unpracticed; see Delprato, 1972, for a similar approach). The part-set cuing paradigm confounds strengthening of competitors with presentation of those items as retrieval cues on the final test, obscuring the relative effects of strength-dependent competition and those deriving from the role of strengthened items as retrieval cues (Basden et al., 1977; see also Raaijmakers & Shiffrin, 1981; Sloman et al., 1991); in the retrieval-practice procedure, a long interval separates retrieval-based strengthening from the final test, and no items are presented as cues, eliminating the psychological context of cuing. To the extent that confounding the various factors described above with strengthening compromises the measure of strength-dependent competition in the retroactive-interference and part-set cuing paradigms, the retrieval-practice paradigm may provide a better means of testing strength-dependent competition.
Testing Strength-Dependent Competition Models of Retrieval
Because our paradigm seemed to have certain advantages as a means of testing strength-dependent competition, we took our exploration of retrieval-induced forgetting as an opportunity to evaluate strength-dependent competition more systematically. Because ratio-rule formulations of retrieval are the most widely applied and best articulated strength-dependent models (e.g., Anderson, 1976; Gillund & Shiffrin, 1984; Mensink & Raaijmakers, 1988; Raaijmakers & Shiffrin, 1981; Rundus, 1973), we used a simple ratio-rule model to develop predictions of the relative amount of impairment to be expected across materials differing in their strength of association to a cue.
In the present studies, we manipulated the taxonomic frequency of exemplars in a category. In Experiments 1 and 2, to test an implication of the basic ratio-rule equation, we contrasted categories consisting entirely of strong exemplars with categories consisting entirely of weak exemplars. For a broad range of learning-rate assumptions, ratio-rule models predict that retrieval-based strengthening should impair weak exemplar categories to a proportionally greater extent than strong exemplar categories (see Appendix A for a numerical example). Qualitatively, the reason for this prediction is straightforward. The ratio-rule model asserts that the probability of retrieving an item is a function of the strength of association of that item to the retrieval cue, relative to the strength of association of all other memory items to that cue. This relation can be expressed as a simple recall probability ratio, as in the following example: P(recall Orange given the cue Fruit) = (strength of the Fruit-Orange association) / (sum of strengths for all Fruit associates). When other items, such as Banana, are strengthened through retrieval practice, the denominator in the equation for Orange increases, decreasing its recall probability ratio. Because retrieval practice will increase the associative strength of a weaker item to a proportionally greater extent (see Appendix A), proportional impairment of its competitors will also be greater. If retrieval-induced forgetting manifests this pattern of impairment across strong- and weak-exemplar categories, specific evidence in favor of ratio-rule formulations of strength-dependent competition will have been obtained; if it does not, the ratio rule, and perhaps strength-dependent competition in general, may be inadequate as an account of retrieval-induced forgetting.
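
The qualitative argument above can be illustrated with a minimal Python sketch of the ratio rule. The learning rule used here (every item in a category starts at the same strength, and each practiced item gains a fixed additive increment) is an assumption made only for illustration; it is not the set of learning-rate assumptions worked through in Appendix A, and the strength values are arbitrary.

```python
def recall_probability(target, strengths):
    """Ratio rule: target strength divided by the summed strength of all associates of the cue."""
    return strengths[target] / sum(strengths.values())


def proportional_impairment(base_strength, increment, n_items=6, n_practiced=3):
    """Proportional drop in recall of an unpracticed item after its competitors are practiced.

    ASSUMPTION (illustrative only): all items start at base_strength, and each
    practiced item gains the same additive increment from retrieval practice.
    """
    items = [f"item{i}" for i in range(n_items)]
    before = {name: base_strength for name in items}
    after = dict(before)
    for name in items[:n_practiced]:
        after[name] += increment
    target = items[-1]  # an unpracticed member of the practiced category
    p_before = recall_probability(target, before)
    p_after = recall_probability(target, after)
    return (p_before - p_after) / p_before


# Arbitrary strengths: a "strong" category (4.0 per item) and a "weak" one (1.0 per item).
strong = proportional_impairment(base_strength=4.0, increment=2.0)
weak = proportional_impairment(base_strength=1.0, increment=2.0)
print(f"strong category: {strong:.3f}, weak category: {weak:.3f}")
```

Under this assumed rule the same increment is proportionally larger for weak items, so the denominator for a weak unpracticed item grows more in relative terms and its proportional impairment is larger.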
Experiment 1
In Experiment 1, we used the retrieval-practice paradigm to determine whether retrieval-based learning causes long-lasting memory failures. In the initial study phase, subjects studied 8 six-item categories. Four of these categories were composed of strong exemplars (e.g., Fruit Orange), and four were composed of weak exemplars (e.g., Tree Hickory). After the study phase, three exemplars from two strong and two weak categories received retrieval practice (e.g., Fruit Or_____) three times each. The three retrievals for each item, interleaved with tests of other items, were ordered to produce an expanding sequence of intertest intervals for each item to maximize the consequences of retrieval practice (see Landauer & Bjork, 1978). After a 20-min retention interval, a final unexpected category cued-recall test was administered: Subjects were cued with each category name and asked to free recall any members of that category they could remember having been presented at any point in the experiment.
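
As a rough illustration of what an expanding sequence of intertest intervals means, the following Python sketch counts the trials that intervene between successive tests of an item and checks that the counts grow. The practice order shown is hypothetical; the actual orders used in the experiment are not given in the text.

```python
def intertest_intervals(order, item):
    """Number of intervening trials between successive tests of `item`."""
    positions = [i for i, tested in enumerate(order) if tested == item]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


def is_expanding(order, item):
    """True if each intertest interval for `item` is longer than the one before it."""
    intervals = intertest_intervals(order, item)
    return all(a < b for a, b in zip(intervals, intervals[1:]))


# HYPOTHETICAL practice order (not the order used in the experiment).
practice_order = [
    "Orange", "Hickory", "Orange", "Banana", "Hickory", "filler",
    "Orange", "Banana", "filler", "Hickory", "filler", "filler", "Banana",
]

for item in ("Orange", "Hickory", "Banana"):
    print(item, intertest_intervals(practice_order, item), is_expanding(practice_order, item))
```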
To describe our predictions (for each of the experiments we report) more concisely and to simplify discussions throughout this article, we have labeled the different types of categories and items that occur in the retrieval-practice paradigm as follows: Categories for which some of their members receive retrieval practice are labeled Rp categories (i.e., retrieval practice categories); categories for which no members receive any retrieval practice are labeled Nrp categories (i.e., no retrieval practice categories). The items within an Rp category that actually receive retrieval practice are labeled Rp+ items (i.e., Rp category, practiced items); items within an Rp category that do not receive retrieval practice are labeled Rp- items (i.e., Rp category, unpracticed items); and, finally, items within an Nrp category, none of which, of course, receive any retrieval practice, are simply labeled Nrp items. If retrieval-induced forgetting produces long-lasting retrieval failures, retrieval practice of Rp+ items should impair later recall of Rp- items (relative to recall observed for the Nrp baseline), even though retrieval-based learning occurred in a context separated from the final test by 20 min. If impaired recall of Rp- items is caused by strength-dependent competition from the Rp+ items, the impairment of weak Rp- items should be proportionally greater than the impairment of strong Rp- items.
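
The labeling scheme can be expressed as a short Python sketch. The function name, the data layout, and the shortened category lists are hypothetical and chosen only for illustration; Lemon, Grape, and Birch are placeholder exemplars, not items taken from the experimental materials.

```python
def label_items(categories, practiced):
    """Map each (category, exemplar) pair to 'Rp+', 'Rp-', or 'Nrp'.

    categories: dict of category name -> list of studied exemplars
    practiced:  dict of category name -> set of exemplars given retrieval practice
    """
    labels = {}
    for category, exemplars in categories.items():
        practiced_here = practiced.get(category, set())
        for exemplar in exemplars:
            if not practiced_here:
                labels[(category, exemplar)] = "Nrp"   # no member of this category was practiced
            elif exemplar in practiced_here:
                labels[(category, exemplar)] = "Rp+"   # practiced member of a practiced category
            else:
                labels[(category, exemplar)] = "Rp-"   # unpracticed member of a practiced category
    return labels


# HYPOTHETICAL, shortened study list (the experiment used six exemplars per category).
categories = {
    "Fruit": ["Orange", "Banana", "Lemon", "Grape"],
    "Tree": ["Hickory", "Birch"],
}
practiced = {"Fruit": {"Orange", "Banana"}}

labels = label_items(categories, practiced)
for key, value in labels.items():
    print(key, value)
```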
Method
Subjects
The subjects were 36 introductory psychology students from the
University of California, Los Angeles, whose participation partially
fulfilled a course requirement.
Design
Two factors, retrieval-practice status and category composition, were manipulated within subjects. Retrieval-practice status had three levels: (a) Rp+ items, which were practiced three times each by means of an expanding schedule of category-plus-stem cued-recall tests (e.g., Fruit Or_____) during the retrieval practice phase; (b) Rp- items, which were not practiced, but were members of the same category as the Rp+ items; and (c) Nrp items, which received no additional retrieval practice and were not members of a practiced category. Nrp items, which were divided into two subgroups of three (called Nrpa and Nrpb) for counterbalancing purposes, served as a baseline against which to measure the positive effects of practice in the case of Rp+ items, and the hypothesized negative effects of practice on Rp- items. Category composition had two levels: Strong categories, which contained exemplars whose taxonomic frequency had an average rank order of 8 (Battig & Montague, 1969); and weak categories, which contained exemplars with an average rank order of 33. The dependent measure was the proportion of each type of item recalled on a final category cued-recall test.
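
The two contrasts implied by this design (Rp+ against the Nrp baseline, and the Nrp baseline against Rp-) can be sketched in Python as follows. The proportions below are made-up placeholders, not data from the experiment, and the sketch pools all Nrp items into a single baseline rather than handling the Nrpa and Nrpb subgroups separately, which is a simplifying assumption.

```python
from statistics import mean


def practice_effects(recall):
    """Compute facilitation and impairment relative to the Nrp baseline.

    recall: dict mapping 'Rp+', 'Rp-', and 'Nrp' to lists of per-subject
    proportions recalled on the final category cued-recall test.
    """
    baseline = mean(recall["Nrp"])
    return {
        "facilitation": mean(recall["Rp+"]) - baseline,  # benefit of practice for Rp+ items
        "impairment": baseline - mean(recall["Rp-"]),    # cost of practice for Rp- items
    }


# MADE-UP proportions for two imaginary subjects (not results from the study).
example_recall = {
    "Rp+": [0.75, 0.75],
    "Rp-": [0.25, 0.5],
    "Nrp": [0.5, 0.5],
}

effects = practice_effects(example_recall)
print(effects)
```

A positive impairment score in this sketch corresponds to Rp- items being recalled less often than baseline items.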
Procedure
The experiment was conducted in four phases: a learning, a practice, a distractor, and a surprise category cued-recall phase. In the learning phase, subjects were randomly assigned to one of two random orders of the learning materials. Each subject was given a learning booklet, face down, as well as an instruction page, which they followed as the experimenter read the instructions aloud. Subjects were told that (a) they were participating in an experiment on memory and reasoning, (b) they would be given 5 s to study category-exemplar pairs and should spend all of this time relating the exemplar to its category, (c) after each 5 s passed, a voice on a tape recording would signal them to turn the page, and (d) the sequence was to be repeated until all pairs in the learning booklet had been presented. On completion of the instructions, subjects were told to turn their booklets over and begin studying.
Booklets and instructions were collected as soon as the learning phase was completed. Subjects were then randomly assigned to one of four practice counterbalancing conditions and to one of three retrieval-practice orders for that counterbalancing condition. Subjects received a booklet face down and a new instruction page, which they followed as the experimenter read it aloud. Subjects were told that (a) each page would contain one of the category labels that they had received in the previous phase along with a hint about what exemplar they were to retrieve; (b) the hint consisted of the first two letters of the appropriate exemplar; and (c) they were to retrieve an item that they had seen, rather than responding with any exemplar that fit the letter cues. Subjects then turned their booklets over and began the test: They were given 10 s to recall each cued exemplar, and a tape-recorded voice instructed them when to turn pages. After the practice phase, subjects participated in an unrelated causal reasoning experiment for 20 min.
In the testing phase, subjects were randomly assigned to one of three random testing orders of the categories. Booklets were distributed face down and the experimenter read instructions aloud. Subjects were told that, at the top of each page, there would be a name of one of the categories studied previously and that they should recall all exemplars of that category that they had been shown at any time in the experiment. Subjects were given 30 s for each category, and were then instructed to turn the page.
Materials
Category selection. Ten categories, two of which were used as fillers, were drawn from several published norms (Battig & Montague, 1969; Marshall & Cofer, 1970; Shapiro & Palermo, 1970). The 8 experimental categories were selected in the following manner. Relatively unrelated categories (i.e., dissimilar and nonassociated categories) were chosen to ensure that measures of category-recall performance were as independent as possible. Intercategory similarity and association were first determined by the experimenters carefully assessing the relatedness of the knowledge domains (e.g., if Fruit were to be used, Vegetable would not be selected); these judgments were reinforced, using the Marshall and Cofer (1970) norms, by minimizing (a) the pairwise associations between category labels and (b) the interexemplar associations (after particular exemplars had been chosen). The phonemic similarity among the category labels was also minimized.
To reduce variations in stimulus complexity and associability, category labels were constrained to be semantically unambiguous and only one word in length (e.g., no categories such as Earth Formations were included). Finally, the word frequencies (Kucera & Francis, 1967) of category labels were kept in the low to moderate range, with all labels falling between 25 and 100 occurrences per million.
Exemplar selection. Once eight categories were found that met these constraints, particular exemplars were chosen for each one (see Appendix B). Four of the categories were randomly chosen to contain all strong exemplars and four to contain all weak exemplars. Exemplars in three of the strong categories had an average rank order (i.e., average position in a list rank ordered by frequency of report) of 8 (median = 7), according to Battig and Montague (1969) category norms. Exemplars in the remaining strong category (Leather) were drawn from the Shapiro and Palermo (1970) norms and had an average rank order of 3.8. Exemplars in the four weak categories had an average rank order, according to Battig and Montague, of 33 (median = 23). Thus, there was a clear difference in the taxonomic frequency of exemplars in the strong versus the weak categories.
Exemplars were also constrained to be low-frequency, unambiguous,
noncompound words. The average word frequency (Kucera &
Francis, 1967) for all eight categories was 13 occurrences per million,
SD = 3.8. No two exemplars began with the same first two letters,
ensuring that each two-letter cue in the retrieval-practice task would
be unique. In addition, to avoid interference of extraexperimental
items, no chosen category exemplar had the same first two letters as an
unchosen category exemplar that was listed in the Battig and Montague
(1969) norms. For example, the word trumpet could not be
chosen as a musical instrument because the word trombone might
produce extraexperimental interference. Items with strong a priori
item-to-item associations (e.g., cat and mouse as members of the set
animals) were avoided.
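
The two-letter uniqueness constraints described above lend themselves to a mechanical check. The following Python sketch is illustrative only: the function name find_stem_collisions and the word lists are hypothetical, and it is not the authors' procedure or their materials (the actual items appear in Appendix B).

```python
from collections import defaultdict

def find_stem_collisions(words, stem_len=2):
    """Return {stem: [words]} for every stem shared by more than one word."""
    by_stem = defaultdict(list)
    for word in words:
        by_stem[word[:stem_len].lower()].append(word)
    return {stem: group for stem, group in by_stem.items() if len(group) > 1}

# Hypothetical candidates: exemplars under consideration plus normed items
# that were NOT chosen but could still intrude at retrieval practice.
chosen = ["orange", "trumpet", "guitar"]
normed_but_unchosen = ["trombone", "banana"]

collisions = find_stem_collisions(chosen + normed_but_unchosen)
print(collisions)  # {'tr': ['trumpet', 'trombone']}
```

Under this check, trumpet would be rejected for the reason given in the text: its stem is shared with an unchosen normed exemplar.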
Finally, two constraints were used to match the effectiveness of the
first two letters of an exemplar as a retrieval cue for the retrieval
practice task: versatility matching and syllable matching. The versatility
(Solso & Juel, 1980) of a set of letters corresponds to the number of
words containing those letters in the specified positions. For example,
an estimate of the versatility of the letter combination BA in the first
two positions of a word is 413 because there are approximately 413
words that begin with that combination of letters in the Kucera and
Francis (1967) norms. Versatilities of the two-letter stems of exemplars
were constrained to be at a moderate level of difficulty (M = 281,
SD = 12) as measured by Solso and Juel. Finally, stems were constrained
to provide less than one syllable of information. In ambiguous
cases, we used Webster's New Collegiate Dictionary (1980) to determine
where syllabic breaks occurred.
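
As a rough illustration of the versatility measure, the sketch below counts the words in a lexicon that begin with a given stem. The function name versatility and the five-word toy lexicon are assumptions made for the example; the values reported in the text come from the Solso and Juel (1980) counts over the Kucera and Francis (1967) norms, not from code like this.

```python
def versatility(stem, lexicon):
    """Count the lexicon entries that begin with the given letter stem."""
    stem = stem.lower()
    return sum(1 for word in lexicon if word.lower().startswith(stem))

# Toy lexicon for illustration only; a real count would use a full word norm.
toy_lexicon = ["bake", "ball", "banana", "bring", "cat"]

ba_count = versatility("BA", toy_lexicon)
print(ba_count)  # 3
```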
Learning booklets. Learning booklets were constructed from the 48
experimental and 12 filler items. The placement of these items in the
learning booklet was designed to minimize interexemplar associations
because such associations could provide secondary retrieval routes to
unpracticed items in the practiced categories, offsetting the impair-
ment caused by the competition for the primary retrieval cue. Two
measures were taken to minimize interitem association among cat-
egory members and to maximize attention to category-exemplar
relationships. First, category-exemplar pairs were presented to sub-
jects centered on individual pages in paired-associate format (e.g.,
Fruit Orange). Second, rather than presenting all exemplars from a
given category at once, the order of exemplars within a booklet was
determined by blocked randomization in which each block contained
one exemplar from each category, resulting in six blocks of 10 items
(each block containing 8 items from experimental categories and 2
items from filler categories). The ordering of exemplars within each
block was determined randomly except that (a) in the first block, filler
items appeared in the beginning to control for primacy effects; (b) in
the last block, filler items appeared at the end to control for recency
effects; and (c) throughout the booklet, no two categories appeared in
sequence more than once. Two different learning booklets were
constructed, in which both the ordering of categories within blocks and
the list position of particular category items varied.
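
The blocked randomization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name build_learning_order, the placeholder category and item names, and the fixed seed are all hypothetical, and constraint (c), that no two categories appear in sequence more than once, is not enforced here.

```python
import random

def build_learning_order(categories, fillers, n_blocks=6, seed=0):
    """Build blocks holding one exemplar per category plus two filler items.

    categories: dict mapping a label to a list of n_blocks exemplars.
    fillers: list of 2 * n_blocks (label, item) pairs.
    """
    rng = random.Random(seed)
    # Shuffle within each category so assignment of exemplars to blocks is random.
    shuffled = {label: rng.sample(items, len(items))
                for label, items in categories.items()}
    filler_pool = rng.sample(fillers, len(fillers))
    blocks = []
    for b in range(n_blocks):
        experimental = [(label, items[b]) for label, items in shuffled.items()]
        block_fillers = filler_pool[2 * b: 2 * b + 2]
        if b == 0:
            # (a) Fillers open the first block to absorb primacy effects.
            rng.shuffle(experimental)
            block = block_fillers + experimental
        elif b == n_blocks - 1:
            # (b) Fillers close the last block to absorb recency effects.
            rng.shuffle(experimental)
            block = experimental + block_fillers
        else:
            block = experimental + block_fillers
            rng.shuffle(block)
        blocks.append(block)
    return blocks

# Placeholder materials: 8 categories x 6 exemplars, plus 12 filler items.
categories = {f"Cat{i}": [f"cat{i}_item{j}" for j in range(6)] for i in range(8)}
fillers = [("Filler", f"filler{j}") for j in range(12)]

blocks = build_learning_order(categories, fillers)
print(len(blocks), [len(block) for block in blocks])
```

Each block holds 10 pairs, so the six blocks together cover all 48 experimental and 12 filler items exactly once.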
Retrieval-practice booklets. Each page of a retrieval-practice booklet
contained one test of a single category exemplar. The category label
appeared centered on the page with the first two letters of the
exemplar printed two spaces to the right of it, followed by a solid line
to indicate that the item was incomplete (e.g., Fruit Or_____). The
stem of the exemplar was provided to direct subjects to retrieve a
particular item. The solid line was the same length for all items so that
no cues for word length would be given.
To construct retrieval-practice booklets, we first defined an abstract
ordering of exemplar tests using the following constraints. The first
and last few items in all practice booklets were tests of filler items to
acquaint subjects with the practice task and to control for primacy and
recency effects on final recall. All experimental items were tested three
times on an expanding schedule, with an average spacing of 3.5 trials
between the first and second test and 6.5 trials between the second and
third test. In general, no two category members were tested on
adjacent pages, and the average test position of each category in the
test booklet was kept constant. To the extent possible, we prevented
particular sequences of category-exemplar tests from appearing con-
secutively more than once (as is prone to occur with systematic spacing
manipulations) by inserting tests of filler items.
To control for specific-category effects, we counterbalanced which
categories were practiced and which were not. The eight experimental
categories were divided into two random sets of four (referred to as Set
A and Set B), with the constraint that two strong and two weak
categories appeared in each set. Half of the subjects performed
retrieval practice on Set A and the other half of the subjects on Set B.
To control for specific-exemplar effects, we further divided Set A and
Set B into two random subsets (referred to as Subsets A1, A2, B1, and
B2). For Subset A1, three exemplars were randomly selected from
each of the four categories in A, with the remaining three exemplars
constituting A2. Half of the subjects who practiced the Set A
categories practiced A1 exemplars, and the remaining subjects practiced
A2 exemplars. Subsets B1 and B2 were constructed and distributed
in the same manner (see Appendix B for the materials and their
divisions into these sets). These procedures ensured that every item
participated in every condition equally often, and resulted in four sets
of 12 items (A1, A2, B1, and B2) from which we constructed
retrieval-practice booklets.
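
The counterbalancing scheme can be illustrated with a short Python sketch. It assumes placeholder category labels (S1 to S4 for strong, W1 to W4 for weak) and placeholder exemplar names; the function name counterbalance is hypothetical, and the actual assignment of materials to sets is given in Appendix B.

```python
import random

def counterbalance(strong, weak, exemplars, seed=0):
    """Split categories into Sets A and B, then each set into two item subsets."""
    rng = random.Random(seed)
    s = rng.sample(strong, len(strong))
    w = rng.sample(weak, len(weak))
    # Each set receives two strong and two weak categories.
    sets = {"A": s[:2] + w[:2], "B": s[2:] + w[2:]}
    subsets = {}
    for name, cats in sets.items():
        first, second = [], []
        for cat in cats:
            items = rng.sample(exemplars[cat], len(exemplars[cat]))
            # Three exemplars per category go to subset 1, the other three to subset 2.
            first += [(cat, item) for item in items[:3]]
            second += [(cat, item) for item in items[3:]]
        subsets[name + "1"] = first
        subsets[name + "2"] = second
    return sets, subsets

strong = ["S1", "S2", "S3", "S4"]
weak = ["W1", "W2", "W3", "W4"]
exemplars = {cat: [f"{cat}_item{j}" for j in range(6)] for cat in strong + weak}

sets, subsets = counterbalance(strong, weak, exemplars)
print({name: len(items) for name, items in subsets.items()})
```

Each of the four subsets contains 12 items (3 exemplars from each of 4 categories), matching the four 12-item sets described in the text.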
Each of the four 12-item counterbalancing sets was assigned to the
abstract ordering of exemplar tests three times, resulting in 12 booklets
of 51 pages (three practice orders for each of the four counterbalancing
sets). Distractor materials were booklets containing causal-reasoning
tasks.
Test booklets. Each page of the nine-page test booklets contained
one category cue centered at the top. The first page for all testing
booklets was one of the filler categories (mountains), which was
inserted to minimize variance due to output interference. The order of
the remaining experimental categories was random, except that across
the three testing orders, the average test position for each category and
each condition was approximately the same. Each of the three testing
orders was combined with each of the 12 practice booklets, yielding 36
distinct combinations.
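
The booklet counts above follow from simple crossing of factors, which the sketch below spells out. The variable names are illustrative, and the filler-page figure is an inference from the stated numbers (51 pages, 12 items tested three times each) rather than a value reported in the text.

```python
from itertools import product

counterbalancing_sets = ["A1", "A2", "B1", "B2"]
practice_orders = [1, 2, 3]
test_orders = [1, 2, 3]

# Four counterbalancing sets crossed with three practice orders.
practice_booklets = list(product(counterbalancing_sets, practice_orders))
# Each practice booklet crossed with each of the three testing orders.
session_combinations = list(product(practice_booklets, test_orders))

# 12 experimental items, each tested three times, occupy 36 of the 51 pages;
# the remainder is inferred here to be tests of filler items.
experimental_pages = 12 * 3
filler_pages = 51 - experimental_pages

print(len(practice_booklets), len(session_combinations), filler_pages)  # 12 36 15
```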
Finally, we used a portable tape recorder to play the tape instructing
subjects when to turn booklet pages and a stopwatch to time subjects in
the final test phase.
Results and Discussion
Retrieval Practice
The retrieval practice success rates for Rp+ items varied as
a function of category composition, with 74% and 90% success
rates being obtained across weak and strong Rp+ items,
respectively. (Note that potential difficulties of interpretation
created by the differing rates of retrieval-practice success are
addressed in Experiment 3.)
Final Test Performance
All analyses were first conducted treating the counterbalancing
subgroups of Nrp items as distinct levels of the retrieval
practice factor. Because no significant difference was obtained
between the recall means of these subgroups (M = 48.8% and
48.1% for Nrpa and Nrpb items, respectively) nor was there a
simple interaction between the Nrpa-Nrpb and the strong-weak
manipulation, the data from these subgroups were
combined in the results reported below.
Table 1 shows the percentages of each type of item that were
correctly recalled for the strong and weak categories, respectively.
As expected, repeatedly retrieving several members of a
studied category improved the recall of those items
(Rp+ = 73.6%) relative to the baseline (Nrp = 48.4%) on the
final delayed recall test, F(1, 32) = 136.9, p < .0001, MSe = .022.
More important, however, is the finding of impaired
recall for the remaining unpracticed category exemplars
(Rp- = 37.5%) relative to the same baseline, F(1, 32) = 30.3,
p < .0001, MSe = .019. This pattern of improved recall for
Rp+ items and impaired recall for Rp- items is consistent
with the item-specific interference predicted by strength-dependent
competition models of forgetting: That is, retrieval
practice appears to have produced enduring retrieval-based
learning of the Rp+ items, as evidenced by their improved
recall performance, thereby reducing the competitiveness of
the Rp- items during the final recall test, as evidenced by their
impaired recall performance. Furthermore, this pattern of
results indicates that retrieval-induced forgetting is not re-
stricted to a single output session and may, in fact, contribute
to long-lasting retrieval failures.
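
The condition means quoted above can be approximately recovered from the Table 1 cell means, as the following Python sketch shows. It assumes a simple unweighted average over the strong and weak rows, which is a guess about how the reported means were formed; the variable and function names are illustrative.

```python
# Cell means (percentage recalled) taken from Table 1.
table1 = {
    "strong": {"Rp+": 81.0, "Rp-": 40.3, "Nrp": 56.0},
    "weak":   {"Rp+": 66.2, "Rp-": 34.7, "Nrp": 41.0},
}

def mean_over_compositions(condition):
    """Unweighted mean of one retrieval-practice condition across both rows."""
    return sum(row[condition] for row in table1.values()) / len(table1)

rp_plus = mean_over_compositions("Rp+")
rp_minus = mean_over_compositions("Rp-")
nrp = mean_over_compositions("Nrp")

facilitation = rp_plus - nrp   # benefit for practiced items
impairment = nrp - rp_minus    # cost for unpracticed items in practiced categories

print(round(rp_plus, 1), round(rp_minus, 1), round(nrp, 1))
print(round(facilitation, 1), round(impairment, 1))
```

The Rp+ and Rp- averages match the reported 73.6% and 37.5%. The Nrp average computed this way is 48.5%, which differs from the reported 48.4% by 0.1, presumably because the table entries are themselves rounded.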
Table 1
Mean Percentage of Items Recalled on a Category Cued-Recall
Test as a Function of Category Composition in Experiment 1

                          Retrieval practice status of item
Category composition        Rp+       Rp-       Nrp
Strong exemplars            81.0      40.3      56.0
Weak exemplars              66.2      34.7      41.0

Note. Rp+ = practiced exemplars from practiced categories; Rp- =
unpracticed exemplars from practiced categories; Nrp = unpracticed
exemplars from unpracticed categories.

As expected, the main effect of our category composition
manipulation was significant, with strong exemplars being
recalled at a higher level than weak exemplars (M = 58.3%
and 45.7%, respectively), F(1, 32) = 53.2, p < .0001, MSe = .022.
An analysis of the magnitudes of retrieval-practice facilitation
for strong and weak exemplars, however, revealed
that the absolute improvement for weak items was not reliably
different from that for strong items, (Rp+ - Nrp = 66.