2
Intricacies of Spaced Retrieval
A Resolution
Henry L. Roediger III and Jeffrey D. Karpicke
Robert Bjork has spent his entire professional life studying learning and
memory, and many of us have spent our lives (in part) reading his
path-breaking research. One interesting characteristic of Bob’s work, much
of it conducted in collaboration with Elizabeth Bjork, is the often
counterintuitive nature of the findings emanating from their lab. At the
risk of overstatement, one can view many of the important contributions
that the Bjorks have made as creating a paradox and then mounting a
satisfactory explanation for it.
Our chapter will deal with several paradoxes raised by the Bjorks’
work. There are three interrelated puzzles. First, remembering an event
that is repeated is greatly aided if the first presentation is forgotten to
some extent before the repetition occurs. (Yes, you read it correctly:
good remembering of an event can depend on its forgetting.) Second,
retrieving an event can be a more potent learning opportunity than
restudying it, which flies in the face of educational wisdom that
studying creates learning and testing merely measures it. Third, putting
these two paradoxes together, testing an event has a greater effect if one
waits for some forgetting to make retrieval more effortful and difficult.
This last claim seems especially puzzling, because if we want to
test people, shouldn’t we want to do it under conditions in which they
cannot make errors? After all, the idea of learning through “errorless
retrieval” is a hallmark of certain approaches to memory remediation
Y109937_C002.indd 23 7/15/10 12:45:27 PM
in brain-damaged individuals. As we shall show, these approaches
advocating errorless retrieval imply a wrong assumption, at least in
healthy people (the case may be different in older adults and
brain-damaged individuals). We can thank Bob and Elizabeth Bjork for these
insights. In this chapter we unpack them and show how and when they
are true.
Forgetting an Event Can Enhance Its Relearning
The theme of this volume is how successful forgetting can sometimes
enhance remembering. The case is most obvious in studies of directed
(or intentional) forgetting. If people must remember two sets of
information successively, they can learn and remember the second set better
if they have been told just before learning it that they can forget the
first set of material they recently learned. That is, if two lists are
presented, getting a forget instruction for the first list improves retention
of a second list relative to the case where subjects feel responsible for
remembering the first list while learning the second list. Establishing
this fact was one of Bob Bjork’s first major scientific contributions (e.g.,
Bjork, LaBerge, & LeGrand, 1968; Bjork, 1970). Intentional forgetting
has been examined in many studies over the years, and whole volumes
are devoted to it (Golding & MacLeod, 1998).
Forgetting of information can lead to successful remembering in
another, more paradoxical way, too. Strangely, successful remembering
of information can depend, in certain situations, on having
successfully forgotten (to some degree) the same information earlier. The
previous statement may seem weird or even patently absurd, but we review
evidence here that it is true. Once again, Bob Bjork was responsible for
this critical insight (Bjork & Allen, 1970). The condition in which the
previous statement holds true occurs when an event to be remembered
is repeated in some form, either restudied or tested. To the extent that
a first presentation is forgotten, its repetition will be well remembered.
Bjork gleaned this insight from research on the spacing effect and then
extended it. The spacing effect (e.g., Glenberg, 1976; Madigan, 1969;
Melton, 1970) refers to the situation when events are repeated and the
spacing or lag between repetitions is varied. When an event occurs,
its repetition has little effect on retention when the repetition occurs
immediately (when the event is still fresh from its first presentation), but
the impact grows as the repetition is delayed. Figure 2.1 shows a typical
spacing effect from a careful study by Madigan (1969) in which words
were presented once or twice in a long list and the spacing between
repetitions was manipulated. Free recall of the list items was the dependent
measure. Of course, as the lag increases since the first presentation in
these kinds of experiments, the first presentation is increasingly
forgotten. So the second presentation creates more durable learning when it
occurs with an increasing amount of forgetting of the first presentation
(up to some limiting condition). Crowder (1976, Chapter 9) spells out
the logic quite clearly.
Bjork and Allen (1970) took this observation from the spacing effect
and created an experiment in which the time between presentations
would be held constant, but forgetting could still be manipulated by
varying the difficulty of the task given to the subject between
presentations. Subjects either performed a difficult task (one causing more
forgetting) after the first presentation or performed an easy task (with
less forgetting) between presentations. Sure enough, the second
presentation led to greater recall on a final criterial test when it occurred
after the difficult rather than the easy interpolated task. Ergo, forgetting
of the information causes its greater retention after a repetition. Others
replicated this finding (Robbins & Wise, 1972; Tzeng, 1973), but it
may not extend to all situations (Roediger & Crowder, 1975). However,
Logan, Roediger, and McDermott (2010) have shown how this
principle (greater forgetting prior to a re-presentation leading to greater
recall) may benefit foreign language vocabulary learning.
[Figure 2.1: plot of Probability of Recall (y-axis, 0.0 to 0.6) against Lag (x-axis: 1P, 0, 2, 4, 8, 20, 40).]
Figure 2.1 The basic spacing (or lag) effect in free recall. Words were either presented once (1P)
or repeated. When repeated, items occurred at various spacings indicated on the abscissa. Later
recall increased as a function of lag between presentations. (Data are adapted from Madigan, S. A.,
Journal of Verbal Learning and Verbal Behavior, 8, 828–835, 1969.)
More recently, Storm, Bjork, and Bjork (2008) examined recall of
items after two presentations. After the first presentation, some items
were subjected to retrieval-induced forgetting (using the Anderson,
Bjork, & Bjork, 1994, technique) and others were not. All items were
repeated and then recalled on a later test. Storm et al. found that “items
that were relearned benefited more from that relearning if they had
previously been forgotten” (2008, p. 230). They commented that this
outcome “is very surprising from a common sense standpoint” (ibid.).
Of course, so are all the findings reviewed here: How and why should
greater forgetting of an event before it is presented again cause better
later retention? That mystery runs throughout this chapter (see Crowder,
1976, pp. 273–314, for ideas in the context of spacing effect research).
Retrieval as a Memory Modifier
The remainder of this chapter is about the effects of testing one’s
memory on later retention. This is not a new topic. In fact, it predates the
festschrift for Bob Bjork by exactly 100 years, if we take the date as being
that of the first empirical papers we can find on the topic (Abbott, 1909;
see Roediger & Karpicke, 2006a, for a review). The discovery made by
Abbott and replicated by countless others is that the effect of taking a
test is not neutral but alters later retention. When information is correctly
retrieved on a test, this act makes the probability of future retention on
a delayed test greater than if no test had occurred or even if the person
had restudied the material rather than being tested on it (see Roediger &
Karpicke, 2006b; Whitten & Bjork, 1977, among many others).
In the cognitive psychology of memory, the 1970s were the heyday of
studies of retrieval, with many important papers on topics such as the
encoding specificity principle (Tulving & Thomson, 1973) and
transfer-appropriate processing (Morris, Bransford, & Franks, 1977). Another
milestone publication of that era was R. A. Bjork’s (1975) chapter that has
the same title as this section heading. He argued that educators and
psychologists both tended to ignore the importance of testing. He wrote:
    Retrieval from memory is often assumed, implicitly or
    explicitly, as a process analogous to the way in which the contents of a
    memory location in a computer are read out, that is, as a process
    that does not, by itself, modify the state of the retrieved item in
    memory. In my opinion, however, there is ample evidence for a
    kind of Heisenberg principle with respect to retrieval processes:
    an item can seldom, if ever, be retrieved from memory without
    modifying the representation of that item in memory in
    significant ways. (1975, p. 123)
Bjork’s chapter went on to report research on retrieval as a memory
modifier. He interpreted the phenomenon of the testing/retrieval effect
through the lens of the then-new levels of processing ideas of Craik
and Lockhart (1972), maintaining that there could be levels of
processing during retrieval just like there were during encoding. Specifically,
when retrieval occurred under easy, superficial conditions, it did not
benefit later retention. However, when retrieval involved more difficult
and complex processes, the effects on later recall were much greater.
Thus, not all acts of retrieval are equal: Some confer great benefit and
some provide little or no benefit. We return to this theme, too, later in
the chapter.
A couple of years later, Whitten and Bjork (1977) reported an
elegant experiment that documented Bjork’s earlier points quite well. We
report only a sketch of the logic here; the actual experiment was more
complex. The authors presented subjects with two words to be
remembered and then had them perform a distracter task for varying amounts
of time afterwards: 4, 8, or 14 seconds. At this point, the items were
either presented again or tested. Subjects had to recall the pair of words
on test trials. When items were tested, recall declined from .72 to .61
to .54 across the three intervals. No feedback was given, so the level of
retrieval success is critical in the case of tested events. Of course, when
items were restudied, subjects were reexposed to 100% of the original
items, so testing put items in that condition at a disadvantage relative
to repeated study conditions, especially in the long-delayed conditions.
A final test was given a bit later, after many items had been presented in
these conditions.
We consider here the final test results for items that were studied
twice or studied once and then tested. For simplicity, we consider only
the extreme lags in this figure, those items that had been tested or
restudied after lags of 4 or 14 seconds during the initial learning phase.
The results can be seen in Figure 2.2, with data points estimated from
Whitten and Bjork’s (1977) Figure 1. Final recall showed a spacing effect
in both cases: Performance was better when the second presentation or
the test occurred after 14 seconds of distracter activity rather than only
4 seconds, which shows the usual lag effect (in both restudy and
testing). In addition, final recall performance was better in the condition
in which subjects had taken a test during learning than when they had
restudied the item. Note that this testing effect occurred despite the fact
that, as the delay increased, recall on that first test became increasingly
poor, such that barely more than half the items (54%) were recalled on
the initial test after 14 seconds of distracter activity. Thus when overt
recall occurred early (after 4 seconds), it had less of a positive effect on
a final test than when recall occurred after 14 seconds (despite the fact
that recall on the initial test dropped during this period). Whitten and
Bjork interpreted the results as indicating that retrieval difficulty was
the critical component and cited related research by Jacoby and Bartz
(1972) as reinforcing their point (see too Gardiner, Craik, & Bleasdale,
1973; Jacoby, 1978).
[Figure 2.2: bar chart of Proportion Recalled (y-axis) by Spacing in seconds (x-axis: 4, 14), with separate bars for Test and Restudy.]
Figure 2.2 Final recall results as a function of the lag between study of a word pair and its restudy
(white bars) or test (gray bars). Both spacing (lag) and testing had positive effects. (Data adapted
from Whitten, W. B., & Bjork, R. A., Journal of Verbal Learning and Verbal Behavior, 16, 465–478,
1977, Figure 1.)
Whitten and Bjork’s (1977) results may look rather slender. The
advantage of testing over restudy was only 6% or so, although it was
consistent. However, the problem of low performance on the initial test
should be borne in mind. When Whitten and Bjork performed
conditional analyses, examining final recall performance conditional on
subjects successfully recalling items on the first test, the testing effect
was much larger. Yet such conditional analyses raise the specter of item
selection effects. The general logic is that “easier” items are, by
definition, the ones recalled on the initial test. Therefore, any resulting
advantage of recalling these items at a higher level on the delayed test may
be due to selection of easy items in this condition rather than an effect of
testing. That is true, but many analyses have shown convincingly that
testing effects are not due to item selection effects and are not restricted
only to easy items (Karpicke, 2009; Karpicke & Roediger, 2007b, 2008;
Roediger & Karpicke, 2006), and even in Whitten and Bjork’s study
there was an absolute advantage in the testing conditions when all items
were included. Testing effects are often quite large in other experiments
(see Roediger & Karpicke, 2006a).
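The item-selection concern raised above can be made concrete with a small
calculation. The sketch below is purely illustrative: the item "ease" values
are made up, and the assumption that a single ease value governs recall on
both tests is ours, not a model or data from Whitten and Bjork (1977). The
function names are hypothetical. It shows that conditionalizing on
initial-test success raises expected final recall even when testing itself
has no effect at all.

```python
# Hypothetical illustration of an item-selection effect (not data from any study).
# Assumption: each item has an "ease" value equal to its probability of being
# recalled on any test, and taking a test has NO effect on later recall.

def unconditional_final_recall(ease):
    """Expected final recall averaged over all items."""
    return sum(ease) / len(ease)

def conditional_final_recall(ease):
    """Expected final recall for items that were recalled on the initial test.

    Each item enters the conditional pool with probability equal to its ease,
    so easier items are overrepresented: sum(p * p) / sum(p).
    """
    return sum(p * p for p in ease) / sum(ease)

# Made-up ease values for five items, from hard to easy.
ease = [0.2, 0.4, 0.5, 0.6, 0.8]

overall = unconditional_final_recall(ease)
conditional = conditional_final_recall(ease)

print(f"all items: {overall:.2f}")
print(f"items recalled on the initial test: {conditional:.2f}")
```

With these made-up values, expected final recall is about .50 over all items
but about .58 for the initially recalled items, although no testing effect
was built in. This is why the unconditional advantage with all items included
carries more weight than the conditional comparison.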
Expanding Retrieval Schedules
In 1978 Landauer and Bjork provided another important empirical
contribution that has guided research and thinking in the intervening
years. They asked: If testing aids retention (and it does), and if multiple
tests provide greater benefits to retention than do single tests (also true),
what schedule of testing provides the best performance? If we want to
learn a person’s name, or foreign language vocabulary, or definitions
of scientific concepts, what is the best way to schedule our self-testing?
This question is of critical importance for students who must learn a
large body of factual knowledge.
Two experiments by Landauer and Bjork (1978) sought an answer.
The authors contrasted several different possible schedules of testing
that will be described momentarily. The materials subjects learned were
paired associates (either first names with last names in Experiment 1
or face-name pairs in Experiment 2). After studying a pair, students
received various schedules of repeated tests.
We will describe selected conditions of their Experiment 1 here. In
one condition, items were presented only once. In four other
conditions, items were tested repeatedly under different schedules of
spacing between tests. Three tests were given in all conditions, but
the lags between tests varied according to the four schedules of spacing.
The conditions of repeated testing were uniform-short spacing,
uniform-moderate spacing, expanding spacing, and contracting spacing.
(We provide an operational explanation of these labels shortly.) During
tests, students were given the first name of the person and asked to
produce the last name (in Experiment 1). In the two uniform conditions, the
three tests were given with equal intervals between them. Thus, in the
uniform-short condition, students were tested three times immediately
after studying a pair. Following Landauer and Bjork (1978), we will refer
to this condition as the 0-0-0 condition, because no intervening items
occurred between tests. The uniform-moderate condition employed a
5-5-5 schedule of spacing, meaning that five intervening study or test
events occurred between the tests of a particular pair in this condition.
This condition is also called an equal interval condition, because the
interval between tests is equivalent. The expanding test condition used
a 1-4-10 spacing, indicating that a pair was first tested after only one
intervening item, then after 4 more, and finally after 10 intervening items.
In the contracting condition, the spacing was reversed: 10, 4, and 1. Many
items were tested in these various conditions. In addition, as a baseline
control condition, some items were presented a single time and never
tested, which permits an answer to the question of what benefits the
various testing schedules have over and above a single presentation of a
pair with no testing. A final point is that all tests in these experiments
were given without feedback.
[Figure 2.3: bar chart of Proportion Recalled (y-axis) by Spacing Condition (x-axis: Study Once, 0-0-0, 5-5-5, 10-4-1, 1-4-10).]
Figure 2.3 Final recall after either a single presentation (study once) or a single presentation and
three tests. Schedules of the three tests had a large effect on recall. All testing conditions aided
recall relative to the single presentation condition, but the massed testing condition conferred the
least benefit, and the expanding retrieval condition produced the most benefit. (Data adapted from
Landauer, T. K., & Bjork, R. A., in M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical Aspects
of Memory, Academic Press, London, England, 1978, pp. 625–632, Figure 2.)
After the acquisition phase of the experiment just described, students
were given a final test 30 minutes later (with a lecture occurring
during the interval). They were again given the first name of the pair and
asked to produce the last name. The results (estimated from Figure 2 of
Landauer & Bjork, 1978) are presented in Figure 2.3. All the testing
conditions produced better final recall than the single-presentation study
condition, but performance differed widely among the testing
conditions (despite the fact that the number of prior tests was held constant
at 3). The uniform-short (0-0-0) condition was poorest, the
uniform-moderate (5-5-5) and contracting (10-4-1) conditions were intermediate,
and the expanding condition (1-4-10) was best. Note that the three latter
conditions all have equivalent numbers of total events between tests
(15); the critical point is how they were distributed. The expanding
retrieval schedule was best. Experiment 2, which used face-name pairs,
reported the same finding.
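The point that the three spaced schedules differ only in how their 15
intervening events are distributed can be checked with a few lines of
arithmetic. The sketch below uses the lag notation from the text; the
function names `total_spacing` and `classify` are hypothetical helpers
introduced here for illustration, not anything from Landauer and Bjork (1978).

```python
# Spacing schedules from Experiment 1, written as the number of intervening
# items before each of the three tests (notation taken from the text).
schedules = {
    "uniform-short": (0, 0, 0),
    "uniform-moderate": (5, 5, 5),
    "expanding": (1, 4, 10),
    "contracting": (10, 4, 1),
}

def total_spacing(lags):
    """Total number of intervening items summed across all tests."""
    return sum(lags)

def classify(lags):
    """Label a schedule by how successive lags change (illustrative helper)."""
    pairs = list(zip(lags, lags[1:]))
    if all(a == b for a, b in pairs):
        return "uniform"
    if all(a < b for a, b in pairs):
        return "expanding"
    if all(a > b for a, b in pairs):
        return "contracting"
    return "mixed"

for name, lags in schedules.items():
    print(f"{name}: lags={lags}, total={total_spacing(lags)}, shape={classify(lags)}")
```

The 5-5-5, 1-4-10, and 10-4-1 schedules each total 15 intervening items, so
any difference among them in final recall reflects the distribution of the
lags rather than the total amount of spacing.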
Landauer and Bjork (1978) extolled expanding retrieval as the best
method to learn new information such as names and faces, and
probably everyone reading their report quickly agreed. (The first author of
this paper certainly did, when he read it; the second author here was
not yet born.) The underlying rationale seems so straightforward and
the benefit seems commonsensical (after the fact). The advice would
be that when you meet a new person and hear her name, you should
retrieve it rather quickly before the name is lost from immediate
(working) memory. That initial retrieval ensures you encoded the item. After
that initial retrieval, you should then wait a bit longer and retrieve it
again, to practice retrieval at an intermediate time span. Then, finally,
you should wait even longer for a further retrieval that would solidify or
consolidate the memory more permanently. Landauer and Bjork wrote:
“The expanding procedure may thus be seen as an effective shaping
procedure for successively approximating the desired behavior of unaided
recall at long delays” (Landauer & Bjork, 1978, p. 631).
For many psychologists who had learned about shaping of
behavior through reinforcement by successive approximations to the desired
behavior (e.g., Skinner, 1953, Chapter VI), the principle seemed
intuitive (at least with the 20/20 wisdom of hindsight). Many textbook
writers and teachers (again, the first author included) began to preach that
expanding retrieval was the best way to practice new information to
retain it best.
The main thrust in the remainder of this chapter is to claim that,
despite the early rush to embrace expanding retrieval as a central
technique in using retrieval-enhanced learning via testing, the idea is
fundamentally flawed. As it has usually been operationalized in extant
research, expanding retrieval has a fatal flaw: The first test given (often
after lags of zero or one intervening item from initial presentation)
makes retrieval “too easy,” and making retrieval easy undermines its
positive effect. We provide evidence below to support this claim, but
of course it took many years for researchers to understand this point.
After all, the data in Landauer and Bjork’s (1978) paper showed that
expanding retrieval was better than equal interval retrieval, so what is
the problem? We describe it below.
Although Landauer and Bjork’s (1978) claims now seem wrong to us,
Bob Bjork actually anticipated the problem in his writings before that
1978 paper, ones we reviewed above. In his 1975 chapter, Bjork argued
that retrieval difficulty is critical to the testing effect: the more
difficult the retrieval on a first test, the better the later recall on a second
test. However, in the wisdom of hindsight, the expanded retrieval
technique makes an initial retrieval very easy: In any schedule in which the
first retrieval occurs after a lag of zero intervening items, it is
essentially perfect, and with one intervening item performance does not
drop much. These are the standard lags for initial tests in expanding
retrieval conditions. As we shall see later in the chapter, the difficulty
of the first retrieval in the typical expanding scheme is critical to later
performance. But we are getting ahead of ourselves. Before we discuss
this later part of the story, we will review (albeit briefly) the 30-year
historical impact of the Landauer and Bjork paper by selectively reviewing
research from 1978 to 2007.
Expanding Retrieval: Research and Controversy
A strange thing happened to research in this area after publication of
Landauer and Bjork’s (1978) landmark paper: nothing. For many years
no one did research on the issue of expanding retrieval, at least not
compared to that for equally spaced retrieval. The matter seemed to
have been considered a closed case; no further research seemed needed.
Why? Our guess is that the findings (although new) made so much sense
that everyone nodded and said “of course.” The fact that the findings
were compelling and intuitive seemed to choke off further inquiry into
the matter for about a decade. On the positive side, many people talked
about the findings and included them in lectures and books, which was
hardly surprising because they were interesting and were directed at an
important practical problem.
In this section, we provide a selective overview of research directed at
this issue after the 1978 paper until 2007, when a spate of new research
was published. Balota, Duchek, and Logan (2007) have provided a
much more thorough review of work during this period, which should
be consulted for additional detail.
A few researchers did examine expanding retrieval sequences
as a mode of learning. Rea and Modigliani (1985) tested third grade
school children as they learned multiplication facts and spelling words.
However, their control condition was massed testing: four tests with no
other items between tests (0-0-0-0, using the notation above). Rea and
Modigliani (1985) showed that an expanded retrieval sequence
(0-1-2-4) was more effective than massed retrieval, but they did not have
the critical equally spaced condition, and so total spacing was confounded
with condition. Other researchers also compared expanding schedules
of retrieval to various other conditions, again usually massed testing or
sometimes expanding study (rather than testing) schedules. They
generally concluded that the expanding testing schedule was better either
in neurologically impaired patients (e.g., Camp & McKitrick, 1992) or
in healthy adults learning names (e.g., Morris & Fritz, 2002) than were
massed schedules or multiple presentations without testing. However,
the critical equal interval testing condition was not included.
In the first study since Landauer and Bjork’s original one
comparing expanding retrieval to equally spaced retrieval, Shaughnessy and
Zechmeister (1992) were able to replicate Landauer and Bjork and
showed a small positive effect of expanding retrieval over equally
spaced retrieval on a test given soon after acquisition. However, a few
years later Cull, Shaughnessy, and Zechmeister (1996) obtained quite
mixed evidence across a series of five experiments. The results were
puzzling, so Cull (2000) followed up this work with a more dedicated
effort. Without going into the details of all the experiments (see Balota
et al., 2007), suffice it to say that Cull found no evidence that
expanding retrieval schedules provided any benefit to recall relative to equal
interval schedules (although both led to better performance than did
massed testing schedules). Carpenter and DeLosh (2005) also showed
no superiority of expanding over equal interval training. In fact, the trend
(during both acquisition and retention phases) was for the equal
interval condition to be superior.
Balota, Duchek, Sergent-Marshall, and Roediger (2006) mounted a
study with large numbers of young adults, healthy older adults, and
other older adults with early-stage Alzheimer’s disease. Because the
subjects had widely different memory abilities, Balota et al. began all
subjects with two massed tests of paired associates to ensure subjects
had encoded the material well before implementing further massed,
equal interval, or expanding schedules. Thus, all subjects received five
tests after a single presentation with the following schedules: 0-0-0-0-0,
0-0-3-3-3, or 0-0-1-3-5. During acquisition, massed testing produced
essentially perfect performance in all subject groups, whereas the expanding
condition led to greater performance on the last test than did the equal
interval condition. Because expanding retrieval led to better
performance during learning, one might expect this benefit to carry forward
to the final criterial test at the end of the session. However, this did not
happen. Despite the fact that spaced retrieval produced much greater
final recall than did massed retrieval for all three groups of subjects,
expanding retrieval was not better than equal interval retrieval in
any of the groups. Thus, once again, no evidence was found
supporting Landauer and Bjork’s hypothesis that expanding retrieval could
“shape” later recall.
Two other points are worth making about the Balota et al. (2006)
results. First, in the massed condition, subjects were tested on and
successfully recalled all items. Thus, there were five successful
retrievals under conditions that fostered errorless retrieval, thought
on some accounts to be optimal for later performance (because sub-
jects never make an error or draw a blank). However, this massed
condition produced the worst performance on the final test, prob-
ably because the retrievals were effortless and shallow (Bjork, 1975).
The second point is more subtle: Recall that at the end of learning,
the expanding retrieval condition produced higher performance
than the equal interval condition, yet on the delayed test, the two
conditions were equivalent. What this pattern must indicate is
that forgetting occurs more rapidly after expanding retrieval than
after equal interval retrieval. In fact, this same pattern occurred in
Landauer and Bjork’s (1978) original study. In almost all the experi-
ments discussed thus far, the final criterial test was at the end of one
experimental session (but see Cull, 2000). The pattern of differen-
tial forgetting between conditions suggests that, with much longer
retention intervals, there may be a reversal—retention may actu-
ally be better following equal interval retrieval practice relative to
expanding retrieval practice. We consider designs with such delays
in the next section.
The Mystery of Expanding Retrieval Practice
and Its Vicissitudes: A Partial Solution
At this point in the chapter, the reader is rightfully confused.
Landauer and Bjork (1978) found that expanding retrieval is superior
to massed or equal interval retrieval, and their finding accords
well with other ideas in the learning and memory literature, such
as shaping and errorless retrieval. Although their conclusion about
expanding retrieval was accepted for many years (and all studies
show that it is superior to massed retrievals), evidence since the mid-
1990s paints a mixed picture. Why? We attempt to answer that ques-
tion in this section by relying on two related concepts championed
by Bob Bjork.
Recently Bjork (1999) has advocated an important and counterintuitive
idea about the relation between initial learning performance
and long-term retention. There are many instances where the rate
and level of initial learning are very good relative to some other condition,
yet these seemingly beneficial conditions ultimately produce
poor long-term retention as assessed on delayed tests (again, relative to
a companion condition in which learning was slower). Stated another
way, conditions that make initial learning slower and more difficult
might produce worse initial learning performance but lead to gains in
long-term retention. Bjork has called this the idea of creating “desirable
difficulties” to promote learning, and he has gathered a variety of evidence
supporting this concept (see Bjork, 1999; Schmidt & Bjork, 1992).
Some difficulty that makes initial learning slower and more effortful
can make long-term retention better.
An example of desirable difficulties relevant to this chapter is the
spacing effect: When repeated presentations are massed together, they
often produce better performance on an immediate test (one soon after
the second presentation) than does spacing the presentations (Peterson,
Wampler, Kirkpatrick, & Saltzman, 1963). However, as is well known,
spaced repetition produces better retention on delayed criterial tests than
does massed practice (see Figure 2.1). This spacing × retention interval
interaction for studied materials is both replicable and important (see
Balota, Duchek, & Paullin, 1989; Balota et al., 2007). The same pattern
occurs if we consider spaced retrieval practice. Performance is essentially
perfect on massed repeated tests (e.g., with a 0-0-0 schedule) and will
be better than performance on equally spaced tests because forgetting
will have occurred before the first retrieval attempt (e.g., with a 5-5-5
schedule). Yet invariably the spaced retrieval conditions produce better
performance on delayed retention tests than does massed retrieval.
In short, Bjork’s key point from the concept of desirable difficulties is
that performance during initial learning is not necessarily diagnostic of
long-term retention. This fact has profound implications for education
and other training scenarios, because instructors often use initial learning
performance as the metric by which they evaluate the effectiveness
of learning and training activities. They rarely test performance long
after the learning episode to determine what is retained.
Returning to the focus of this chapter, schedules of retrieval practice,
an expanding retrieval condition is bound to perform better during
the initial learning phase than an equally spaced condition. That
is, subjects are likely to recall more items in an expanding condition
than in an equally spaced condition because the first retrieval attempt
occurs soon after study in the expanding condition. In most experiments
on spaced retrieval, subjects are not given feedback after each
test, but there is also very little (if any) forgetting across tests after the
first one. Therefore, the position of the first test determines the level of
performance on subsequent tests. If 80% of items are recalled on test 1,
then approximately 80% will be recalled on repeated tests. If 60% are
recalled on test 1, then about 60% will be recalled on repeated tests.
And so on. This fact is independent of the schedule of repeated tests and
is apparent in Landauer and Bjork’s (1978) data (Figure 3 in their paper)
and in other experiments, too. The difference in level of performance
across conditions is entirely due to the position of the first test. Yet the
surprising finding is that the forgetting rate seems faster in the expanding
than in the equally spaced condition. This is indicated in studies
where there are large advantages of expanding relative to equally spaced
conditions during initial learning, but no differences between the conditions
on retention tests given at the end of the experimental session
(e.g., Balota et al., 2006). Again, the same pattern can also be seen in
Landauer and Bjork’s data (their Figure 3).
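The inference about forgetting rates is simple arithmetic, and a short Python sketch may make it concrete. The proportions below are hypothetical, chosen only to mirror the pattern just described (higher recall at the end of learning after expanding retrieval, equal recall on the final test); they are not data from any of the studies cited, and the function name `proportion_forgotten` is ours.

```python
def proportion_forgotten(acquisition: float, final: float) -> float:
    """Share of what was recallable at the end of learning that is lost by the final test."""
    if not 0 < acquisition <= 1 or not 0 <= final <= 1:
        raise ValueError("acquisition must lie in (0, 1] and final in [0, 1]")
    return (acquisition - final) / acquisition


# Hypothetical proportions recalled (NOT data from the studies discussed).
conditions = {
    "expanding": {"acquisition": 0.80, "final": 0.55},
    "equal interval": {"acquisition": 0.60, "final": 0.55},
}

for name, p in conditions.items():
    lost = proportion_forgotten(p["acquisition"], p["final"])
    print(f"{name}: {lost:.0%} of initially recalled material forgotten")
```

With these illustrative values, the expanding condition loses about 31% of what was recalled during learning and the equal interval condition about 8%, even though the two conditions end at the same level on the final test.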
Does the concept of desirable difficulties help explain the puzzling
effects of retrieval practice schedules? That is, does expanding retrieval
promote good performance during initial learning (greater retrieval
success than equally spaced schedules) but result in relatively poor
long-term retention? A number of recent experiments have addressed
this question and suggested that the answer is yes.
We carried out a series of experiments in which subjects learned difficult
vocabulary words under a variety of spaced retrieval conditions
(Karpicke & Roediger, 2007a). We examined massed (0-0-0), expanding
(1-5-9), and equally spaced (5-5-5) conditions, and we also included two
conditions in which subjects took just a single test during initial learning:
The single test occurred either after a lag of one trial or after a lag
of five trials. The latter two conditions are conceptually similar to those
used by Whitten and Bjork (1977) and others (e.g., Jacoby, 1978). The
critical aspect of the experiment was that we manipulated the retention
interval that occurred between the initial learning phase and the final
criterial test: Half of the subjects took the final test at the end of the
experimental session (about 10 minutes after the initial learning phase)
and half took the final test 2 days later.
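The schedule labels used throughout can be read mechanically: each number is the count of trials intervening before a test. The Python sketch below, which is ours and not part of the original experiments, parses a label, lists where each test would fall, and classifies the schedule. The function names, and the assumption that each test occupies exactly one trial slot, are illustrative only.

```python
from typing import List


def parse_schedule(schedule: str) -> List[int]:
    """Turn a label such as '1-5-9' into a list of lags (intervening trials)."""
    return [int(part) for part in schedule.split("-")]


def retrieval_positions(schedule: str) -> List[int]:
    """Trial position of each test, counting the study trial as position 0.

    Assumption: each test takes up exactly one trial slot.
    """
    positions = []
    current = 0
    for lag in parse_schedule(schedule):
        current += lag + 1  # skip `lag` intervening trials, then test
        positions.append(current)
    return positions


def classify(schedule: str) -> str:
    """Label a schedule; mixed schedules (e.g., 0-0-1-3-5) fall under 'other'."""
    lags = parse_schedule(schedule)
    if all(lag == 0 for lag in lags):
        return "massed"
    if len(set(lags)) == 1:
        return "equal interval"
    if all(a < b for a, b in zip(lags, lags[1:])):
        return "expanding"
    return "other"


for label in ("0-0-0", "1-5-9", "5-5-5"):
    print(label, classify(label), retrieval_positions(label), sum(parse_schedule(label)))
```

Under these assumptions the 1-5-9 and 5-5-5 schedules contain the same total number of intervening trials (15) and place the last test at the same position; they differ in where the first test falls, after one intervening trial versus five.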
Figure 2.4 shows the proportion of word pairs recalled on the final
tests in each spacing condition at the two different retention intervals.
First, it is worth pointing out that at both retention intervals the spaced
retrieval conditions (expanding and equal interval) led to better recall
than did massed retrieval. The left panel of Figure 2.4 provides recall
on the final test that occurred shortly after learning, and the data show
an advantage of expanding retrieval relative to equal spacing. This
outcome replicates Landauer and Bjork’s (1978) original finding and
is due to greater retrieval success during the learning phase, because
the expanding condition recalled more items initially than the equally
spaced condition. However, two days after learning the pattern had
reversed: Now the equally spaced condition produced better long-term
retention than expanding retrieval.
Note that a similar interaction occurred when considering just the
single-test conditions: A single test after a short delay during acquisition
(one intervening item) produced better recall than a single test
after a somewhat longer delay (five intervening items) both during
acquisition and on the immediate test, but on the test given two days
later, the single, more effortful initial test (the one after five intervening
items) led to better retention than the easier initial test (the one given
after one item).
Another feature of the data in Figure 2.4 documents the fact that
giving several tests under conditions that are too easy undermines the
positive effects of testing. In the 0-0-0 condition subjects were required
to recall items three times under conditions in which they were essentially
always correct. However, these three (easy) retrievals led to later
retention that was even worse than a single test given under more difficult
conditions (the single test after a lag of five, at both delays).
[Figure 2.4 image omitted. Panels: Immediate and 2 Day Delay. X-axis: Spacing Condition (1, 5, 0-0-0, 1-5-9, 5-5-5). Y-axis: Proportion Recalled.]
Figure 2.4 Final recall as a function of various schedules of retrieval practice. The left panel
shows final recall 10 minutes after the learning phase, and the right panel shows final recall 2
days after the learning phase. Expanding retrieval (1-5-9) produced a short-term benefit relative to
equally spaced retrieval (5-5-5), but equally spaced retrieval produced better long-term retention
than expanding retrieval. (Data adapted from Karpicke & Roediger, 2007, Experiment 1.)
Logan and Balota (2008) also recently conducted an experiment examining
the effects of expanding and equally spaced retrieval schedules at
short and long retention intervals. They tested both younger and older
adults and examined several different spacing schedules. The subjects in
their experiments learned weakly associated word pairs under different
schedules and took a final test either at the end of the experimental session
(immediate) or one day later. The results are shown in Figure 2.5.
Overall, Logan and Balota did not find a consistent advantage of expanding
retrieval over equally spaced retrieval in either subject group at either
retention interval. In fact, they found that equally spaced retrieval was
often better than expanding retrieval on the delayed final test.
[Figure 2.5 image omitted. Three panels: 1-2-3 vs. 2-2-2; 1-3-5 vs. 3-3-3; 1-3-8 vs. 4-4-4. Bars: Expanding and Equal schedules for Younger and Older adults on Immediate and 1 Day Delay tests. Y-axis: Proportion Recalled.]
Figure 2.5 Final recall after expanding or equally spaced retrieval practice on immediate or one-
day delayed tests. The figure shows results for both younger and older adults. The top, middle, and
bottom panels show performance for different expanding and equally spaced schedules that are
matched in total spacing. No advantage of expanding retrieval was evident, and equally spaced
retrieval often produced better final recall than expanding retrieval on the one-day delayed test.
(Data adapted from Logan, J. M., & Balota, D. A., Aging, Neuropsychology, and Cognition, 15,
257–280, 2008.)
The Karpicke and Roediger (2007a) and Logan and Balota (2008)
results might seem strange given the belief that expanding retrieval is
supposed to improve long-term retention. But the findings are consistent
with Bjork’s concept of learning tasks that produce desirable difficulties.
The desirable condition, however, is the equally spaced retrieval
schedule, not expanding retrieval.
This pattern of results must force us to reconsider the theory about
why expanding retrieval ought to work. The standard theory of expanding
retrieval practice is that the schedule combines the positive features
of retrieval success and retrieval difficulty. Of course, difficult retrieval
is important, but unless subjects are given feedback (and they are not
in most spaced retrieval studies), retrieval practice can only promote
learning when a person is able to successfully recover the desired item.
Therefore, expanding retrieval is thought to work in part because the
early first retrieval promotes retrieval success and, as noted above, this
determines the level of performance on repeated tests. Retrieval difficulty
comes into play because it is assumed that gradually increasing
the spacing of repeated tests should increase retrieval difficulty on the
tests. However, Karpicke and Roediger (2007a) and Logan and Balota
(2008) examined response times on tests during initial learning and
showed that retrieval grew increasingly faster across repeated tests,
regardless of the schedule of repeated tests. This does not accord with
the idea that retrieval grew increasingly difficult across tests.
The alternative hypothesis we have proposed is that the position of the
first test is the important difficulty for improving long-term retention,
not the schedule of repeated tests (see Karpicke & Roediger, 2007a). In
expanding retrieval conditions, the first retrieval attempt often occurs
almost immediately after studying the item (lags of zero or one trial).
This retrieval attempt might not be effective because retrieval occurs
while items still reside in immediate memory. Therefore, equally spaced
retrieval practice might enhance retention because that schedule involves
a delayed first test (e.g., a lag of five trials between study and a first test).
The crux of the problem in virtually all comparisons of expanding and
equal interval retrieval is that the position of the first retrieval attempt
is confounded with the schedule of repeated tests. Expanding retrieval
conditions involve an immediate first test (e.g., 1-5-9), and equally
spaced conditions involve a delayed first test (e.g., 5-5-5). We conducted
an experiment that eliminated this confound (Karpicke & Roediger,
2007a, Experiment 3; see too Carpenter & DeLosh, 2005). Two conditions
involved an immediate first test (after a lag of zero trials), and two
involved a delayed first test (after a lag of five trials). Then the repeated
tests were either expanding (1-5-9) or equal (5-5-5). The results are shown
in Figure 2.6. When we controlled for the position of the first test, the
advantage of expanding retrieval practice disappeared on an immediate
final test (cf. Carpenter & DeLosh, 2005) and there was no difference as
a function of placement of the first test (0 or 5). However, on the test two
days later, an overall advantage of the two conditions with a delayed first
test (5-1-5-9 and 5-5-5-5) appeared (relative to the conditions in which
the first test was immediate). Thus, in delayed recall, the effect of position
of the first test mattered, but the schedule of repeated tests (expanding or
equally spaced) did not have any effect. This result falls perfectly in line
with the results of Whitten and Bjork (1977) and accords with Bjork’s
(1975) notion that difficult retrieval is critical for promoting learning, but
once again, it does not support the idea that expanding retrieval is the
best schedule of retrieval practice for long-term retention.
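The logic of removing the confound can be sketched in a few lines of Python: crossing the lag of the first test with the schedule of the repeated tests yields four conditions in which each factor varies independently of the other. The variable and function names here are ours and purely illustrative.

```python
from itertools import product
from typing import List

first_test_lags = [0, 5]                  # immediate vs. delayed first test
repeated_schedules = ["1-5-9", "5-5-5"]   # expanding vs. equal interval


def factorial_conditions(first_lags: List[int], schedules: List[str]) -> List[str]:
    """Cross every first-test lag with every repeated-test schedule."""
    return [f"{lag}-{schedule}" for lag, schedule in product(first_lags, schedules)]


conditions = factorial_conditions(first_test_lags, repeated_schedules)
print(conditions)  # ['0-1-5-9', '0-5-5-5', '5-1-5-9', '5-5-5-5']
```

In the usual comparison of 1-5-9 with 5-5-5, both factors change at once. In the crossed design, comparing 0-1-5-9 with 0-5-5-5 (or 5-1-5-9 with 5-5-5-5) isolates the schedule of repeated tests, and comparing across the first number isolates the position of the first test.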
We end this section by describing an experiment that explored the
effects of different schedules of retrieval on learning educational texts.
Landauer and Bjork’s (1978) original study was focused on a rather
specific applied scenario: learning faces and names when it is inappropriate
or impossible to receive feedback after an initial presentation.
[Figure 2.6 image omitted. Panels: Immediate and 2 Day Delay. X-axis: Spacing Condition (0-1-5-9, 0-5-5-5, 5-1-5-9, 5-5-5-5). Y-axis: Proportion Recalled.]
Figure 2.6 Final recall as a function of schedule of retrieval practice. The left panel shows final
recall 10 minutes after the learning phase, and the right panel shows final recall 2 days after the
learning phase. The four retrieval schedules factorially crossed the position of the first test (lags
of 0 or 5) with the schedule of repeated tests (1-5-9 or 5-5-5). There was no effect of schedule on
immediate final tests, but there was a main effect of delaying the first test on the delayed final tests.
(Data adapted from Karpicke, J. D., & Roediger, H. L., Journal of Memory and Language, 57, 151–162,
2007a, Experiment 3.)
The idea of expanding retrieval practice emerged from this study, and
subsequently the argument was made that expanding retrieval was a
general technique that could be applied broadly. The data reviewed here
suggest that expanding retrieval might not represent the best retrieval
schedule for promoting long-term retention, but as of yet there have
been few tests of the idea that expanding retrieval might apply broadly
to materials and contexts that are more educationally relevant than
those used in paired-associate learning tasks. Perhaps when taken out
of the context of paired-associate learning, an advantage of expanding
retrieval would become apparent.
To address this question, we examined free recall of brief expository
texts (Karpicke & Roediger, 2010). Subjects read brief texts and
recalled them on free recall tests spaced according to different schedules.
In both experiments we factorially crossed the position of the first
test (immediate or delayed) and the spacing of repeated tests (expanding
or equal interval). We examined the effects of the different retrieval
practice schedules on a final criterial test one week after learning.
Figure 2.7 shows several important results. First, there is a testing
effect: taking a single test after reading a text enhanced long-term retention
more than reading the text and not testing. Second, repeated testing
(in the spaced retrieval conditions) enhanced retention more than taking
a single test. Third, testing with feedback (restudying the passages)
produced better retention than testing without feedback. However, and
most importantly for our purposes, there were no differences between
expanding and equally spaced schedules of retrieval practice.
In sum, the body of evidence indicating that expanding retrieval
practice is not beneficial (relative to equal interval practice) is growing.
If anything, equal interval schedules seem to produce better retention
on delayed tests, probably because the initial test is rendered more difficult
when it does not occur immediately after study, as is the case in
expanding schedules of retrieval. The difficulty of the initial retrieval
seems to hold the key to performance in experiments of this kind. The
subsequent schedule of retrieval practice seems to have little effect
under conditions examined thus far.
Practical Advice
What advice might we give students about how to apply the research
on testing reviewed in this chapter? We think the answer is straightforward:
Students should determine the knowledge they want to retain,
create a testing mechanism with feedback, and test themselves until
they can retrieve the information on a much-delayed test (say, two days
since original study). The testing should not be done in massed or
even closely spaced fashion; if the literature is clear on any point, it is
that repeated testing under conditions in which retrieval is easy leads to
poor long-term retention. (So much for the principle of errorless retrieval
being a good way to study.) But what about the mechanism for spacing
of retrieval? Our data reviewed above suggest that the critical ingredient
is encouraging fairly difficult retrieval, especially on an initial test.
Beyond that point, it probably does not matter whether students test
themselves using expanding or equal interval conditions. What matters
is repeated spaced retrieval (with feedback if an error is made).
[Figure 2.7 image omitted. Panels: No Feedback (top) and Feedback (bottom). X-axis: Spacing Condition (Study, Single Test, 0-1-2-3, 0-2-2-2, 2-1-2-3, 2-2-2-2). Y-axis: Proportion Recalled.]
Figure 2.7 Final recall of expository texts as a function of initial retrieval practice schedule. The
top panel shows performance without initial feedback, and the bottom panel shows performance with
feedback (students reread the texts after each recall test). Taking a single test enhanced retention
relative to reading once, and repeated testing produced even greater effects on retention. Feedback
also enhanced long-term retention. However, the schedule of retrieval did not matter. (Data adapted
from Karpicke, J. D., & Roediger, H. L., Memory and Cognition, 38, 116–124, 2010, Experiment 2.)
Let us consider a practical example. A fifth-grade student needs to
learn the capitals of the 50 states. She creates flash cards for each state
with, for example, Montana on one side and Helena on the other. The 50
flashcards would first be studied one at a time, perhaps employing some
mnemonic (my aunt Helen was from Montana). After this initial study,
the cards are shuffled and then ten minutes later the student gives herself
a test, looking at the name of each state and trying to remember the
capital. Whether or not she produces a name, she turns the card over
to study the reverse side (see Butler, Karpicke, & Roediger, 2008). Any
items missed are put at the end of the deck for further practice in the
same session. She records the number correct on the first pass through
and then returns to test herself again on the ones she missed, again with
feedback. After this phase, the student puts the cards away and studies
other material. Then, hours later, she returns to the cards and tests
herself in the same way. This process would be repeated the next day
and then sporadically thereafter, as needed. Each time the deck would
be shuffled anew. With spacing between retrievals spread over days, the
whole issue of schedule of individual state-capital pairs within a session
would not need to be much considered. Of course, the spacing of entire
testing/relearning sessions would then be of interest.
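For readers who prefer a procedural statement, one session of the flash card routine might be sketched in Python as follows. This is a minimal illustration under our own assumptions, not a tested study tool: the function `run_session`, the `attempt_recall` callback (which stands in for the student producing an answer), and the three-card deck are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple


def run_session(
    cards: Dict[str, str],
    attempt_recall: Callable[[str], str],
    rng: random.Random,
) -> Tuple[int, List[str]]:
    """One self-testing session: shuffle, test every card with feedback,
    then retest the missed cards once.

    Returns the first-pass score and the prompts still missed after the retest.
    """
    deck = list(cards)
    rng.shuffle(deck)  # the deck is shuffled anew for every session

    def is_correct(prompt: str) -> bool:
        answer = attempt_recall(prompt)
        # Feedback step: the reverse side is consulted whether or not
        # an answer was produced.
        return answer.strip().lower() == cards[prompt].lower()

    # First pass: missed cards are set aside (the "end of the deck").
    missed = [prompt for prompt in deck if not is_correct(prompt)]
    first_pass_correct = len(deck) - len(missed)

    # Second pass: retest only the missed cards, again with feedback.
    still_missed = [prompt for prompt in missed if not is_correct(prompt)]
    return first_pass_correct, still_missed


# Hypothetical usage: a student who currently knows two of three capitals.
capitals = {"Montana": "Helena", "Ohio": "Columbus", "Texas": "Austin"}
known = {"Montana": "Helena", "Texas": "Austin"}
score, remaining = run_session(capitals, lambda state: known.get(state, ""), random.Random(0))
print(score, remaining)
```

In real use the callback would prompt the student (for example, by reading typed input), and the whole session would be rerun hours later and again on following days, as the example in the text describes.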
One critical point about the foregoing advice: students should not
trust their own intuitions about what they know and quit testing them-
selves too soon. Just because Helena can be retrieved a time or two
does not mean that it is in a “learned” state. Students need to practice
retrieval even of learned information (Karpicke & Roediger, 2008).
The technique just described can be applied to nearly any sort of
factual material: scientific concepts, the critical points of important
journal articles, the presidents of the United States and their main
accomplishments and events while they were in office, and so on. The
title of one of our articles is “Repeated Retrieval During Learning Is the
Key to Long-Term Retention” (Karpicke & Roediger, 2007a), and we
believe more firmly than ever that this is the case.
Conclusion
We began the chapter by noting how Bob and Elizabeth Bjork’s work
had, over the years, pointed to several apparent paradoxes (or at least
nonintuitive findings). We explored several paradoxes and applied their
(and our) analyses to the issue of the best way of practicing retrieval over
relatively short intervals, such that testing can be used to best advantage.
All studies show that repeated massed retrieval is poor, despite its
errorless nature. Bjork (1975) argued that this was true based on data
then available. However, the mystery of whether expanding or equal
interval retrieval leads to better long-term retention turns out to rest
on a similar consideration. When retention is measured at a healthy
delay (say two days or one week after learning), delayed recall is better
following equal interval practice because (in the usual design) the
first retrieval in the equal interval design occurs under more difficult
retrieval conditions. Thus, the case of expanding retrieval turns out to exemplify
the Bjorkian principle of a desirable difficulty: although initial recall
is poorer with equal interval schedules relative to expanding schedules,
long-term retention is better.
Our results provide a resolution of claims in the literature: Landauer
and Bjork’s (1978) results can be replicated at short retention intervals
(when testing occurs in the same experimental session as acquisition).
However, after longer retention intervals (two days or a week in
our experiments), the situation reverses: Equal interval schedules of
retrieval practice in an initial learning session produce better reten-
tion than expanding schedules of retrieval practice. We suggest in
the preceding section on practical applications that, so long as one
uses sessions of spaced retrieval practice with feedback, the question
of expanding or equal interval schedules within a session may well
be moot. Spaced retrieval practice (with feedback) is the key to long-
term retention.
Even though Landauer and Bjork’s (1978) important claim about
expanding retrieval turns out 30 years later to be limited (or even
wrong), the reasons for this state of affairs are accounted for by Bjork’s
other research and theorizing (Bjork, 1975; Bjork & Bjork, 1994). Even
when Bob Bjork seems to be wrong in one arena, he turns out to have
been right all along.
References
Abbott, E. E. (1909). On the analysis of the factors of recall in the learning pro-
cess. Psychological Monographs, 11, 159–177.
Anderson, M. C., Bjork, E. L., & Bjork, R. A. (1994). Remembering can cause for-
getting: Retrieval dynamics in long-term memory. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 20, 1063–1087.
Balota, D. A., Duchek, J. M., & Logan, J. M. (2007). Is expanded retrieval practice
a superior form of spaced retrieval? A critical review of the extant literature.
In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor
of Henry L. Roediger, III (pp. 83–105). New York, NY: Psychology Press.
Balota, D. A., Duchek, J. M., & Paullin, R. (1989). Age-related differences in
the impact of spacing, lag, and retention interval. Psychology and Aging,
4, 3–9.
Balota, D. A., Duchek, J. M., Sergent-Marshall, S. D., & Roediger, H. L. (2006).
Does expanded retrieval produce benefits over equal interval spacing?
Explorations in healthy aging and early stage Alzheimer’s disease.
Psychology and Aging, 21, 19–31.
Bjork, R. A. (1970). Positive forgetting: The noninterference of items intentionally
forgotten. Journal of Verbal Learning and Verbal Behavior, 9,
255–268.
Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative
recency and related phenomena. In R. L. Solso (Ed.), Information
processing and cognition: The Loyola symposium (pp. 123–144). Hillsdale,
NJ: Erlbaum.
Bjork, R. A. (1994). Memory and metamemory considerations in the training
of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition:
Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In
D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cognitive
regulation of performance: Interaction of theory and application (pp. 435–
459). Cambridge, MA: MIT Press.
Bjork, R. A., & Allen, T. W. (1970). The spacing effect: Consolidation or
differential encoding? Journal of Verbal Learning and Verbal Behavior, 9,
567–572.
Bjork, R. A., LaBerge, D., & LeGrande, R. (1968). The modification of
short-term memory through instructions to forget. Psychonomic Science, 10,
55–56.
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2008). Correcting a
metacognitive error: Feedback increases retention of low confidence correct
responses. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 34, 918–928.
Camp, C. J., & McKitrick, L. A. (1992). Memory interventions in Alzheimer’s-
type dementia populations: Methodological and theoretical issues. In R. L.
West & J. D. Sinnott (Eds.), Everyday memory and aging: Current research
and methodology (pp. 152–172). New York, NY: Springer-Verlag.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing
effects to name learning. Applied Cognitive Psychology, 19, 619–636.
Craik, F. I. M., & Lockhart, R. (1972). Levels of processing: A framework for
memory research. Journal of Verbal Learning and Verbal Behavior, 11,
671–684.
Crowder, R. G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum.
Cull, W. L. (2000). Untangling the benefits of multiple study opportunities
and repeated testing for cued recall. Applied Cognitive Psychology, 14,
215–235.
Cull, W. L., Shaughnessy, J. J., & Zechmeister, E. B. (1996). Expanding
understanding of the expanding-pattern-of-retrieval mnemonic: Toward
confidence in applicability. Journal of Experimental Psychology: Applied, 2,
365–378.
Gardiner, J. M., Craik, F. I., & Bleasdale, F. A. (1973). Retrieval difficulty and
subsequent recall. Memory and Cognition, 1, 213–216.
Glenberg, A. M. (1976). Monotonic and nonmonotonic lag effects in
paired-associate and recognition memory paradigms. Journal of Verbal Learning
and Verbal Behavior, 15, 1–16.
Golding, J. M., & MacLeod, C. M. (Eds.). (1998). Intentional forgetting:
Interdisciplinary approaches. Mahwah, NJ: Lawrence Erlbaum.
Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a
problem versus remembering a solution. Journal of Verbal Learning and Verbal
Behavior, 17, 649–667.
Jacoby, L. L., & Bartz, W. H. (1972). Rehearsal and transfer to LTM. Journal of
Verbal Learning and Verbal Behavior, 11, 561–565.
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding
to practice retrieval during learning. Journal of Experimental Psychology:
General, 138, 469–486.
Karpicke, J. D., & Roediger, H. L. (2007a). Repeated retrieval during learning is the
key to long-term retention. Journal of Memory and Language, 57, 151–162.
Karpicke, J. D., & Roediger, H. L. (2007b). Expanding retrieval practice pro-
motes short-term retention, but equally spaced retrieval enhances long-
term retention. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for
learning. Science, 319, 966–968.
Karpicke, J. D., & Roediger, H. L. (2010). Is expanding retrieval a superior
method for learning text materials? Memory and Cognition, 38, 116–124.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name
learning. In M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical
aspects of memory (pp. 625–632). London, England: Academic Press.
Logan, J. M., & Balota, D. A. (2008). Expanded vs. equal interval spaced
retrieval practice: Exploring different schedules of spacing and retention
interval in younger and older adults. Aging, Neuropsychology, and
Cognition, 15, 257–280.
Logan, J. M., Roediger, H. L., & McDermott, K. B. (2009). Using spaced retrieval
practice to learn foreign language vocabulary: How does activity during
the interval affect learning? Manuscript in preparation.
Madigan, S. A. (1969). Intraserial repetition and coding processes in free recall.
Journal of Verbal Learning and Verbal Behavior, 8, 828–835.
Melton, A. W. (1970). The situation with respect to the spacing of repetitions and
memory. Journal of Verbal Learning and Verbal Behavior, 9, 596–606.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing
versus transfer appropriate processing. Journal of Verbal Learning and Verbal
Behavior, 16, 519–533.
Morris, P. E., & Fritz, C. O. (2002). The improved name game: Better use of
expanding retrieval practice. Memory, 10, 259–266.
Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of
spacing presentations on retention of a paired associate over short
intervals. Journal of Experimental Psychology, 66, 206–209.
Rea, C. P., & Modigliani, V. (1985). The effect of expanded versus massed
practice on the retention of multiplication facts and spelling lists. Human
Learning, 4, 11–18.
Robbins, D., & Wise, P. S. (1972). Encoding variability and imagery: Evidence for
a spacing-type effect without spacing. Journal of Experimental Psychology,
95, 229–230.
Roediger, H. L., & Crowder, R. G. (1975). Spacing of lists in free recall. Journal
of Verbal Learning and Verbal Behavior, 14, 590–602.
Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory:
Basic research and implications for educational practice. Perspectives on
Psychological Science, 1, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking
memory tests improves long-term retention. Psychological Science, 17,
249–255.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice:
Common principles in three paradigms suggest new concepts for train-
ing. Psychological Science, 3, 207–217.
Shaughnessy, J. J., & Zechmeister, E. B. (1992). Memory-monitoring
accuracy as influenced by the distribution of retrieval practice. Bulletin of the
Psychonomic Society, 30, 125–128.
Skinner, B. F. (1953). Science and human behavior. Oxford, England: Macmillan.
Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology,
30, 641–656.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after
retrieval-induced forgetting: The benefit of being forgotten. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 34,
230–236.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval
processes in episodic memory. Psychological Review, 80, 352–373.
Tzeng, O. J. (1973). Stimulus meaningfulness, encoding variability, and the
spacing effect. Journal of Experimental Psychology, 99, 162–166.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing.
Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Y109937_C002.indd 46 7/15/10 12:45:33 PM
Intricacies of Spaced Retrieval • 47
Craik, F. I. M., & Lockhart, R. (1972). Levels of processing: A framework for
memory research. Journal of Verbal Learning and Verbal Behavior, 11,
671–684.
Crowder, R. G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum.
Cull, W. L. (2000). Untangling the benets of multiple study opportunities
and repeated testing for cued recall. Applied Cognitive Psychology, 14,
215–235.
Cull, W. L., Shaughnessy, J. J., & Zechmeister, E. B. (1996). Expanding under-
standing of the expanding-pattern-of-retrieval mnemonic: Toward con-
dence in applicability. Journal of Experimental Psychology: Applied, 2,
365–378.
Gardiner, J. M., Craik, F. I., & Bleasdale, F. A. (1973). Retrieval diculty and
subsequent recall. Memory and Cognition, 1, 213–216.
Glenberg, A. M. (1976). Monotonic and nonmonotonic lag eects in paired-
associate and recognition memory paradigms. Journal of Verbal Learning
and Verbal Behavior, 15, 1–16.
Golding, J. M., & MacLeod, C. M. (Eds.). (1998). Intentional forgetting:
Interdisciplinary approaches. Mahwah, NJ: Lawrence Erlbaum.
Jacoby, L. L. (1978). On interpreting the eects of repetition: Solving a prob-
lem versus remembering a solution. Journal of Verbal Learning and Verbal
Behavior, 17, 649–667.
Jacoby, L. L., & Bartz, W. H. (1972). Rehearsal and transfer to LTM. Journal of
Verbal Learning and Verbal Behavior, 11, 561–565.
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding
to practice retrieval during learning. Journal of Experimental Psychology:
General, 138, 469–486.
Karpicke, J. D., & Roediger, H. L. (2007a). Repeated retrieval during learning is the
key to long-term retention. Journal of Memory and Language, 57, 151–162.
Karpicke, J. D., & Roediger, H. L. (2007b). Expanding retrieval practice pro-
motes short-term retention, but equally spaced retrieval enhances long-
term retention. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008). e critical importance of retrieval for
learning. Science, 319, 966–968.
Karpicke, J. D., & Roediger, H. L. (2010). Is expanding retrieval a superior
method for learning text materials? Memory and Cognition, 38, 116–124.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name
learning. In M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical
aspects of memory (pp. 625–632). London, England: Academic Press.
Logan, J. M., & Balota, D. A. (2008). Expanded vs. equal interval spaced
retrieval practice: Exploring dierent schedules of spacing and reten-
tion interval in younger and older adults. Aging, Neuropsychology, and
Cognition, 15, 257–280.
Logan, J. M., Roediger, H. L., & McDermott, K. B. (2009). Using spaced retrieval
practice to learn foreign language vocabulary: How does activity during
the interval aect learning? Manuscript in preparation.
Madigan, S. A. (1969). Intraserial repetition and coding processes in free recall.
Journal of Verbal Learning and Verbal Behavior, 8, 828–835.
Melton, A. W. (1970). e situation with respect to the spacing of repetitions and
memory. Journal of Verbal Learning and Verbal Behavior, 9, 596–606.
Morris, D. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing ver-
sus transfer appropriate processing. Journal of Verbal Learning and Verbal
Behavior, 16, 519–533.
Morris, P. E., & Fritz, C. O. (2002). e improved name game: Better use of
expanding retrieval practice. Memory, 10, 259–266.
Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Eect of
spacing presentations on retention of a paired associate over short inter-
vals. Journal of Experimental Psychology, 66, 206–209.
Rea, C. P., & Modigliani, V. (1985). e eect of expanded versus massed prac-
tice on the retention of multiplication facts and spelling lists. Human
Learning, 4, 11–18.
Robbins, D., & Wise, P. S. (1972). Encoding variability and imagery: Evidence for
a spacing-type eect without spacing. Journal of Experimental Psychology,
95, 229–230.
Roediger, H. L., & Crowder, R. G. (1975). Spacing of lists in free recall. Journal
of Verbal Learning and Verbal Behavior, 14, 590–602.
Roediger, H. L., & Karpicke, J. D. (2006a). e power of testing memory:
Basic research and implications for educational practice. Perspectives on
Psychological Science, 1, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking
memory tests improves long-term retention. Psychological Science, 17,
249–255.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice:
Common principles in three paradigms suggest new concepts for train-
ing. Psychological Science, 3, 207–217.
Shaughnessy, J. J., & Zechmeister, E. B. (1992). Memory-monitoring accu-
racy as inuenced by the distribution of retrieval practice. Bulletin of the
Psychonomic Society, 30, 125–128.
Skinner, B. F. (1953). Science and human behavior. Oxford, England: Macmillan.
Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology,
30, 641–656.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning aer
retrieval-induced forgetting: e benet of being forgotten. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 34,
230–236.
Tulving, E., & omson, D. M. (1973). Encoding specicity and retrieval pro-
cesses in episodic memory. Psychological Review, 80, 352–373.
Tzeng, O. J. (1973). Stimulus meaningfulness, encoding variability, and the
spacing eect. Journal of Experimental Psychology, 99, 162–166.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Eects of spacing.
Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
AU: Please cite in text.
Y109937_C002.indd 47 7/15/10 12:45:33 PM
Y109937_C002.indd 48 7/15/10 12:45:33 PM