
Optimising self-regulated study: The benefits - and costs - of dropping flashcards


Self-regulation of study activities is a constant in the lives of students - who must decide what to study, when to study, how long to study, and by what method to study. We investigated self-regulation in the context of a common study method: flashcards. In four experiments we examined the basis and effectiveness of a metacognitive strategy adopted almost universally by students: setting aside (dropping) items they think they know. Dropping has a compelling logic - it creates additional opportunities to study undropped items - but it rests on two shaky foundations: students' metacognitive monitoring and the value they assign to further study. In fact, being allowed to drop flashcards had small but consistently negative effects on learning. The results suggest that the effectiveness of self-regulated study depends on both the accuracy of metacognitive monitoring and the learner's understanding, or lack thereof, of how people learn.
Downloaded By: [CDL Journals Account] At: 18:51 19 February 2008
Nate Kornell and Robert A. Bjork
University of California, Los Angeles, CA, USA
Self-regulated learning involves any number of
decisions, such as whether one has memorised a
word pair, or mastered Rachmaninov’s notor-
iously difficult piano concerto number 3, and
whether to test oneself or not. Research on self-regulated
learning has mainly focused on two
variables: the amount of time people spend on a
given item, and the likelihood that they will
choose to study an item at all (Metcalfe & Kornell,
2005). The current experiments represent an
attempt to investigate another common study
decision: whether it is time to stop studying.
Perhaps no memorisation technique is more
widely used than flashcards, especially during
homework. When people study with flashcards,
they often ‘‘drop’’ (that is, put aside and stop
studying) items they think they know. Dropping
items that seem well learned has a compelling
logic: It creates more opportunities for the
remaining items to be studied and, in fact, the
best-selling flashcards available on the internet (a
set of GRE flashcards) is specially designed and
marketed to encourage dropping.
How effective, though, is the dropping strat-
egy? One potential problem is that dropping
items relies on metacognitive monitoring, which
can be flawed, as well as one’s understanding, or
lack thereof, of the value of future study oppor-
tunities. Another is that dropping items changes
the subsequent sequencing of events, including
the spacing of repetitions of items that are not
dropped. The goal of the flashcard-inspired ex-
periments we report is to clarify the memory and
metamemory processes and consequences that
characterise self-regulated study.
MEMORY, 2008, 16 (2), 125-136. DOI:10.1080/09658210701763899
© 2007 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business
Address correspondence to: Nate Kornell, Department of Psychology, University of California, Los Angeles, 1285 Franz Hall,
Los Angeles, CA 90095-1563, USA. E-mail:
Grant 29192G from the McDonnell Foundation supported this research. We thank Bridgid Finn, Matt Hays, Jason Finley, and
Dan Fink for their help in running the experiments, and Mark A. McDaniel and Robert M. Nosofsky for their comments on an
earlier draft of this article.

Self-regulated study relies on two basic aspects of
metacognition: making judgements about one’s
learning and memory (monitoring) and using
those judgements to guide study behaviour (control)
(Nelson & Narens, 1994). Errors in either
aspect can lead to ineffective study decisions.
Metacognitive monitoring
Metacognitive judgements are made based on a
variety of cues (e.g., Koriat, 1997), such as
retrieval fluency (e.g., Benjamin, Bjork &
Schwartz, 1998; Kelley & Lindsay, 1993), cue
familiarity (e.g., Metcalfe, Schwartz & Joaquim,
1993; Reder & Ritter, 1992), and the success (or
lack thereof) of previous retrieval attempts (e.g.,
Dunlosky & Nelson, 1992; Spellman & Bjork,
The accuracy of metacognitive monitoring is
measured in two ways: by resolution, which is
high if better-learned items are given relatively
high ratings, and calibration, which is high to the
degree that people’s predicted recall levels match
their actual recall levels. Both types of monitoring
accuracy can affect study choices: Poor resolution
can lead people to prioritise the wrong items;
poor calibration, particularly overconfidence, can
lead to too much dropping and too little studying.
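The two accuracy measures can be made concrete with a small sketch. It assumes calibration is summarised as the signed difference between mean predicted and mean actual recall, and that resolution is indexed by a Goodman-Kruskal gamma correlation between ratings and recall outcomes. The function names and the data are hypothetical and purely illustrative.

```python
from itertools import combinations


def calibration_bias(jols, recalled):
    """Mean predicted recall minus mean actual recall.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    mean_jol = sum(jols) / len(jols)
    mean_recall = sum(recalled) / len(recalled)
    return mean_jol - mean_recall


def gamma(jols, recalled):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Pairs tied on either variable are ignored.
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data: ratings as proportions, recall as 1 (correct) or 0 (incorrect).
jols = [0.2, 0.4, 0.6, 0.8, 1.0]
recalled = [0, 0, 1, 0, 1]

bias = calibration_bias(jols, recalled)   # positive here: ratings exceed recall
resolution = gamma(jols, recalled)        # positive here: higher ratings tend to go with recall
```

Gamma is undefined for a learner whose responses produce only tied pairs (for example, all items recalled), which is why the sketch returns `None` in that case.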
The type of monitoring required in the current
experiments, judgements of learning (JOLs),
can be unreliable, but when participants are
allowed (or required) to test themselves, JOLs
have been shown to be quite accurate, in terms
of both resolution (e.g., Dunlosky & Nelson,
1992) and calibration (e.g., Koriat, Ma’ayan,
Sheffer, & Bjork, 2006). With respect to dropping
an item, therefore, a virtue of flashcards is that
self-testing is intrinsic to the test/study nature of
flashcard practice.
Metacognitive control
Metacognitive monitoring is useful only if
coupled with an effective control strategy. That
is, learners must decide, given their monitoring,
which items will profit most from additional
study. The Region of Proximal Learning (RPL)
model of study-time allocation, for example, says
that learners should give priority to items that are
close to being learned, not those already learned,
or those too difficult to learn (e.g., Kornell &
Metcalfe, 2006; Metcalfe & Kornell, 2003). In the
context of flashcards, people adopting the RPL
strategy should drop cards they think they already
know, as well as cards they believe they cannot
Whether or not to drop an item is a deceptively
complex decision. With a fixed amount of time to
study, dropping an item leaves more time for the
remaining items to be studied, but any particular
item can always be dropped the next time around.
Participants must therefore decide which has
more value: studying the current item one addi-
tional time (at least), or dropping it in favour of
preserving one (or more) additional opportunity
to study some other item before the end of the
allotted time. According to the RPL idea, the
value of studying is highest for items that are
closest to being learned, making it critical to
guard against dropping items too quickly.
A student who focuses too much on learning
the most difficult flashcards is in danger of
dropping easier items too soon. Study decisions
depend on a student’s goals (Dunlosky & Thiede,
1998). Unlike students who set easily achievable
goals, who tend to choose relatively easy materials
to study (Dunlosky & Thiede, 2004; Thiede &
Dunlosky, 1999), students who believe they can
master all of the to-be-learned materials typically
focus on the most difficult materials (see
Son & Metcalfe, 2000, for a review). A strong
focus on difficult flashcards translates to a strong
desire to drop easy flashcards, even if doing so
means jeopardising one’s ability to recall the easy
ones later.
In addition to the perils of selecting an effective
dropping strategy, there are a number of draw-
backs to dropping items from study that students
may not be aware of. One is that spacing, as
opposed to massing, study opportunities on a
given item has been shown many times to have
large benefits for memory (e.g., Cepeda, Pashler,
Vul, Wixted & Rohrer, 2006). Dropping items has
the possible drawback that it decreases the
spacing of the repetitions of the remaining items.
Participants are unlikely to appreciate this
subtlety, given that they sometimes rate spaced
practice as less effective than massed practice
(e.g., Baddeley & Longman, 1978; Simon &
Bjork, 2001).
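The link between dropping and spacing can be illustrated with a minimal sketch. It assumes cards are cycled in a fixed order, so that the number of other cards seen between two presentations of the same card is the stack size minus one. The function names and the per-card duration are hypothetical.

```python
def intervening_items(stack_size):
    """Other cards presented between two presentations of the same card
    when a stack is cycled in a fixed order."""
    return max(stack_size - 1, 0)


def spacing_seconds(stack_size, seconds_per_card):
    """Approximate time between repetitions of a card, ignoring decision time."""
    return intervening_items(stack_size) * seconds_per_card


# Hypothetical illustration: a 20-card stack versus the same stack after
# 14 cards have been dropped, at an assumed 4.5 s per card.
full_stack_gap = spacing_seconds(20, 4.5)
reduced_stack_gap = spacing_seconds(6, 4.5)
```

Under these assumptions, every dropped card shortens the gap between repetitions of each remaining card by one presentation.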
A second consideration is that dropping also
undermines the positive effects of overlearning,
that is, continuing to study an item one already
knows (Christina & Bjork, 1991; but see also
Rohrer, Taylor, Pashler, Wixted, & Cepeda, 2005,
for evidence that the effects of overlearning
diminish with time). Karpicke and Roediger
(2007) have shown that there are tremendous
benefits to restudying something one already
knows, if the restudy takes the form of a test. In
their experiment, after participants answered an
item correctly once, it was dropped, either from
future presentations or from future tests. Dropping
correct items from future presentations had
negligible effects, but dropping them from future
tests had dramatic and negative effects on long-
term retention. Because flashcards involve test-
ing, the implication of such findings is that
dropping flashcards after only one successful
recall attempt might be an extraordinarily bad
In Karpicke and Roediger’s (2007) research, it
was virtually impossible for dropping items to
have positive effects, because an item being
dropped simply resulted in less study time overall.
In research by Pyc and Rawson (in press), on the
other hand, dropping one item allowed participants
to spend more time on other items (as in the
experiments reported here). In that situation,
equivalent learning was achieved in the drop
and no-drop conditions, but the drop condition
required less study time, implying that dropping
has value. However, in Pyc and Rawson’s (in
press) study, dropping was controlled by a computer,
not the participants. Before concluding that
students should drop flashcards when they study,
it is important to examine the effect of self-
regulated dropping.
In summary, then, there are good reasons to
expect the intuitive promise of dropping flash-
cards to be coupled with some actual benefits
(e.g., Pyc & Rawson, in press), especially given
that allowing people to decide how they study has
had positive results in previous experiments (e.g.,
Kornell & Metcalfe, 2006; Nelson, Dunlosky,
Graf, & Narens, 1994). However, there are also
reasons why dropping items might be perilous
and problematic. The experiments we report were
designed to clarify the memory and metamemory
processes that are intrinsic to self-regulated study
and the consequences of those processes for
Participants studied two lists of English-Swahili
translations for 10 minutes each. The procedure
was similar to studying flashcards: Participants
cycled through the same cards repeatedly, and on
each trial the front of the card was shown first,
allowing the participant to test himself or herself,
before the card appeared to flip and the back was
shown. Participants were allowed to drop items
while studying one of the two lists (Drop condi-
tion), but not the other (No-drop condition). A
cued-recall test on all of the words was adminis-
tered either immediately or after a week’s delay.
Participants. A total of 60 Columbia University
students participated during one of four lab
sessions to fulfil a class requirement. There were
31 and 29 participants in the immediate and
delayed conditions, respectively.
Materials. The materials were 40 English-Swahili
translations, 20 per list, selected from a set
published by Nelson and Dunlosky (1994). Each
list contained a mixture of easy (e.g., cloud-
wingu), medium (e.g., lung-pafu), and difficult
(e.g., forgery-ubini) pairs.
Design. The experiment was a 2 (Study-control:
Drop vs No-drop) × 2 (Delay: Immediate vs
Delayed) mixed design, with Study-control and
Delay manipulated within and between participants,
respectively. The order of the Drop and
No-drop lists was counterbalanced across participants.
Procedure. The instructions described the ex-
periment as similar to studying with flashcards,
and explained the procedure in detail. Each of the
two lists was then presented for 10 minutes, and
participants were allowed to study as many items
as they could in that time. A clock at the top right
corner of the screen counted down the time
remaining for study.
The translations were presented one word at a
time. First the ‘‘front’’ of the card (the English
cue) was shown for 1.5 s; then the card appeared
to flip, and the ‘‘back’’ of the card (the Swahili
target) appeared for 3 s. After the target disappeared,
participants in the Drop condition were
asked to choose, by selecting either a ‘‘Study
again later’’ button or a ‘‘Remove from stack’’
button, whether to keep the item in the stack
during subsequent cycles through the list (i.e., put
it at the back of the ‘‘stack’’) or drop it. In the
No-drop condition, only the ‘‘Study again later’’
button was presented. Participants were
prompted to hurry if they took longer than 4
seconds to make their choice.
If a participant dropped all of the word pairs,
the screen remained blank until 10 minutes was
up. This aspect of the procedure, which was
explained in the instructions, was necessary to
equate the time spent on the Drop and No-drop
lists, and also discouraged participants from trying
to hasten the end of the experiment by dropping
all of their cards.
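The study procedure described above can be sketched as a simple simulation. This is an illustrative reconstruction rather than the authors' software: the 1.5 s cue and 3 s target durations and the 10-minute limit come from the text, while the fixed 1 s decision time, the function name `run_session`, and the `wants_to_drop` callback are assumptions.

```python
from collections import deque


def run_session(cards, wants_to_drop, time_limit=600.0,
                front=1.5, back=3.0, choice=1.0):
    """Cycle through a stack of cards until time runs out or the stack is empty.

    wants_to_drop(card, times_presented) returns True to remove the card from
    the stack; otherwise the card goes to the back of the stack.
    `choice` is an assumed, fixed decision time per trial.
    """
    stack = deque(cards)
    presentations = {card: 0 for card in cards}
    elapsed = 0.0
    trial_time = front + back + choice
    while stack and elapsed + trial_time <= time_limit:
        card = stack.popleft()
        presentations[card] += 1
        elapsed += trial_time
        if not wants_to_drop(card, presentations[card]):
            stack.append(card)  # "Study again later": back of the stack
    # If every card was dropped, the remaining time passes with a blank screen.
    idle = time_limit - elapsed
    return presentations, idle


cards = [f"card{i}" for i in range(20)]

# No-drop list: cards are never removed.
no_drop_counts, no_drop_idle = run_session(cards, lambda card, n: False)

# Hypothetical drop strategy: remove each card after its second presentation.
drop_counts, drop_idle = run_session(cards, lambda card, n: n >= 2)
```

With the hypothetical strategy shown, the stack empties well before the time limit and the rest of the session is idle, which mirrors the blank-screen rule that equates total time across the Drop and No-drop lists.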
During the final-test phase, the words from the
two lists were mixed and tested in random order.
The English cue was shown and participants were
asked to type in the Swahili target. Participants
were prompted to hurry if they took more than 12
seconds to respond.
Results and discussion
During the study phase, for participants in both
the Immediate and Delayed conditions, the average
number of times an item was presented was
higher in the No-drop condition than in the Drop
condition (5.29 vs 4.60 and 5.29 vs 4.92, respectively;
SDs = .37, 1.06, .26, .62, respectively). (The
distributions in each of the four conditions were
negatively skewed, because many participants
reached close to the maximum possible number
of study trials.) The effect of Study-control was
significant, F(1, 58) = 21.41, p < .0001, MSE = .39,
ηp² = .27. The effect of Delay condition on number
of study trials was not significant, F(1, 58) = 1.53,
p = .22, MSE = .47, ηp² = .026, nor was the interaction,
F(1, 58) = 1.96, p = .17, MSE = .39, ηp² = .033.
An average of 14.52 (SD = 6.54) and 13.79
(SD = 7.60) items were dropped from study in the
Immediate and Delayed conditions, respectively,
a difference that was not significant, t(58) = .40,
p = .69. (In both conditions, the distribution was
characterised by a large number of participants
who dropped the maximum possible number of
items.)
Participants did not benefit from being allowed
to control their study. On the contrary, as Figure 1
shows, test accuracy was significantly worse in the
Drop condition than the No-drop condition, F(1,
58) = 9.97, p < .01, MSE = .020, ηp² = .15. Not
surprisingly, performance was better on the Immediate
test than on the Delayed test, F(1, 58) =
56.88, p < .0001, MSE = .089, ηp² = .50, but delay
did not interact with the study-control manipulation,
F(1, 58) = .58, p = .45, MSE = .020, ηp² = .010.
In a separate analysis we excluded participants
who, in their drop-condition list, dropped
all of the pairs before 10 minutes had elapsed (12
and 14 participants were excluded in the Immediate
and Delayed conditions, respectively, leaving
19 and 15 participants in those conditions). Final
test accuracy remained better in the No-drop
condition (M = .36, SD = .30) than the Drop
condition (M = .33, SD = .31), but the effect was
no longer significant, F(1, 32) = .89, p = .35,
MSE = .018, ηp² = .026. We return to this point in
the General Discussion. There was also a significant
effect of delay, F(1, 32) = 40.42, p < .0001,
MSE = .077, ηp² = .56, but no significant interaction,
F(1, 32) = .41, p = .52, MSE = .018.
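As a consistency check on the effect sizes reported alongside each F test, the sketch below assumes that the effect-size statistic is partial eta squared, which can be recovered from an F value and its degrees of freedom as F·df_effect / (F·df_effect + df_error). The function name is hypothetical; the F values and degrees of freedom are the ones reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Reported tests from Experiment 1, as (F, df_effect, df_error).
study_control_trials = partial_eta_squared(21.41, 1, 58)   # reported as .27
study_control_recall = partial_eta_squared(9.97, 1, 58)    # reported as .15
delay_recall = partial_eta_squared(56.88, 1, 58)           # reported as .50
delay_recall_subset = partial_eta_squared(40.42, 1, 32)    # reported as .56
```

Under that assumption, the recovered values round to the reported effect sizes.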
Experiment 2 was designed to explore the relative
contributions of poor metacognitive monitoring
and bad study strategies to the negative effect of
self-regulation obtained in Experiment 1. With
respect to monitoring, it seemed possible that
overconfidence led participants in Experiment 1
to drop items sooner than they should have. To
explore this possibility, half of the participants in
Experiment 2 were asked to make a judgement of
learning (JOL) whenever they dropped an item
from study. We also explored whether partici-
pants had ill-conceived study strategies via a
questionnaire administered at the end of the
Figure 1. Proportion correct on the final test in Experiment 1
as a function of Study-control condition (No-drop vs Drop) and test delay. Error
bars represent standard errors.
Participants and materials. The participants
were 112 UCLA undergraduates who participated
for course credit. There were 54 and 58 partici-
pants in the JOL and No-JOL conditions, respec-
tively. The materials were the same as in
Experiment 1.
Design. The experiment was a 2 (Study-control:
Drop vs No-drop) × 2 (JOL condition: JOL vs
No-JOL) mixed design. Participants in the JOL
group were asked to make a JOL immediately
after choosing to drop an item in the Drop
condition; participants in the No-JOL group
were not asked to make JOLs in the Drop
condition.
Procedure. The procedure was similar to Ex-
periment 1, but with four changes. First, partici-
pants in the JOL group were asked to make a
JOL each time they dropped an item in the Drop
condition. They did so, when prompted by
‘‘Chance you’ll remember that one on the
test’’, by selecting one of six buttons, which
were labelled 0%, 20%, 40%, 60%, 80%, and
100%. Second, the Swahili word became the cue
and the English word the target, making the task
easier. Third, to ensure that participants had time
to test themselves during study, the cue was
shown for 3 s (instead of 1.5 s). Finally, on the
final test, all of the pairs from the first list were
tested in random order, followed by all of the
pairs from the second list.
Between the study and test phases, there was a
5-minute distractor task, during which partici-
pants were asked to identify famous people based
on photographs presented upside-down. A post-
experimental questionnaire asked participants a
series of questions about their experience in the
experiment and their study habits outside the
Results and discussion
Study phase. During the study phase, in both
the JOL and No-JOL conditions, the average
number of times an item was presented was
higher in the No-drop condition than in the
Drop condition (4.14 vs 3.80 and 4.15 vs 3.80,
respectively; SDs = .23, .69, .21, .75, respectively).
(The distributions were negatively skewed, as in
Experiment 1.) This difference, collapsed over
JOL condition, was significant, F(1, 110) = 27.40,
p < .0001, MSE = .25, ηp² = .20. As the numbers
make clear, the difference did not interact with
group, F(1, 110) = 0, p = .99, MSE = .25, ηp² = 0,
nor was the effect of making JOLs significant in
the Drop condition, F(1, 110) = .001, p = .97,
MSE = .52, ηp² = 0. An average of 14.01 (SD =
6.88) and 12.13 (SD = 6.77) items were dropped
from study by the No-JOL and JOL groups,
respectively, a difference that was not significant,
t(110) = −1.52, p = .13. (In both conditions, a
relatively large number of participants dropped
all 20 items, as in Experiment 1.)
Final recall. Figure 2 shows the proportion of
items correctly recalled by the No-JOL group
(left panel) and JOL group (right panel). Consistent
with the results of Experiment 1, the trend
was towards impaired learning when participants
were allowed to control their study (combined
over the JOL and No-JOL groups), although the
effect size was small and the effect was only
marginally significant, F(1, 110) = 3.01, p = .086,
MSE = .027, ηp² = .026. There was no overall main
effect of JOL condition, F(1, 110) = .88, p = .35,
MSE = .13, ηp² = .008, nor did JOL group interact
with Study-control, F(1, 110) = .092, p = .76,
MSE = .027, ηp² = 0.
Of the 112 participants, 37 (14 of 54 in the JOL
group and 23 of 58 in the No-JOL group) dropped
all of their pairs before 10 minutes had elapsed in
the Drop condition. When the data were re-analysed
including only participants who did not
drop all of their pairs, final test performance
remained better in the No-drop condition (M =
.61, SD = .27) than the Drop condition (M = .59,
SD = .26), but the effect was no longer significant,
F(1, 73) = .26, p = .61, MSE = .028, ηp² = .003. The
effect of JOL condition was not significant, F(1,
73) = .35, p = .55, MSE = .11, ηp² = .005, nor was
the interaction, F(1, 73) = .002, p = .97, MSE =
.028, ηp² = 0. Whether participants who dropped
all items should be included or excluded from the
analysis is discussed in the General Discussion.
Figure 2. Proportion correct on the final test in Experiment 2
as a function of Study-control condition (No-drop vs Drop) and JOL group
(No-JOL group, JOL group). JOLs
were only made in one condition, the JOL/Drop condition,
represented by the rightmost bar.
Judgements of learning. Participants in the JOL
group were quite accurate in predicting the likelihood
that they would be able to recall the items
they decided to drop. JOLs averaged 51% (SD =
25) and recall of items on which JOLs were made
averaged 56% (SD = 39), a small under-confidence
effect that was not significant, t(51) = 1.14,
p = .26. Participants’ JOLs were also accurate in
terms of resolution: The average Gamma correlation
between JOLs and test accuracy was significantly
greater than zero, M = .59 (SD = .51),
t(26) = 6.03, p < .0001 (given how Gamma is
calculated, only the 27 participants with at least
one correct and one incorrect response, at a
minimum of two JOL levels, could be analysed).
Overall, then, the fact that participants did not
learn more when they were allowed to regulate
their study (in the Drop condition) than when
they were not (in the No-drop condition) appears
attributable to factors other than poor metacognitive
monitoring, as measured by calibration or
resolution.
A possible contributor to the JOL participants’
high levels of metacognitive accuracy is that they
reported having tested themselves while studying,
which increases both resolution (e.g., Dunlosky &
Nelson, 1992) and calibration (e.g., Koriat et al.,
2006). A total of 80% of participants said ‘‘yes’’ in
response to the question ‘‘While you were study-
ing, did you try to retrieve the English word on
the ‘back’ of the card while you were looking at
the Swahili word (on the ‘front’ of the card)?’’
Surprisingly, the distribution of participants’
JOLs for dropped items was roughly normal, as
shown in Figure 3. This distribution is strikingly at
odds not only with our prior expectations, but also
with participants’ self-reported study strategies.
In response to the question ‘‘What made you
decide to drop a word from your stack (instead of
keeping it)?’’, 79% of participants reported
dropping items that were easy or items that they
felt they had learned. The remaining participants
reported dropping the hardest items (17%) or a
mixture of easy and hard items (4%). Given that
pattern, one might have expected the most
frequent responses to be JOLs of 100 (corresponding
to items that participants perceived as
already learned), followed by JOLs of 0 (corresponding
to items perceived as too hard), but 0
and 100 were the least frequent responses.
A possible interpretation of the distribution of
JOLs, given participants’ self reports, is that they
adopted the strategy of studying a given item until
they knew it now (that is, on the tests embedded
in each cycle through the flashcards), even if they
thought they might not remember it on the final
test. Thus, as in other studies, the participants
apparently under-weighted the positive conse-
quences of additional study (see Koriat, Sheffer,
& Ma’ayan, 2002; Kornell & Bjork, 2006; Rohrer
et al., 2005) and self-tests (which appear to be
especially important for items that can already be
recalled; see Karpicke & Roediger, 2007).
Final recall revisited. If participants’ metacog-
nitive monitoring was accurate, then perhaps
their failure to benefit from dropping items was
caused by ineffective study strategies. We ana-
lysed each of the two general categories of self-
reported study strategies separately. When the
79% of participants who reported dropping easy/
known items were analysed, average accuracy was
identical in the Drop and No-drop conditions
(M#.62, SD# .30 in both cases). Thus, while not
effective, these participants’ study strategies were
not harmful. The 21% of participants who reported
dropping either the hard items or a
mixture of hard and learned items, by contrast,
contributed heavily to the negative effect of
dropping items. Their test performance in the
Drop condition (M = .52, SD = .17) was significantly
lower than in the No-drop condition (M =
.70, SD = .22), F(1, 19) = 9.49, p < .01, MSE =
.029, ηp² = .33. There was no main effect of JOL
condition, F(1, 19) = .69, p = .42, MSE = .053,
ηp² = .035, nor was there an interaction, F(1,
19) = .003, p = .96, MSE = .029, ηp² = 0. For these
participants in particular, the hypothesis that poor
study strategies contributed to the negative effect
of dropping was supported. We discuss the RPL
model in light of this finding in the General
Discussion.
Figure 3. Frequency of responses at each JOL level (0, 20, 40, 60, 80, 100) in
Experiment 2. JOLs were made immediately, and only, after
a given item was dropped.
Post-experimental questionnaire. On the post-
experimental questionnaire participants were
asked ‘‘Do you study with flashcards in real
life? If so, do you remove cards from your stack
as you go?’’ In response, 56% said they study with
flashcards and, of those, 75% said they dropped
items as they studied.
The results of Experiment 2 suggested that
participants failed to profit from being able to
drop items because they failed to appreciate the
benefits of continuing to study an item after they
could recall the target correctly. Experiment 3
was designed to explicitly examine the number of
times participants recalled a target correctly
before deciding to drop a given pair.
Participants and materials. The participants
were 25 UCLA undergraduates who participated
for course credit. The materials were the same as
in Experiments 1 and 2.
Procedure. The procedure was similar to Ex-
periment 2 with two exceptions: both lists were
assigned to the Drop condition and after the
Swahili cue word was presented for 3 s, partici-
pants were asked to type in the English response
word. A 3-s presentation of the correct English
word followed, after which the participant chose
to drop the pair or keep it in the list for further
study. The first time through each list participants
were not asked to type in responses, because they
had yet to be exposed to the correct answers.
Results and discussion
In order to analyse the data conservatively with
respect to the hypothesis that participants drop
items too quickly, we treated misspelled answers
as correct (because participants may have be-
lieved their answers were correct when they
decided to drop), and we included only the 22
participants who reported using a strategy of
dropping easy or known items (the other three
participants frequently dropped items that they
had never answered correctly).
Figure 4 shows the percentage of items that
were dropped after zero, one, two, three, and four
correct responses, pooled across lists and partici-
pants (no participant answered correctly five
times or more). The data in Figure 4 represent
all trials on which an item was dropped in either
list, pooled across participants. The majority of
items were dropped after one correct response.
Surprisingly, 13% of the items were dropped after
no correct responses, despite the fact that all of
the participants included in the analysis reported
dropping easy or known items. A hindsight bias
may have caused participants to believe that they
had actually known the answer after it was shown,
even though they could not recall it when tested.
In total, 75% of the items that were dropped were
dropped after fewer than two successful recall
attempts.
Figure 4. Frequency of dropping after zero, one, two, three,
and four correct responses in Experiment 3. (x-axis: Correct Answers Before Drop; y-axis: Percentage of Items.)
Footnote: Participants who did not drop any items and participants
who did not give an interpretable answer to the strategy
question on the questionnaire were excluded from this
analysis. When all 32 participants who did not report
dropping easy/known items are included, the effect remains
significant, F(1, 30) = 10.47, p < .01, ηp² = .26. The effect also
remains significant when the two participants from this group
who dropped all of the items in less than 10 minutes are
excluded, F(1, 28) = 7.44, p < .05, ηp² = .21.
The finding that people dropped items very
quickly, usually after a single correct recall
attempt, may help to explain why participants in
Experiments 1 and 2 did not benefit from being
allowed to control their study. The data support
that conclusion; final test accuracy for items
dropped after zero, one, or two correct responses
was .19, .62, and .88 (SD = .15, .24, .17, respectively).
Waiting to recall an item twice before
dropping it increased final test performance by 26
percentage points compared to recalling it once,
and 69 percentage points compared to not recal-
ling it at all. Had participants waited to drop an
item until they had recalled it more times, it
appears as though they might have benefited
from dropping. The drawback of waiting to drop
an item, however, is that other items, which have
not been dropped, receive less extra attention and
may be learned less well as a result. In Experi-
ment 4 we examined how different dropping
strategies affect all items, by controlling the
number of times an item was recalled before it
was dropped.
There is a compelling, if counterproductive,
logic to terminating study after one successful
recall attempt. After a first successful recall,
future recall success is almost guaranteed in the
short term (e.g., Landauer & Bjork, 1978). Given
that people seem to think of tests primarily as
diagnoses of memory, not as learning events
(Kornell & Bjork, 2007a; Kornell & Son, 2006),
they may, paradoxically, think that there is little
point in studying, or testing oneself on, an item
that has been successfully recalled. That is,
participants may reason as follows: There is no
point in returning to an item that I will surely get
correct next time anyway, and if I got it this time,
I will surely get it next time, so why not drop the
item? The flaw in this logic, of course, is that
restudying an item that one can already retrieve
correctly can have enormous memory benefits
(e.g., Karpicke & Roediger, 2007; Landauer &
Bjork, 1978).
The results of Experiment 3 suggested that, in the
first two experiments, dropping flashcards was
ineffective because participants did so too ea-
gerly, usually after a single correct recall. Experi-
ment 4 tested that hypothesis. Each participant
completed a Drop list and a No-drop list, but in
three between-participant conditions, items in the
Drop list were dropped either (a) automatically
after one correct recall, (b) automatically after
two correct recalls, or (c) under participant
control (the third condition replicated the pre-
vious experiments).
Participants and materials. The participants
were 57 UCLA undergraduates who participated
for course credit. There were 21, 17, and 19
participants in the User-control, Autodrop-1, and
Autodrop-2 conditions, respectively. The materials
were the same as the materials in the previous
experiments.
Design. The experiment was a 2 (Study-control:
Drop vs No-drop) × 3 (Drop rule: User-control,
Autodrop-1, Autodrop-2) mixed design. Items
were never dropped in the No-drop condition,
which was the same in all three between-partici-
pant conditions. The Drop condition differed
across groups. In the User-control group, partici-
pants were allowed to determine whether or not
they dropped items; in the Autodrop-1 condition
the computer dropped items automatically after
one correct response; in the Autodrop-2 condi-
tion, the computer dropped items automatically
after two correct responses.
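The three drop rules described above can be sketched as a small simulation. This is a toy illustration under stated assumptions, not the experiment's software: the function `run_list`, the fixed trial budget `max_trials` (the experiments limited study time, not trial count), and the deterministic `recall_ok` rule are all hypothetical.

```python
from collections import deque


def run_list(items, recall_ok, drop_after=None, max_trials=12):
    """Cycle through a stack of cards for a fixed number of trials.

    drop_after=None models the No-drop condition; drop_after=1 and
    drop_after=2 model Autodrop-1 and Autodrop-2. Returns how many
    times each item was presented.
    """
    stack = deque(items)
    correct = {item: 0 for item in items}
    presentations = {item: 0 for item in items}
    for _ in range(max_trials):
        if not stack:
            break  # every item was dropped before the budget ran out
        item = stack.popleft()
        first_encounter = presentations[item] == 0
        presentations[item] += 1
        # No recall attempt on the first encounter with an item.
        if not first_encounter and recall_ok(item):
            correct[item] += 1
        if drop_after is not None and correct[item] >= drop_after:
            continue  # item is dropped and never returns to the stack
        stack.append(item)
    return presentations


def recall_ok(item):
    """Toy assumption: easy items are always recalled, the hard one never."""
    return item != "hard"


items = ["easy1", "easy2", "hard"]
for label, rule in [("No-drop", None), ("Autodrop-1", 1), ("Autodrop-2", 2)]:
    print(label, run_list(items, recall_ok, drop_after=rule))
```

With a budget of 12 trials, the hard item is presented 4, 8, and 6 times under No-drop, Autodrop-1, and Autodrop-2, while each easy item is presented 4, 2, and 3 times. The sketch shows only how a drop rule redistributes presentations; it says nothing about how much is learned from each one.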
Procedure. The User-control condition was a
replication of Experiments 1 and 2, using the trial
structure of Experiment 3: On each trial participants
were shown a Swahili cue word for 3
seconds, and, except on the first encounter with
each pair, they were asked to type in its English
translation; then they were shown the correct
answer. If the list was assigned to the Drop
condition, participants were then allowed to
decide whether to continue studying the item, or
drop it. However, if the list was assigned to the
No-drop condition, participants (unlike in the
prior experiments) were not required to press
“Study again later” at the end of each trial. That
requirement was removed to make the No-drop
condition consistent with the two autodrop conditions,
in which participants were never shown
the “Study again later” or “Remove from stack”
buttons.
The Autodrop-1 condition was the same as the
User-control condition, except that participants
could not choose to drop an item; instead, the
program dropped items automatically after they
were answered correctly once. In the Autodrop-2
condition, items had to be answered correctly
twice, not necessarily consecutively, to be
dropped. As in Experiment 3, misspelled answers
were considered correct during the study phase
and the final test.

Footnotes
One might hypothesise that the small number of items
recalled multiple times and then dropped reflected a decision
not to drop items recalled multiple times. The opposite was
true: Participants were more likely to drop items that they had
recalled multiple times (83%) than items recalled less than
twice (66%).
Accuracy was computed separately for each participant,
and then the participants’ scores were averaged. Only 14 of the
22 participants, who had at least one observation at each of the
three levels, could be included in the analysis. Items dropped
after three or four correct responses could not be included due
to a lack of observations.

Results and discussion
Participants whose items had all been dropped
before the end of the list in the Drop condition
were excluded from all analyses. The number of
participants who were excluded in the User-
control, Autodrop-1, and Autodrop-2 conditions,
respectively, was 7, 9, and 3, leaving 14, 8, and 16
participants, respectively.
Participants in the User-control condition displayed
the same tendency to drop items quickly
as did participants in Experiment 3: 68% of the
items were dropped after only one correct re-
sponse, and an additional 8% were dropped
without having been answered correctly at all.
Final test accuracy was analysed using a 3
(Drop rule) × 2 (Study-control) ANOVA. As
Table 1 shows, there was a significant effect of
Drop rule on final test accuracy, F(2, 35) = 4.44,
p < .05, MSE = .13, ηp² = .20, with participants in
the Autodrop-1 condition showing relatively poor
performance. (Note, though, that comparing performance
between participants is problematic
because different numbers of participants were
excluded from the analyses in the different
conditions, a problem that, fortunately, does not
apply to the within-participant comparison of
Drop vs No-drop.) More importantly, there was
a significant effect of Study-control: Final test
accuracy was higher in the No-drop condition
than the Drop condition, F(1, 35) = 5.87, p < .05,
MSE = .030, ηp² = .14. Although the interaction
was not significant, F(2, 35) = 1.93, p = .16,
MSE = .030, ηp² = .10, the Autodrop-1 condition
appears to have contributed heavily to the main
effect.
A planned comparison showed that for participants
in the Autodrop-1 condition, final test
accuracy was significantly higher for lists on
which no items were dropped (in the No-drop
condition) than it was for lists on which items
were dropped after 1 correct response (in the
Drop condition), t(7) = 2.58, p < .05. Final test
accuracy was also higher in the No-drop condition
than it was in the Drop condition for participants
in the Autodrop-2 condition, although the effect
was not significant, t(15) = .85, p = .41. In the
User-control condition, final test accuracy was
higher in the No-drop condition than it was in the
Drop condition, although the difference did not
approach significance, t(13) = .35, p = .74.
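The effect sizes reported alongside the F ratios above are consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom as F·df_effect / (F·df_effect + df_error). The sketch below checks the three reported values; reading the effect-size statistic as partial eta squared is an inference from the numbers, and the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


# (F, df_effect, df_error) as reported for Experiment 4
reported = {
    "Drop rule": (4.44, 2, 35),
    "Study-control": (5.87, 1, 35),
    "Interaction": (1.93, 2, 35),
}

for name, (f_value, df_effect, df_error) in reported.items():
    print(name, round(partial_eta_squared(f_value, df_effect, df_error), 2))

# Error degrees of freedom: retained participants minus number of groups.
retained = [14, 8, 16]
print(sum(retained) - len(retained))
```

The rounded results match the reported .20, .14, and .10, and the error degrees of freedom (35) match the 38 retained participants in three groups.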
In summary, the results demonstrated that
dropping items after a single correct recall was a
maladaptive strategy. Nevertheless, participants
in the User-control condition dropped the major-
ity of their items after a single correct recall,
replicating Experiment 3. The small but consis-
tent disadvantage of allowing participants to drop
flashcards while studying was also replicated,
although the difference did not reach statistical
significance.
We found that participants did not profit from
being allowed to self-regulate their study time by
dropping items. If anything, dropping resulted in
a small but consistent disadvantage. The disad-
vantage was not significant in every analysis, nor
was it large in numerical terms, but it is truly
surprising because there is a compelling reason to
expect the opposite: Dropping ostensibly known
items allowed participants to focus more study
time on items that they did not know. The average
student would find the idea of spending equal
time on all information when studying, even
information they feel they already know, very
foolish indeed. The participants were under no
obligation to drop items but did so, presumably,
because they believed that doing so would confer
an advantage.

Table 1. Mean proportion correct (SD) on the final test in Experiment 4
as a function of Study-control and Drop rule

                 Drop rule
Study-control    User-control   Autodrop-1   Autodrop-2
No-drop          .67 (.19)      .44 (.38)    .56 (.32)
Drop             .65 (.24)      .21 (.16)    .50 (.32)

The fruitlessness of being allowed to drop
items appears to be traceable to poor decision
making, not to poor metacognitive monitoring.
Participants’ relatively good monitoring, as mea-
sured by the resolution and calibration of their
JOLs in Experiment 2, was coupled with non-
optimal decisions as to what items to drop and
when to drop them. Other factors may have
played a role, such as the reduced spacing of
study trials on remaining items as other items are
dropped, but the principal implication is that
people misunderstand some basic aspects of
forgetting and learning, and, therefore, how to
manage their study activities.
Types of flawed decision making
Being able to drop items had especially negative
effects for the 20% of participants whose study
strategy was to drop items they judged difficult to
learn. Those participants seem to have believed,
mistakenly, that they would not have sufficient
time to learn the difficult items. The RPL model
suggests that, if there is insufficient time to learn a
difficult item, dropping it can be a good decision
(e.g., Metcalfe & Kornell, 2003, 2005). The
participants’ error was that, in reality, they did
have sufficient time to learn the difficult items. A
similar error has been demonstrated when people
have been asked to predict how much they will
learn by studying once or, for example, four
times: Despite large differences in actual learning,
the predictions are essentially the same (Kornell
& Bjork, 2006). Thus there is one exception to the
assertion that participants’ metacognitive moni-
toring was accurate: Some participants seemed to
underestimate their ability to learn difficult items
across multiple study opportunities.
What about the remaining 80% of
participants, that is, those who reported dropping
items they knew or found easy? Assuming
that dropping items is a potentially useful strat-
egy, why did they fail to profit from being allowed
to do so? They, too, may have undervalued the
impact of future study opportunities. Perhaps the
most surprising result of the current experiments
is that participants dropped items that they did
not believe they had learned well enough to
remember on the final test. The nature of their
JOL ratings suggests that their strategy was ‘‘I
know this now, so I’ll drop it, even if I might not
get it on the test later.’’ If, as Experiment 3
suggests, such participants did not realise the
benefits of continuing to study and test oneself
past the point when one can initially produce an
answer, it points to their having a fundamental
misunderstanding of how learning works. In fact,
it is precisely those just-retrievable items, according
to the RPL model, that are most learnable,
and thus that should not be dropped.
Finally, what about the participants who
dropped all of the items before the time allocated
for studying the list had expired? From one
perspective, they should be excluded from the
analysis. From another perspective, however, they
illustrate some additional perils of self-regulated
study, and thus should be included. During the
post-experimental debriefing, for example, one
participant said that she was well aware, as the
instructions made clear, that a blank screen
would follow if she dropped everything, but did
so anyway because she did not think it would help
to study the items any longer. Moreover, students
are often motivated, by time and other pressures,
to stop studying as soon as possible. In
fact, some students drop cards not to allow more
time for others, but rather to hasten the end of a
study session, because they refuse to stop study-
ing until they have dropped all of their cards.
A final consideration is that because dropping
decreases spacing between items, it increases
performance levels in the short term (though not
necessarily in the long term; see Bjork, 1994,
1999). The increased performance owing to reduced
spacing has the potential to increase
students’ confidence, leading them to stop study-
ing sooner than they otherwise would. Thus,
perhaps the fact that some participants spent less
time studying when they were allowed to drop
cards than when they were not is a realistic feature
of the present experiments, one that also argues
for including all participants in the analysis.
Practical recommendations
The current findings suggest that the effectiveness
of dropping flashcards depends on students becoming
metacognitively sophisticated as learners.
Dropping has the potential to be effective, but
students need to understand the value of further
study, including that, as suggested by the RPL
model, items that can be remembered now, but
that may be forgotten later, should be given the
highest priority, not dropped (Metcalfe & Kor-
nell, 2003). They need to learn, too, the benefits
of continuing to be tested on items that one can
already recall (see Karpicke & Roediger, 2007).
In the interests of creating durable learning,
items, if dropped, should be returned to later.
Restudying previously dropped items provides
additional spaced-learning opportunities on those
items. It also identifies items that have not actually
been learned and are in need of further study. It is
important for students to realise that items that
seem ‘‘learned’’ may be forgotten. Informal con-
versations reveal that some students return to
dropped flashcards and some do not. Perhaps the
optimal way of returning to dropped items is via
an expanding schedule (Landauer & Bjork, 1978),
with increased spacing between each successive
study trial. An expanding schedule places less and
less emphasis on items that have been studied,
allowing more study time on other items.
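An expanding schedule of the kind described above can be sketched as a sequence of review positions whose gaps grow after each review. The doubling rule and the parameter names below are assumptions chosen for illustration; the text specifies only that spacing should increase between successive study trials, not by how much.

```python
def expanding_schedule(first_gap, factor, reviews):
    """Positions (in trials after dropping) at which an item is revisited.

    Each gap is `factor` times the previous one, so reviews become
    progressively rarer. The geometric growth rule is an assumption.
    """
    positions = []
    gap = first_gap
    position = 0
    for _ in range(reviews):
        position += gap
        positions.append(position)
        gap *= factor
    return positions


print(expanding_schedule(first_gap=1, factor=2, reviews=5))
```

With a first gap of 1 and doubling, the item returns at trials 1, 3, 7, 15, and 31, leaving progressively more room for other items between reviews.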
Students need to understand, too, that a danger
of dropping is that it results in a stack of
flashcards that has fewer and fewer cards, resulting
in decreasing spacing between repetitions of
a given item and relatively (and often unrealistically)
easy recall during study. Such easy retrievals,
which are of limited value in terms of
fostering long-term recall, can result in illusions
of learning. Introducing difficulty, by increasing
the number of flashcards in a stack (and the
spacing between them), should facilitate
long-term learning (Kornell & Bjork, 2007b).
Finally, on the positive side, it is important that
students understand that studying with flashcards
has important virtues. It incorporates, in a natural
way, both testing and spaced practice, two fea-
tures that, when combined, support both efficient
learning and accurate metacognitive monitoring.
Concluding comment
In general, psychologists tend to think of self-
regulated study as involving decisions about how
and when to study. The present findings demon-
strate, however, that an equally important factor
in efficient self-regulation of study is deciding
when to stop studying, deciding when enough is
enough, so to speak (see Kornell & Bjork, 2007a).
The results also demonstrate that such decisions
require not only complex monitoring and control
processes, but also an understanding of how
people learn.
Manuscript received 23 January 2007
Manuscript accepted 19 October 2007
First published online 14 November 2007
References
Baddeley, A. D., & Longman, D. J. A. (1978). The influence of length and frequency of training session on the rate of learning to type. Ergonomics, 21, 627–
Benjamin, A. S., Bjork, R. A., & Schwartz, B. L. (1998). The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General, 127, 55–68.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII: Cognitive regulation of performance: Interaction of theory and application (pp. 435–459). Cambridge, MA: MIT Press.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Christina, R. W., & Bjork, R. A. (1991). Optimising long-term retention and transfer. In D. Druckman & R. A. Bjork (Eds.), In the mind’s eye: Enhancing human performance (pp. 23–56). Washington, DC: National Academy Press.
Dunlosky, J., & Nelson, T. O. (1992). Importance of kind of cue for judgements of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20,
Dunlosky, J., & Thiede, K. W. (1998). What makes people study more? An evaluation of factors that affect people’s self-paced study and yield “labor-and-gain” effects. Acta Psychologica, 98, 37–56.
Dunlosky, J., & Thiede, K. W. (2004). Causes and constraints of the shift-to-easier-materials effect in the control of study. Memory & Cognition, 32, 779–
Karpicke, J. D., & Roediger, H. L. III. (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57,
Kelley, C. M., & Lindsay, D. S. (1993). Remembering mistaken for knowing: Ease of retrieval as a basis for confidence in answers to general knowledge questions. Journal of Memory and Language, 32, 1–
Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgements of learning. Journal of Experimental Psychology: General, 126, 349–370.
Koriat, A., Ma’ayan, H., Sheffer, L., & Bjork, R. A. (2006). Exploring a mnemonic debiasing account of the underconfidence-with-practice effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 595–608.
Koriat, A., Sheffer, L., & Ma’ayan, H. (2002). Comparing objective and subjective learning curves: Judgements of learning exhibit increased underconfidence with practice. Journal of Experimental Psychology: General, 131, 147–162.
Kornell, N., & Bjork, R. A. (2006, November). Predicted and actual learning curves. Paper presented at the 47th annual meeting of the Psychonomic Society, Houston, TX.
Kornell, N., & Bjork, R. A. (2007a). The promise and perils of self-regulated study. Psychonomic Bulletin & Review, 14, 219–224.
Kornell, N., & Bjork, R. A. (2007b, May). On the illusory benefits of easy learning: Studying small stacks of flashcards. Poster presented at the 19th annual meeting of the Association for Psychological Science, Washington DC.
Kornell, N., & Metcalfe, J. (2006). Study efficacy and the region of proximal learning framework. Journal of Experimental Psychology: Learning, Memory, & Cognition, 32, 609–622.
Kornell, N., & Son, L. K. (2006, November). Self-testing: A metacognitive disconnect between memory monitoring and study choice. Poster presented at the 47th annual meeting of the Psychonomic Society, Houston, TX.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625–632). London: Academic Press.
Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530–542.
Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory and Language, 52, 463–477.
Metcalfe, J., Schwartz, B. L., & Joaquim, S. G. (1993). The cue familiarity heuristic in metacognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 851–861.
Nelson, T. O., & Dunlosky, J. (1994). Norms of paired-associate recall during multitrial learning of Swahili–English translation equivalents. Memory, 2, 325–
Nelson, T. O., Dunlosky, J., Graf, A., & Narens, L. (1994). Utilization of metacognitive judgements in the allocation of study during multitrial learning. Psychological Science, 5, 207–213.
Nelson, T. O., & Narens, L. (1994). Why investigate metacognition? In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 1–25). Cambridge, MA: MIT Press.
Pyc, M. A., & Rawson, K. A. (in press). Examining the efficiency of schedules of distributed practice. Memory & Cognition.
Reder, L. M., & Ritter, F. E. (1992). What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13,
Rohrer, D., Taylor, K., Pashler, H., Wixted, J. T., & Cepeda, N. J. (2005). The effect of overlearning on long-term retention. Applied Cognitive Psychology, 19, 361–374.
Simon, D. A., & Bjork, R. A. (2001). Metacognition in motor learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 907–912.
Son, L. K., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 204–221.
Spellman, B. A., & Bjork, R. A. (1992). Technical commentary: When predictions create reality: Judgements of learning may alter what they are intended to assess. Psychological Science, 3, 315–
Thiede, K. W., & Dunlosky, J. (1999). Toward a general model of self-regulated study: An analysis of selection of items for study and self-paced study time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1024–1037.
... Neste estudo foram utilizados, neste ambiente, quizzes (jogo de perguntas e respostas), flashcards (pequenos cartões com dois lados, um dos lados, deve-se inserir uma pergunta ou palavra-chave, no outro lado, a resposta). (Kornell;Bjork, 2008). Em relação a esse recurso, 69% dos respondentes do estudo indicaram respostas que foram categorizadas como positivas em relação ao seu uso como contribuição para os estudos. ...
... Neste estudo foram utilizados, neste ambiente, quizzes (jogo de perguntas e respostas), flashcards (pequenos cartões com dois lados, um dos lados, deve-se inserir uma pergunta ou palavra-chave, no outro lado, a resposta). (Kornell;Bjork, 2008). Em relação a esse recurso, 69% dos respondentes do estudo indicaram respostas que foram categorizadas como positivas em relação ao seu uso como contribuição para os estudos. ...
Full-text available
RESUMO O estudo analisou a influência das metodologias ativas com a utilização de tecnologias digitais de informação e comunicação (TDIC) no processo de ensino e aprendizagem em anatomia humana no ensino superior. Pesquisa do tipo mista, realizada com alunos do curso de Educação Física de uma universidade privada do sul do Brasil. Foram utilizados o aplicativo WhattsApp para comunicação e para metodologias ativas, SimpleMind Free (mapas mentais) e GoConqr (Quizzes e Flashcards). Os acadêmicos avaliaram de forma positiva e produtiva a concretização das estratégias. Ao participar das atividades de aprendizagem utilizando TDIC e metodologias ativas, os estudantes tiveram melhor performance nos testes objetivos, comparado com a performance ao participar apenas de aulas expositivas e práticas (p = 0,01). Palavras-chave: ensino e aprendizagem, metodologias ativas, tecnologia digitais de informação e comunicação, anatomia humana, ensino superior. ABSTRACT The study analyzed the influence of active methodologies with the use of digital information and communication technologies (DICT) in the teaching and learning
... Finn 2008;Metcalfe and Finn 2008;Shanks and Serra 2014), or they might not reserve enough time to study all the content they needed to review before the exam (cf. Bjork et al. 2013;Dunlosky and Rawson 2012;Grimes 2002;Kornell and Bjork 2008b;Kornell and Metcalfe 2006). Additionally, inaccurate metacognitive monitoring can have negative consequences beyond studying for a single exam or for success in a single course. ...
... Depending on their situation, those courses might be in their major of study, and underperforming on a major course might have greater consequences for their academic standing than underpreparing for an exam in an elective course. As we noted earlier, inaccurate monitoring of learning not only impacts students' performance for a given exam (e.g., Bjork et al. 2013;Dunlosky and Rawson 2012;Kornell and Bjork 2008b;Kornell and Metcalfe 2006;Metcalfe and Finn 2008), but can also compound over time and lead students to question their abilities, change majors, or drop out (cf. Grimes 2002;Stinebrickner and Stinebrickner 2014). ...
Full-text available
The accuracy of judgments of learning (JOLs) is vital for efficient self-regulated learning. We examined a situation in which participants overutilize their prior knowledge of a topic (“domain familiarity”) as a basis for JOLs, resulting in substantial overconfidence in topics they know the most about. College students rank ordered their knowledge across ten different domains and studied, judged, and then completed a test on facts from those domains. Recall and JOLs were linearly related to self-rated knowledge, as was overconfidence: participants were most overconfident for topics they knew more about, indicating the overutilization of domain familiarity as a cue for JOLs. We examined aspects of the task that might contribute to this pattern, including the order of the task phases and whether participants studied the facts blocked by topic. Although participants used domain familiarity as a cue for JOLs regardless of task design, we found that studying facts from multiple topics blocked by topic led them to overutilize this cue. In contrast, whether participants completed the rank ordering before studying the facts or received a warning about this tendency did not alter the pattern. The relative accuracy of participants’ JOLs, however, was not related to domain familiarity under any conditions.
... Second, it is beneficial to examine the linkage between learners' perception and L2 learning (Lambert, 2017). In particular, establishing an accurate alignment between metacognitive judgment and skill improvement through practice is crucial, as the findings yielded by cognitive psychology research indicate that metacognitive judgment of learning influences future learning behavior, such as what to study and how much time to dedicate to specific tasks (Kornell & Bjork, 2008;Kornell & Metcalfe, 2006). Ideally, if learners can accurately judge the effectiveness of a given task-repetition practice and find it beneficial in improving their speech fluency, they would be more likely to invest additional time and effort into task-repetition practice both inside and outside the classroom. ...
... Although spacing was effective for maintaining learners' engagement, the long interval (i.e., 1 week) led to less accurate judgment of their own learning than the spaced repetition practice with a 45-minute interval between performances. Given that metacognitive judgment can affect future learning behaviors (Kornell & Bjork, 2008), it is recommended that teachers select a relatively short interval that is long enough for learners to forget some of the elaborated task content but short enough to remember some of their own previous performance. ...
Full-text available
While task repetition is effective for improving oral fluency, some teachers are reluctant to use it in their classrooms due to the alleged negative perceptions of learners toward repetitive practice. To address this concern, the participants in the current study completed a posttask questionnaire probing their perceptions toward task repetition practice, focusing on metacognitive judgment (i.e., the number of task repetitions considered effective) and emotional engagement (i.e., enjoyment and concentration). Prior to taking the survey, 64 second language learners individually performed the same picture-description task six times under one of the three repetition schedules (massed, short-spaced, and long-spaced condition). Their posttask questionnaire results indicated that task repetition was perceived as an effective and engaging activity (about four to five performances were deemed to be most optimal). Relative to massed task repetition, short-and long-spaced schedules led to higher perceived effectiveness and emotional engagement. Moreover, while the short-spaced group made accurate metacognitive judgment of their fluency gains, learners in the massed practice condition overestimated their fluency gains, possibly due to enjoyment and illusion of high competence. These findings indicate that spacing between task repetitions can influence learners' engagement in the task, which can impact their fluency development.
... Medical students position themselves as recipients of information rather than active participants in making sense of what feedback means and how to enact it. Much work supports that even learners with high levels of self-regulation struggle to develop their learning strategies independently [36,37]. Learners identify a need to take control of learning as they progress [38] but guidance through this progression appears critical. ...
Full-text available
Introduction While feedback aims to support learning, students frequently struggle to use it. In studying feedback responses there is a gap in explaining them in relation to learning theory. This study explores how feedback experiences influence medical students’ self-regulation of learning. Methods Final-year medical students across three campuses (Ireland, Bahrain and Malaysia) were invited to share experiences of feedback in individual semi-structured interviews. The data were thematically analysed and explored through the lens of self-regulatory learning theory (SRL). Results Feedback interacts with learners’ knowledge and beliefs about themselves and about learning. They use feedback to change both their cognitive and behavioural learning strategies, but how they choose which feedback to implement is complex. They struggle to generate learning strategies and expect teachers to make sense of the “how” in addition to the “what”” in planning future learning. Even when not actioned, learners spend time with feedback and it influences future learning. Conclusion By exploring our findings through the lens of self-regulation learning, we advance conceptual understanding of feedback responses. Learners’ ability to generate “next steps” may be overestimated. When feedback causes negative emotions, energy is diverted from learning to processing distress. Perceived non-implementation of feedback should not be confused with ignoring it; feedback that is not actioned often impacts learning.
... Even in those experiments, allowing participants to self-regulate their encoding session, most memory researchers cut short participants' control of the number of items they needed to study (Nelson & Narens, 1994). Although the theoretical and practical importance of learning termination was often stressed in the literature (Benjamin, 2007;Kornell & Finn, 2016), only a handful of studies investigated its consequences for memory retention (Karpicke, 2009;Kornell & Bjork, 2008;Krogulska et al., 2021;Le Ny et al., 1972;Murayama et al., 2016). Their main implication was that learners should avoid learning termination or dropping items from further restudy (cf. ...
Full-text available
This study explores whether people's preference to restrict to-be-learned material is influenced by memory test timing. In Experiments 1a and 2a, participants studied word lists. For control groups, lists were displayed in their entirety, whereas participants in other groups could stop the lists early. We investigated whether participants decided to terminate learning when they expected their free-recall memory to be tested after a short (Experiment 1a) or long (Experiment 2a) delay. Experiments 1b and 2b tested participants' theoretical assumptions about learning termination. Participants who terminated learning recalled fewer words than those who saw all to-be-remembered materials. When the memory test immediately followed the learning phase, more than half of the participants decided to stop learning. However, when there was any time delay between learning and testing, only around a quarter of them decided to stop. Delayed testing can effectively discourage a maladaptive learning strategy of learning termination. Sometimes a tendency to avoid cognitive effort (Kool et al., 2010) leads people to create useful solutions and heuristics. More often, however, it results in choosing a suboptimal strategy to deal with various cognitive tasks. Studies by Murayama et al. (2016) and Kro-gulska et al. (2021) showed that a great number of people have a strong tendency to stop learning prematurely, which has detrimental consequences for memory performance. These observations were made based on experimental procedures where the retention interval (RI) between learning and testing was negligible. The aim of the current study is to determine whether an expected delay before testing influences the decision about how many to-be-remembered items to study during learning. This problem is important not only from a practical standpoint, but it also sheds light on factors determining metamemory control decisions in self-regulated learning episodes. 
The strategic allocation of encoding resources has mostly been tested in the context of study-time allocation (Gönül et al.
... As a consequence, they typically avoid using retrieval until later phases of study when they believe retrieval is more likely to be successful (Ariel & Karpicke, 2018; Janes et al., 2018). When retrieval is successful, students prefer to drop material from further practice (Karpicke, 2009; Kornell & Bjork, 2008). Thus, younger adults may practice retrieval in some instances to monitor their learning, but they engage in limited repeated spaced retrieval practice and do not recall content to optimal criterion levels during practice to maintain it long-term (Wissman et al., 2012). ...
Retrieval practice can reduce associative memory deficits for older adults, but they underutilize this potent learning tool during self-regulated learning. The current experiment investigated whether teaching older adults to use retrieval practice more can improve their self-regulated learning. Younger and older adults made decisions about when to study, how often to engage in retrieval practice, and when to stop learning a list of medication-side effect pairs. Some younger and older adults received instructions before learning that emphasized the mnemonic benefits of retrieval practice over restudying material and described how to schedule retrieval practice to learn to a goal criterion level. This minimal intervention was effective for improving both younger and older adults' associative memory. These data indicate that a simple strategy for improving older adults' self-regulated learning is to provide them with instructions that teach them how to use criterion learning to schedule their retrieval practice for to-be-learned material.
... A few examples of digital FCs include Anki, Cram, Open cards, Osmosis, and Quizlets (Hart-Matyas et al. 2019). Flashcards were reported to be commonly used in undergraduate colleges (Kornell and Bjork 2008) and allied health professional schools (McAndrew et al. 2016; Wanda et al. 2016). A couple of publications reported on the use of flashcards as a study tool in medical schools (Allen et al. 2008; Taveira-Gomes et al. 2015). ...
Is self-assessment enough to keep physicians' cognitive skills (such as diagnosis, treatment, basic biological knowledge, and communicative skills) current? We review the cognitive strengths and weaknesses of self-assessment in the context of maintaining medical expertise. Cognitive science supports the importance of accurately self-assessing one's own skills and abilities, and we review several ways such accuracy can be quantified. However, our review also indicates a broad challenge in self-assessment is that individuals do not have direct access to the strength or quality of their knowledge and instead must infer this from heuristic strategies. These heuristics are reasonably accurate in many circumstances, but they also suffer from systematic biases. For example, information that feels easy to process in the moment can lead individuals to overconfidence in their ability to remember it in the future. Another notable phenomenon is the Dunning-Kruger effect: the poorest performers in a domain are also the least accurate in self-assessment. Further, explicit instruction is not always sufficient to remove these biases. We discuss what these findings imply about when physicians' self-assessment can be useful and when it may be valuable to supplement with outside sources.
Although tests and assessments, such as those used to maintain a physician's Board certification, are often viewed merely as tools for decision-making about one's performance level, strong evidence now indicates that the experience of being tested is a powerful learning experience in its own right: The act of retrieving targeted information from memory strengthens the ability to use it again in the future, a phenomenon known as the testing effect. We review meta-analytic evidence for the learning benefits of testing, including in the domain of medicine, and discuss theoretical accounts of its mechanism(s). We also review key moderators (including the timing, frequency, order, and format of testing and the content of feedback) and what they indicate about how to most effectively use testing for learning. We also identify open questions for the optimal use of testing, such as the timing of feedback and the sequencing of complex knowledge domains. Lastly, we consider how to facilitate adoption of this powerful study strategy by physicians and other learners.
The U.S. military is faced with expanding logistical challenges to train students effectively. Providing students with adaptive training (AT) systems during their courses can help address these challenges. It is unclear, however, what individual differences lead to students using AT systems as course aids. To answer this question, we conducted the current research to investigate usage of a flashcard-based AT system and its association with individual differences in U.S. Marine Corps students. We chose to examine self-regulated learning (SRL), intrinsic motivation (IM), and achievement goal orientation in relation to training system usage, as previous research has revealed associations between these variables and improved learning outcomes and positive learning behaviors. Students were provided an AT flashcard system on their military-issued laptops and told they could utilize it as a study aid as much or as little as they preferred during their course. Results revealed varying degrees of system usage overall. Additionally, we uncovered positive associations between achievement goals and IM as they related to AT system usage. We discuss implications for AT system usage in live classrooms, as well as provide suggestions for future AT system developers as they seek to improve system usage among students. Keywords: Adaptive training; Flashcard training; Mastery learning; Individual differences
When participants studied a list of paired associates for several study-test cycles, their judgments of learning (JOLs) exhibited relatively good calibration on the 1st cycle, with a slight overconfidence. However, a shift toward marked underconfidence occurred from the 2nd cycle on. This underconfidence-with-practice (UWP) effect was very robust across several experimental manipulations, such as feedback or no feedback regarding the correctness of the answer, self-paced versus fixed-rate presentation, different incentives for correct performance, magnitude and direction of associative relationships, and conditions producing different degrees of knowing. It was also observed both in item-by-item JOLs and in aggregate JOLs. The UWP effect also occurred for list learning and for the memory of action events. Several theoretical explanations for this counterintuitive effect are discussed.
Metacognition offers an up-to-date compendium of major scientific issues involved in metacognition. The twelve original contributions provide a concise statement of theoretical and empirical research on self-reflective processes or knowing about what we know. Self-reflective processes are often thought to be central to what we mean by consciousness and the personal self. Without such processes, one would presumably respond to stimuli in an automatized and environmentally bound manner—that is, without the characteristic patterns of behavior and introspection that are manifested as plans, strategies, reflections, self-control, self-monitoring, and intelligence. Bradford Books imprint
Nelson and Dunlosky (Psychological Science, July 1991) reported that subjects making judgments of learning (JOLs) can be extremely accurate at predicting subsequent recall performance on a paired-associate task when the JOL task is delayed for a short while after study. They argued that this result is surprising given the results of earlier research, as well as their own current experiment, indicating that JOLs are quite inaccurate when made immediately after study. We note that the delayed-JOL procedure used by Nelson and Dunlosky invited covert recall practice (which was reported by their subjects). Retrieval practice is a well-known determinant of subsequent recall. Accordingly, Nelson and Dunlosky's findings can be explained by the simple assumption that people base delayed JOLs on an assessment of retrieval success, which in turn influences their retrieval success on the subsequent recall test.
People of all ages are more likely to choose to restudy items (or allocate more study time to items) that are perceived as more difficult to learn than as less difficult to learn. Existing models of self-regulated study adequately account for this inverse relation between perceived difficulty of learning and these 2 measures of self-regulated study (item selection and self-paced study). However, these models cannot account for positive relations between perceived difficulty of learning and item selection, which are demonstrated in the present investigation. Namely, in Experiments 1 and 2, the authors described conditions in which people more often selected to study items judged as less difficult than as more difficult to learn. This positive relation was not demonstrated for self-paced study, which was always negatively correlated with judged difficulty to learn. In Experiments 3 through 6, the authors explored explanations for this dissociation between item selection and self-paced study. Discussion focuses on a general model of self-regulated study that includes planning, discrepancy reduction, and working-memory constraints. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
The reading we take of our own competence is arguably as important in many real-world contexts as is our actual competence. For example, in settings where on-the-job learning can be disastrous from a personal or societal standpoint, such as air traffic control or nuclear plant operations, it can be imperative that we possess the skills and knowledge we think we possess. Individuals who overestimate their own current level of skill and knowledge pose a unique hazard to themselves and others. More broadly, the reading we take of our current level of learning and knowledge in a given domain determines such important matters as how we allocate our time, whether we seek further study or practice, whether we volunteer for or avoid certain assignments, and whether we instill confidence in others. Recent findings demonstrate, however, that humans frequently misassess their own competence, and that such misassessments typically take the form of overconfidence. At the root of such overconfidence, it is argued, is a misinterpretation of the meaning and predictive value of certain indices of current performance. The authors' goal is to summarize the types of illusions of comprehension and competence that have been identified and to outline the implications for real-world instruction. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
[provide an answer to the question:] Why should researchers of cognition investigate metacognition / focus . . . on the metacognitive aspects of learning and memory / [examine] some shortcomings in previous research on memory [i.e., lack of a target for research, overemphasis on a nonreflective-organism approach, and short-circuiting via experimental control] / offer the beginnings of a foundation designed to facilitate cumulative progress [in human learning and memory research] (PsycINFO Database Record (c) 2012 APA, all rights reserved)
We contrasted several ways that an individual's judgments of learning (JOLs) can be utilized when allocating additional study (“restudy”) during the learning of Swahili-English translation equivalents. The findings demonstrate how metacognitive monitoring can be utilized to benefit multitrial learning. Computer-controlled allocation of restudy based on people's JOLs was equivalent to most people's own allocation of restudy (indicating that the computer algorithm can provide a sufficient account of people's allocation of restudy) and was more effective than a computer-controlled allocation based on normative performance (indicating that people's metacognitive monitoring of idiosyncratic knowledge has functional utility in causal chains for learning).
We propose that confidence in potential answers to general knowledge questions is based, in part, on the ease with which those answers come to mind. Consistent with this hypothesis, prior exposure to correct and to related but incorrect answers to general knowledge questions increased the speed, frequency, and confidence with which subjects gave those answers on a subsequent test of general knowledge. Similar effects were obtained even when subjects were warned that the list included incorrect answers (Experiment 2). The results of Experiment 3 indicated that the effects do not rely on deliberate search of memory for the list: Subjects who read a list with correct answers to half of the questions on a subsequent test gained full benefit of exposure to correct answers relative to subjects who read a list with correct answers to all of the questions, yet showed no cost on questions for which answers were not in the list relative to subjects who read a list of unrelated fillers. Finally, Experiments 4a and 4b demonstrated that prior exposure to incorrect answers can give rise to illusions of knowing even when subjects know that all of the answers on the study list were incorrect. In those studies, subjects were correctly informed that all of the answers on the list were incorrect, yet those who had studied the list with divided attention nonetheless tended to give the studied incorrect answers as responses to the knowledge questions. We discuss these findings in terms of Jacoby, Kelley, and Dywan's (1989) attributional approach to subjective experience.
Four groups of postmen were trained to type alpha-numeric code material using a conventional typewriter keyboard. Training was based on sessions lasting for one or two hours occurring once or twice per day. Learning was most efficient in the group given one session of one hour per day, and least efficient in the group trained for two 2-hour sessions. Retention was tested after one, three or nine months, and indicated a loss in speed of about 30%. Again the group trained for two daily sessions of two hours performed most poorly. It is suggested that where operationally feasible, keyboard training should be distributed over time rather than massed.