Correcting a Metacognitive Error: Feedback Increases Retention of Low-
Confidence Correct Responses
Andrew C. Butler
Washington University in St. Louis
Jeffrey D. Karpicke
Purdue University
Henry L. Roediger, III
Washington University in St. Louis
Previous studies investigating posttest feedback have generally conceptualized feedback as a method for
correcting erroneous responses, giving virtually no consideration to how feedback might promote
learning of correct responses. Here, the authors show that when correct responses are made with low
confidence, feedback serves to correct this initial metacognitive error, enhancing retention of low-
confidence correct responses. In 2 experiments, subjects took an initial multiple-choice test on general
knowledge facts and made a confidence judgment after each response. Feedback was provided for half
of the questions, and retention was assessed by a final cued-recall test. Taking the initial test improved
retention relative to not testing, and feedback further enhanced performance. Consistent with prior
research, feedback improved retention by allowing subjects to correct initially erroneous responses. Of
more importance, feedback also doubled the retention of correct low-confidence responses, relative to
providing no feedback. The function of feedback is to correct both memory errors and metacognitive
errors.
Keywords: feedback, testing, metacognition, confidence
Testing of information can have a powerful positive effect on
future retention of the tested material, a phenomenon known as the
testing effect (Butler & Roediger, 2007; Carpenter & DeLosh,
2006; Carpenter & Pashler, 2007; Chan, McDermott, & Roediger,
2006; Karpicke & Roediger, 2008; Roediger & Karpicke, 2006a,
2006b). Testing often improves later retention even when students
are not given feedback following the test, and providing feedback
produces even greater gains in long-term retention (Butler &
Roediger, 2008; Karpicke & Roediger, 2007; McDaniel & Fisher,
1991). In this article, we examined whether feedback improves
long-term retention only of responses that are incorrect on an
initial test or whether feedback also improves retention of initially
correct responses. A large body of research has investigated factors
that determine the effectiveness of feedback, such as how different
types and schedules of feedback affect learning (see Azevedo &
Bernard, 1995; Bangert-Drowns, Kulik, Kulik, & Morgan, 1991;
Kulhavy & Stock, 1989; Kulik & Kulik, 1988). The theoretical
synthesis of this research has yielded many suggestions for how
and when feedback should be given. Although practical recom-
mendations for maximizing the efficacy of feedback vary consid-
erably (and sometimes contradict each other), the majority of these
suggestions are derived from research aimed at only one aspect of
feedback, viz., that the purpose of feedback is to correct errors. In
fact, the current zeitgeist has so sharply shifted toward conceptu-
alizing feedback as an error-correction mechanism that the possi-
ble effect of feedback on correct responses is often minimized or
completely neglected. For example, some recent investigations of
feedback have exclusively dealt with how feedback influences the
correction of errors, without examining how feedback affects
learning of correct responses (e.g., Butterfield & Metcalfe, 2001,
2006; Meyer, 1986).
The purpose of the current research is to reexamine the effect of
feedback on retention of initially correct responses. Of course, we
are not arguing against the fact that correcting memory errors is a
key purpose of feedback. Instead, we believe that feedback also
functions as an error-correction mechanism for correct responses,
albeit for a different type of error. When individuals make a correct
response but are not confident in the response, there is a discrep-
ancy between the subjective and objective correctness of their
answers. In other words, low-confidence correct responses reflect
an error of metacognitive monitoring, which in this context refers
to the ability to assess the accuracy of one’s own performance on
a test (Barnes, Nelson, Dunlosky, Mazzoni, & Narens, 1999;
Koriat & Goldsmith, 1996; Nelson & Narens, 1990). Feedback that
confirms the correctness of low-confidence responses should en-
able learners to reduce the discrepancy between their perceived
and actual performance by allowing them to adjust their subjective
assessments of their knowledge. Further, if feedback allows learn-
ers to correct initial metacognitive errors, then it should also
enhance long-term retention of the correct responses and improve
the accuracy of metacognitive monitoring on subsequent tests.
Thus, our hypothesis in this research was that, just as feedback
helps correct memory errors, feedback will also help correct meta-
cognitive errors and will improve retention of low-confidence
correct responses.

Andrew C. Butler and Henry L. Roediger, III, Department of Psychology,
Washington University in St. Louis; Jeffrey D. Karpicke, Department of
Psychological Sciences, Purdue University.
This research was supported by a Collaborative Activity Award from the
James S. McDonnell Foundation and Grant R305H030339 from the Institute
of Education Sciences.
Correspondence concerning this article should be addressed to Andrew C.
Butler, Department of Psychology, Campus Box 1125, Washington University
in St. Louis, One Brookings Drive, St. Louis, MO 63130-4899. E-mail:
butler@wustl.edu

Journal of Experimental Psychology: Learning, Memory, and Cognition,
2008, Vol. 34, No. 4, 918–928. Copyright 2008 by the American
Psychological Association. DOI: 10.1037/0278-7393.34.4.918
Before describing the current research, we briefly discuss the
historical origins of the current zeitgeist focused on the role of
feedback in correcting memory errors. We then outline the theo-
retical basis for our reexamination of the role of feedback after
correct responding and review previous research that has investi-
gated the effect of feedback on correct responses.
Feedback as a Mechanism for Correcting Memory Errors
The heavy emphasis on the correction of erroneous responses in
feedback research is in large part a product of the effort to dismiss
the notion that feedback acts as a reinforcer, an idea popular in
earlier literature (e.g., Skinner, 1954). Kulhavy (1977) argued
against any reinforcing quality of feedback by demonstrating that
feedback did not benefit learning in verbal conditioning paradigms
as reinforcement does in other situations. For example, reinforce-
ment increases the future probability of a response and thus should
have its greatest effect on correct responses. In addition, reinforce-
ment is most effective when given immediately after the response.
From an extensive review of the feedback literature, Kulhavy
concluded that there was scant evidence to support the notion that
the principles derived from behavioral research on reinforcement
apply to the provision of feedback on learning of educational
materials in humans. As an example, he cited evidence that de-
layed feedback has greater positive effects than immediate feed-
back in some situations (e.g., Brackbill & Kappy, 1962; Sassenrath
& Yonge, 1968; Surber & Anderson, 1975). Indeed, Kulhavy
argued that the whole idea of conceptualizing the complexities of
learning in educational settings purely within an operant condi-
tioning framework may be of limited utility. Nevertheless, in the
process of eliminating the idea that feedback may be conceived as
a reinforcer, researchers began to overlook how feedback may
benefit correct responses, a trend that continues today.
Feedback as a Mechanism for Correcting Metacognitive
Errors
The impetus for reexamining the function of feedback after
correct responding stems in part from studies that assessed sub-
jects’ confidence in their responses followed by self-paced study
of feedback. Kulhavy, Yekovich, and Dyer (1979) had subjects
complete a program of instruction in which they learned about
heart disease and answered multiple-choice questions on each
section of the tutorial. After each multiple-choice question, sub-
jects were given feedback and allowed to study that feedback for
as long as they wanted. Among other dependent measures, Kul-
havy et al. reported feedback study times as a function of initial
response outcome (correct or incorrect) and response confidence.
After a correct response, subjects studied feedback for a signifi-
cantly longer period of time if that response was made with low
confidence. In addition, feedback study times were roughly equiv-
alent for low-confidence correct responses and low-confidence
incorrect responses. Overall, this study and others (for a review,
see Kulhavy, 1977; Kulhavy & Stock, 1989) showed that subjects
spent a substantial amount of time processing feedback after a
low-confidence correct response.
Why do subjects spend more time processing feedback after a
low-confidence than a high-confidence correct response? One po-
tential explanation is that feedback is processed differently when
there is a large discrepancy between the subjective assessment and
the objective correctness of a response. Consider a test in which
subjects are required to respond to every question on the test (a
forced-response test; cf. Koriat & Goldsmith, 1994; Roediger &
Payne, 1985). For each test question, they retrieve information
from memory and monitor the accuracy of that information (as-
sessed as a confidence judgment), but they are also required to
make a response to each question. On such a test, an individual’s
confidence in his or her responses may not correspond well to the
correctness of the responses, leading to the production of low-
confidence correct responses (Roediger, Wheeler, & Rajaram,
1993) and high-confidence incorrect responses (Butterfield & Met-
calfe, 2001, 2006). When subjects become aware of a metacogni-
tive error through feedback, they may attempt to resolve the
discrepancy between the subjective assessment and the objective
correctness of their response by devoting additional cognitive
resources to processing the feedback.
Research that has examined the role of feedback in correcting
errors committed with high confidence provides some evidence to
support this idea (Butterfield & Metcalfe, 2001, 2006). Butterfield
and Metcalfe (2001) had subjects answer general knowledge ques-
tions and rate their confidence in their initial answers. The subjects
were given feedback following all of their responses on the first
test. After a brief delay, the subjects took a second test over the
general knowledge questions. Butterfield and Metcalfe (2001)
examined the relationship between subjects’ initial confidence in
incorrect responses and the likelihood they would correct those
responses on the final test. They found that subjects were espe-
cially likely to correct high-confidence incorrect responses, a result
they called the “hypercorrection effect” (see also Kulhavy, Yek-
ovich, & Dyer, 1976). In a follow-up study, Butterfield and Met-
calfe (2006) replicated the hypercorrection effect and found that
subjects tend to miss or ignore tones presented in a secondary tone
detection task while processing feedback after high-confidence
errors. They concluded that subjects pay more attention to feed-
back after high-confidence errors because of the surprise of being
highly confident but incorrect, and this additional attention to the
correct response produces better retention (Butterfield & Metcalfe,
2006).
Our hypothesis is that feedback serves to correct the metacog-
nitive error inherent in low-confidence correct responses, much
like it does for high-confidence errors in the hypercorrection
effect. However, we believe that the correction of these two types
of metacognitive error probably leads to better retention through
different mechanisms. As described above, retention may be en-
hanced following high-confidence errors because a feeling of
surprise causes subjects to pay more attention to feedback (But-
terfield & Metcalfe, 2006). In contrast, we think that providing
feedback after low-confidence correct responses might enhance
retention by enabling learners to strengthen the association be-
tween the cue and response and to inhibit any competing re-
sponses. We turn now to summarizing previous research on feed-
back after correct responses; we return to the idea of why
providing feedback after low-confidence correct responses might
produce better retention in the General Discussion.
Previous Research on the Effect of Feedback on Correct
Responses
Previous researchers have typically treated the effect of feed-
back on correct responses as a secondary concern rather than as the
focus of their research efforts. These researchers have also widely
concluded that when a correct response is produced, feedback
makes little or no difference for learning (Anderson, Kulhavy, &
Andre, 1971; Guthrie, 1971; Kulhavy & Anderson, 1972; Pashler,
Cepeda, Wixted, & Rohrer, 2005; Pashler, Rohrer, Cepeda, &
Carpenter, 2007). For example, Pashler et al. (2005) had subjects
study a list of 20 Luganda–English word pairs twice and then take
two consecutive cued-recall tests during an initial learning session.
Of importance, subjects were free to withhold responses on the
initial tests (a point we elaborate below). Subjects made confi-
dence ratings after each response and were given feedback for
some of the items and no feedback for other items. Finally,
long-term retention was measured on a final cued-recall test 1
week later. When final test performance was examined as a func-
tion of whether responses on the first test were correct or incorrect,
Pashler et al. (2005) found that feedback did not enhance the
retention of correct responses, even those made with medium or
low confidence, using their procedure. They concluded, “When the
learner makes a correct response, feedback makes little difference
for what can be remembered 1 week later” (Pashler et al., 2005, p.
7; see, too, Pashler et al., 2007).
However, a careful consideration of the methodology used in the
experiment by Pashler et al. (2005), and in other studies, raises the
question of whether the experiments were capable of properly
assessing the potential benefit of providing feedback after correct
responses. We argue that previous researchers may have failed to
find benefits of feedback for correct responses because few (if any)
low-confidence correct responses were made on the initial free-
report tests used in prior experiments. On a free-report test, when
subjects can choose to volunteer or withhold responses, there is a
strong correspondence between subjective confidence (metacogni-
tive monitoring) and the willingness to volunteer a response (meta-
cognitive control). That is, subjects generally volunteer high-
confidence responses and withhold low-confidence responses,
even if the low-confidence responses are correct (see Barnes et al.,
1999; Kelley & Sahakyan, 2003; Koriat & Goldsmith, 1996). In
the previous research discussed above, subjects were free to with-
hold responses to test items, and, therefore, low-confidence re-
sponses were likely withheld even if they were in fact correct.
Presumably, the previous studies used free-report tests because
none of them was designed with the specific intention of examin-
ing the effects of feedback on low-confidence correct responses.
Although multiple-choice and cued-recall tests are generally
forced and free report, respectively, it is important to note that
either report option can be used with either test format. Thus,
report option is the critical variable, not test format. In the present
research, we used a multiple-choice test with a forced-responding
procedure in which subjects were required to respond to each test
question and to indicate their confidence in their responses. This
procedure ensured that subjects would produce low-confidence
responses on the initial test.
Experiment 1
In Experiment 1, subjects took a multiple-choice general knowl-
edge test on which they received feedback for half of the questions
(test with feedback condition) and no feedback on the other half of
the questions (test with no feedback condition). Subjects were
required to make a response to each question (a forced-report test),
and they then made a confidence judgment after each response
(always before receiving any feedback). After a brief distracter
task, subjects took a final cued-recall test, in which they answered
the general knowledge questions but without the aid of response
alternatives. The final test included previously tested items and
new items as a no-test control condition. Of specific interest was
the effect of feedback on retention of low-confidence correct
responses.
Method
Subjects. Thirty undergraduate psychology students at Wash-
ington University in St. Louis participated for course credit. All
subjects were treated in accordance with the “Ethical Principles of
Psychologists and Code of Conduct” put forth by the American
Psychological Association (2002).
Materials and counterbalancing. Stimuli consisted of general
knowledge questions that were created from 60 facts taken from
the World Book Encyclopedia (World Book, Inc., 2002). Test
items were constructed from the facts by forming a question stem
and target response (e.g., What is the longest river in the world?
Answer: the Nile river). For the purposes of the multiple-choice
test, three plausible lure responses were generated for each test
item and paired with the target to form a four-alternative multiple-
choice test.
The experiment was counterbalanced in two ways. First, the
questions were separated into three groups and each group of
questions appeared in each of the three learning conditions (no test,
test with no feedback, test with feedback) equally across subjects.
To accomplish this counterbalance, the groups of questions were
rotated through the learning conditions, creating three versions of
the experiment. Second, the position of the correct answer relative
to the lures was systematically varied such that the target appeared
equally often in each of the four possible positions across all the
questions on each version of the multiple-choice test.
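To make the rotation concrete, the sketch below implements a three-version counterbalance of the kind described above. This is our illustration, not the authors' materials: the question indices, group boundaries, and function names are all hypothetical.

```python
# Illustrative Latin-square counterbalance: three groups of 20 questions
# rotated through three learning conditions across three experiment
# versions. Indices and names are hypothetical, not from the article.
CONDITIONS = ["no test", "test with no feedback", "test with feedback"]

def assign_conditions(version: int) -> dict:
    """Map each question index (0-59) to a learning condition for the
    given version (0, 1, or 2)."""
    groups = [range(0, 20), range(20, 40), range(40, 60)]
    assignment = {}
    for g, questions in enumerate(groups):
        condition = CONDITIONS[(g + version) % 3]
        for q in questions:
            assignment[q] = condition
    return assignment

# Across the three versions, each group serves in each condition once,
# so every question contributes equally to every condition across subjects.
for v in range(3):
    tally = {c: 0 for c in CONDITIONS}
    for c in assign_conditions(v).values():
        tally[c] += 1
    assert all(n == 20 for n in tally.values())
```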
Procedure. Subjects were tested in groups of 1 to 5 people.
The stimuli were presented, and responses were collected individ-
ually on a PC using E-Prime software (Schneider, Eschman, &
Zuccolotto, 2002). First, a self-paced multiple-choice test was
given in which 40 questions were presented sequentially in a
random order determined by the computer. Each question was
displayed on the top of the screen with four alternative answers
below it. Subjects were required to respond to each question (i.e.,
forced report) by pushing the button of the number that corre-
sponded to the correct answer (1, 2, 3, or 4). After responding to
the question, subjects were prompted to rate their confidence in the
response on the following 4-point scale: 1 = guess, 2 = low
confidence, 3 = medium confidence, and 4 = high confidence.
Immediately after choosing an answer and rating their confidence,
subjects either received feedback on their answer or were pre-
sented with a screen that instructed them to wait for the next
question. Feedback consisted of a re-presentation of the multiple-
choice question stem along with the correct response. The feed-
back and wait instructions were both displayed for 4 s, so that total
time spent on each question was equated in the test with feedback
and test with no feedback conditions. After the multiple-choice
test, subjects played a computer game for 5 min as a filler task.
Finally, a cued-recall test was given in which the complete set of
60 questions (40 from the initial multiple-choice test plus 20
untested questions) were tested. Again, the questions were pre-
sented in a random order determined by the computer and answer-
ing was self-paced. Subjects were told to type in the correct answer
to each question but were warned to respond only if they were
reasonably sure that the answer was correct. Thus, the final test
was free report, not forced report. If they did not know the correct
answer, then they were instructed to push the Enter key to skip that
question. After the cued-recall test was complete, subjects were
debriefed and dismissed.
Results
All results, unless otherwise stated, were significant at the .05
level. Pairwise comparisons were Bonferroni-corrected to the .05
level. In the analysis of repeated measures, the Geisser–Greenhouse
epsilon correction was used for violations of the
sphericity assumption (Geisser & Greenhouse, 1958).
Initial multiple-choice test. Performance on the initial
multiple-choice test was equivalent in the test with no feedback
and test with feedback conditions (.52 vs. .55; t < 1), which was
expected because no manipulation had been introduced yet.
Final cued-recall test. There were large effects of testing and
feedback on recall on the final test (see the left panel of Figure 1).
The test with feedback condition produced a greater proportion of
correct responses on the final test than the test with no feedback
condition (.87 vs. .41), t(29) = 19.1, SEM = .024, d = 1.77,
p_rep = 1.00 (p_rep is an estimate of the probability of replicating
the direction of an effect; see Killeen, 2005), which, in turn, led to
a greater proportion of correct responses than the no-test condition
(.41 vs. .24), t(29) = 7.5, SEM = .023, d = 1.06, p_rep = 1.00. A
one-way repeated-measures analysis of variance (ANOVA) re-
vealed a significant difference among the three learning condi-
tions, F(2, 58) = 355.5, MSE = .009, η²p = .93. Thus, we
observed a strong testing effect even without feedback, but it was
greatly enhanced when feedback was given.
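For readers unfamiliar with p_rep, the statistic can be recovered from conventional quantities. A common approximation (background we are supplying, not a computation reported in the article) converts the observed one-tailed p value:

```latex
% Killeen's (2005) probability of replicating an effect's direction.
% z_obs is the observed effect's standard-normal score; the second form
% is the usual closed-form approximation from the one-tailed p value.
p_{\mathrm{rep}} \approx \Phi\!\left(\frac{z_{\mathrm{obs}}}{\sqrt{2}}\right)
\approx \left[\,1 + \left(\frac{p}{1-p}\right)^{2/3}\right]^{-1}
```

For example, a one-tailed p of .05 yields p_rep ≈ .88.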
Conditional analyses. Conditional analyses were performed to
examine performance on the final cued-recall test as a function of
(a) the response outcome on the initial multiple-choice test (correct
or incorrect), (b) the presence or absence of feedback, and (c) the
level of confidence in the initial response. For each subject, per-
formance on the final cued-recall test in the test with no feedback
and test with feedback conditions was broken down as a function
of response outcome on the initial multiple-choice test. The left
panel of Figure 2 shows the proportion of correct responses on the
final cued-recall test as a function of initial response outcome. As
expected, initially incorrect responses benefited substantially from
feedback. When feedback was provided, most of the initially
incorrect responses were corrected, whereas few initially incorrect
responses were (spontaneously) corrected on the final test without
feedback (.82 vs. .03), t(29) = 34.4, SEM = .023, d = 6.08,
p_rep = 1.00. It is important to note that feedback also benefited correct
responses: A greater proportion of initially correct responses were
reproduced on the final test when they had been followed by
feedback than when they had been followed by no feedback (.93
vs. .79), t(29) = 4.4, SEM = .032, d = 0.76, p_rep = .99.
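The conditionalization used throughout this section is mechanical and easy to express in code. The sketch below is our reconstruction under an assumed data layout, a long-format table with one row per item and hypothetical column names; it is not the authors' analysis script.

```python
# Minimal sketch of the conditional analysis. Assumed (hypothetical)
# columns: subject, feedback (bool), initial_correct (bool),
# final_correct (bool) -- one row per tested item.
import pandas as pd

def conditional_recall(df: pd.DataFrame) -> pd.DataFrame:
    """Final-test accuracy conditional on initial outcome and feedback:
    average within each subject first, then across subjects."""
    per_subject = (
        df.groupby(["subject", "feedback", "initial_correct"])["final_correct"]
          .mean()
          .reset_index()
    )
    return (
        per_subject.groupby(["feedback", "initial_correct"])["final_correct"]
                   .agg(["mean", "sem"])  # mean and its standard error
    )
```

Partitioning further by the initial confidence rating, as in the next paragraph, just adds a confidence column to the groupby keys.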
The conditionalized results were further partitioned as a func-
tion of the confidence rating (“guess,” “low confidence,” “medium
confidence,” “high confidence”) given after each question on the
initial multiple-choice test. On the basis of the literature reviewed
above, one concern was whether the forced-report procedure we
used would produce a good distribution of confidence responses
and, in particular, a sufficient number of low-confidence correct
responses. Table 1 shows the proportion of items (averaged across
subjects) that were assigned to each confidence rating as a function
of response outcome (correct or incorrect) and initial learning
condition (test with no feedback or test with feedback). Overall,
the items were well distributed across the four confidence levels,
with 38% and 40% of correct responses assigned ratings of
“guess” or “low confidence” in the no feedback and feedback
conditions, respectively.
Figure 1. Proportion of correct responses on the final cued-recall test as a function of initial learning condition
for Experiment 1 (left panel) and Experiment 2 (right panel). Error bars indicate the standard error of the mean.
The key results of Experiment 1 are shown in Figure 3, which
depicts the proportion correct on the final cued-recall test as a
function of initial response confidence, learning condition, and
initial response outcome.
Focusing first on the initially incorrect responses (the left panel
of Figure 3), the proportion of correct responses on the final test
generally did not differ as a function of response confidence,
regardless of whether feedback was provided. There was, how-
ever, one exception: When feedback was given, a significantly
greater proportion of high-confidence incorrect responses were
corrected relative to all other confidence levels (.93 vs. .80),
t(29) = 3.4, SEM = .035, d = 0.59, p_rep = .99, replicating the
hypercorrection effect that has been found in previous research
(Butterfield & Metcalfe, 2001, 2006; see also Kulhavy et al.,
1976).
A different pattern of results emerged for initially correct re-
sponses (the right side of Figure 3). As described above, a greater
proportion of initially correct responses were maintained to the
final cued-recall test when feedback was provided relative to no
feedback. Figure 3 also shows that a greater proportion of correct
responses were maintained as the initial response confidence in-
creased in both the test with no feedback and test with feedback
conditions. In addition, these two factors interacted such that the
test with feedback condition produced a greater proportion of
correct responses than the test with no feedback condition at every
confidence level except high confidence (which approached ceil-
ing). The difference between the two learning conditions increased
as response confidence decreased. For initial responses labeled as
a “guess” (the lowest level of confidence), .85 were maintained
with feedback whereas only .40 were maintained without feed-
back. A 4 × 2 repeated-measures ANOVA confirmed these ob-
servations: There were significant main effects of initial learning
condition, F(1, 29) = 42.4, MSE = .054, η²p = .59, and response
confidence, F(3, 87) = 27.4, MSE = .059, η²p = .49, as well as
Figure 2. Proportion of correct responses on the final cued-recall test as a function of response outcome
(correct or incorrect) on the initial multiple-choice (MC) test and learning condition for Experiment 1 (left panel)
and Experiment 2 (right panel). Error bars indicate the standard error of the mean.
Table 1
Proportion of Items (With Total Number of Items in Parentheses) Assigned Each Confidence Rating on the Initial Multiple-Choice
Test as a Function of Response Outcome and Initial Learning Condition

                              Correct on multiple choice              Incorrect on multiple choice
Experiment and          Test with           Test with           Test with           Test with
confidence              no feedback         feedback            no feedback         feedback

Experiment 1
  Guess                 .18 (52)            .20 (66)            .44 (133)           .41 (116)
  Low                   .20 (61)            .20 (62)            .26 (79)            .34 (97)
  Medium                .24 (73)            .22 (69)            .20 (60)            .19 (55)
  High                  .38 (115)           .38 (120)           .10 (27)            .06 (15)
  Total                 1.00 (301)          1.00 (317)          1.00 (299)          1.00 (283)

Experiment 2
  Guess                 .10 (33)            .11 (33)            .32 (91)            .29 (87)
  Low                   .25 (80)            .22 (66)            .39 (112)           .49 (145)
  Medium                .13 (39)            .18 (55)            .18 (51)            .14 (41)
  High                  .52 (164)           .49 (149)           .11 (30)            .08 (24)
  Total                 1.00 (316)          1.00 (303)          1.00 (284)          1.00 (297)

Note. Confidence responses are binned for Experiment 2 (.25 = guess, .26–.50 = low confidence, .51–.75 = medium confidence,
and .76–1.00 = high confidence).
a significant interaction between learning condition and response
confidence, F(3, 87) = 17.9, MSE = .042, η²p = .38.
We carried out one final analysis to examine the effect of
feedback on the relationship between confidence and memory
performance. Specifically, we asked how confidence on the initial
multiple-choice test is related to the production of answers on the
final cued-recall test and how providing feedback affects this
relationship. To address these questions, we computed the within-
subject Goodman–Kruskal gamma correlations between (a)
multiple-choice performance and initial confidence and (b) initial
confidence and final cued recall. Not surprisingly, the gamma
correlations between initial multiple-choice performance and ini-
tial confidence were nearly identical in the test with feedback and
test with no feedback conditions (.58 vs. .55; t < 1, SEM = .063,
p = .63). In contrast, providing feedback significantly reduced the
gamma correlation between initial confidence and final cued re-
call, relative to the test with no feedback condition (.70 vs. .40),
t(25) = 2.9, SEM = .101, d = 0.75, p_rep = .97. (Four subjects
were excluded from this analysis because a gamma correlation
could not be calculated for one of the two feedback conditions.)
This result indicates that when subjects did not receive feedback
after the initial multiple-choice test, final test performance corre-
sponded well with initial confidence. However, providing feedback
allowed subjects to correct erroneous responses and maintain
correct responses, thereby reducing the relationship between initial
response confidence and final cued recall.
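The Goodman–Kruskal gamma reported here is defined over pairs of items: gamma = (C − D)/(C + D), where C and D count concordant and discordant pairs and ties count toward neither. A minimal sketch (our code, with illustrative variable names):

```python
# Goodman-Kruskal gamma between two paired ordinal variables, e.g.,
# initial confidence ratings and final-recall accuracy (0/1).
from itertools import combinations

def gamma(x, y):
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # pair ordered the same way on both variables
        elif s < 0:
            discordant += 1   # pair ordered oppositely
        # ties on either variable contribute to neither count
    if concordant + discordant == 0:
        # gamma is undefined when every pair is tied
        # (e.g., all responses correct or all incorrect)
        raise ValueError("gamma undefined: no untied pairs")
    return (concordant - discordant) / (concordant + discordant)

# Confidence ratings (1-4) perfectly ordered with recall success:
print(gamma([1, 2, 3, 4, 4], [0, 0, 1, 1, 1]))  # 1.0
```

The undefined case illustrates why a gamma could not be calculated for some subjects in one condition (see the parenthetical exclusion above).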
Discussion
The results of Experiment 1 show that taking an initial multiple-
choice test led to better performance on the final cued-recall test
relative to not taking the test and that providing feedback after the
initial test substantially increased the benefit of prior testing. Of
more importance for present purposes, the conditional analyses
revealed that both initially incorrect and correct responses bene-
fited from feedback. After making an incorrect response, subjects
used feedback to learn the correct response. When feedback was
not provided after an incorrect response, the response almost
always remained incorrect on the final test. This pattern of results
did not differ as a function of the level of initial response confi-
dence, except for the high-confidence incorrect responses, which
were hypercorrected when feedback was provided.
The novel result of Experiment 1 was that feedback increased
retention of initially correct responses. When feedback was not
provided, initially correct responses were more likely to be
changed or omitted on the final test. The level of response confi-
dence on the initial multiple-choice test modulated this pattern of
results. Almost all high-confidence correct responses were main-
tained to the final test, regardless of whether feedback was pro-
vided. However, as the level of initial response confidence de-
creased, feedback became increasingly important for maintaining
correct responses on the final test. Providing feedback doubled
retention of initially correct “guess” responses, relative to when
feedback was not provided (.85 vs. .40).
Experiment 2
In contrast with previous studies, the results of Experiment 1
showed that feedback is critical for retention of low-confidence
correct responses. Experiment 2 was conducted to replicate the
results of Experiment 1 and to further investigate whether feed-
back benefits low-confidence correct responses. As described in
the introduction, our hypothesis is that feedback enables subjects
to correct a metacognitive error that occurs when the subjective
correctness of a response (assessed by a confidence rating) does
not correspond with its objective correctness. On the basis of our
hypothesis, we predicted that feedback should not only increase
retention of low-confidence correct responses but also improve the
accuracy of confidence judgments made during a final delayed
test. Thus, in Experiment 2, subjects were required to respond to
each question on the final cued-recall test and to make confidence
judgments for each response. Our prediction was that feedback
would enhance the accuracy of metacognitive monitoring during
the final test.
The procedure was the same as in Experiment 1 except for the
following changes. First, on the final cued-recall test, subjects
were required to respond to each test item and to make a confi-
dence judgment for each item. Second, subjects made their confi-
dence judgments on scales using cardinal values. On the initial
four-alternative multiple-choice test, subjects rated their confi-
dence on a scale from 25–100%, and on the final cued-recall test,
subjects rated their confidence on a scale from 0–100%. This was
done so that we could assess calibration (the absolute correspon-
dence between test performance and confidence) in addition to
resolution (the relative correspondence between performance and
confidence; for elaboration, see Koriat & Goldsmith, 1996; Nel-
son, 1984; Nelson & Dunlosky, 1991). Finally, the retention in-
terval before the final test was lengthened to 2 days so that we
could generalize the results we observed on a relatively immediate
test in Experiment 1 to a delayed final test in Experiment 2.
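As a worked illustration of the two measures (our formulation; the article defines them only verbally): calibration compares confidence and accuracy on an absolute scale, whereas resolution asks whether higher-confidence answers are more likely to be correct, irrespective of absolute levels.

```latex
% Illustrative definitions (amsmath), assuming confidence c_i \in [0,1]
% and accuracy a_i \in \{0,1\} over items i = 1, \dots, n.
\mathrm{calibration\ bias} = \bar{c} - \bar{a},
\qquad
\mathrm{resolution} = G(c, a)\ \text{(Goodman--Kruskal gamma)},
\quad
\bar{c} = \tfrac{1}{n}\textstyle\sum_i c_i,\ \
\bar{a} = \tfrac{1}{n}\textstyle\sum_i a_i .
```

A positive bias indicates overconfidence. A subject can be poorly calibrated (say, uniformly overconfident) yet show perfect resolution, and vice versa.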
Method
Subjects. Thirty undergraduate psychology students at Wash-
ington University in St. Louis participated for course credit.
Figure 3. Proportion of correct responses on the final cued-recall test in
Experiment 1 as a function of response confidence and learning condition
for responses that were incorrect (left side) and correct (right side) on the
initial multiple-choice (MC) test.
Materials and counterbalancing. The materials and counter-
balancing scheme from Experiment 1 were used.
Procedure. The procedure was identical to that used in Exper-
iment 1 with the following exceptions. First, the final cued recall
was changed to a forced-report test, in which subjects produced a
response and made a confidence rating for every question. Second,
subjects made their confidence judgments on the initial multiple-
choice test on a scale from 25–100%, where 25% represented no
confidence and 100% represented complete confidence. Subjects
were told that 25% represented guessing because chance proba-
bility of a correct response in a four-alternative multiple-choice
test is 25%. On the final cued-recall test, subjects made their
confidence judgments on a scale from 0–100%, where 0% repre-
sented guessing. Finally, subjects were dismissed after completing
the initial multiple-choice test and returned 2 days later for the
final recall test.
Results
Initial multiple-choice test. Performance on the initial
multiple-choice test was virtually identical in the test with no
feedback and test with feedback conditions (.53 vs. .51; t < 1).
Final cued-recall test. As in Experiment 1, testing improved
retention relative to not taking an initial test, and testing with
feedback produced better retention than testing without feedback
(see the right panel of Figure 1). A one-way repeated-measures
ANOVA showed a significant difference among the learning con-
ditions, F(2, 58) = 207.1, MSE = .009, η²p = .88. As in Exper-
iment 1, the test with feedback condition produced a greater
proportion of correct responses relative to the test with no feed-
back condition (.83 vs. .47), t(29) = 15.9, SEM = .023, d = 2.89,
p_rep = 1.00, which in turn produced a greater proportion of correct
responses than the no-test condition (.47 vs. .33), t(29) = 5.5,
SEM = .026, d = 1.00, p_rep = .99.
Conditional analyses. Performance on the final cued-recall
test in the test with no feedback and test with feedback conditions
was broken down as a function of response outcome on the initial
multiple-choice test for each subject. The right panel of Figure 2
shows the proportion of correct responses on the final cued-recall
test as a function of initial response outcome. The pattern of results
was the same as in Experiment 1. A substantially greater propor-
tion of initially incorrect responses were corrected with feedback
relative to without feedback (.73 vs. .09), t(29) = 14.9, SEM =
.043, d = 2.70, p_rep = 1.00, and a greater proportion of initially
correct responses were maintained with feedback than without
feedback (.93 vs. .78), t(29) = 5.3, SEM = .028, d = 0.99,
p_rep = .995.
The conditionalized data were further analyzed as a function of
initial response confidence. For the purpose of data presentation
and some analyses, the confidence responses have been separated
into four bins that roughly corresponded to the categories used in
Experiment 1 (.25 = guess, .26–.50 = low confidence, .51–.75 =
medium confidence, and .76–1.00 = high confidence). Table 1
shows the proportion of items (averaged across subjects) that were
assigned to each confidence rating as a function of response
outcome (correct or incorrect) and initial learning condition (test
with no feedback or test with feedback). As in Experiment 1, the
items were well distributed across the four confidence levels, and
at least a third of the correct responses were assigned a rating of
"guess" (.25) or "low confidence" (.26–.50).
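The binning rule from the Table 1 note is a simple threshold function; the sketch below restates it in code (our illustration, using the cut points given in the note):

```python
# Bin an Experiment 2 confidence rating (a proportion in [.25, 1.00])
# into the four Experiment 1-style categories from the Table 1 note:
# .25 = guess, .26-.50 = low, .51-.75 = medium, .76-1.00 = high.
def bin_confidence(c: float) -> str:
    if c <= 0.25:
        return "guess"
    elif c <= 0.50:
        return "low"
    elif c <= 0.75:
        return "medium"
    return "high"

assert [bin_confidence(c) for c in (0.25, 0.40, 0.75, 0.80)] == [
    "guess", "low", "medium", "high"]
```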
Figure 4 shows the proportion of correct responses on the final
cued-recall test as a function of initial response confidence, learn-
ing condition, and initial response outcome. Despite a number of
differences in the procedures of Experiments 1 and 2, including the
forced-response procedure on the final test and the 2-day delay
used in Experiment 2, the overall pattern of results was similar to
that of Experiment 1. When a response was answered incorrectly
on the initial multiple-choice test and feedback was not given, it
was unlikely for that response to be (spontaneously) corrected on
the final cued-recall test. The one exception was the incorrect
responses that were made with a confidence of .25 (“guess”).
These responses were more likely to be corrected on the final test
relative to responses in the other three confidence bins (.22 vs.
.04), t(29) = 3.3, SEM = .055, d = 1.00, p_rep = .98. When forced
to respond on the final test, subjects may have decided to switch
their response to another alternative (whereas they might have
omitted the response rather than switch in Experiment 1). In
contrast to the hypercorrection effect observed in Experiment 1,
initial response confidence did not influence the correction of
errors when feedback was provided in Experiment 2, F(3, 87) =
1.2, MSE = .076, p = .31. Even when the analysis was restricted
to initially incorrect items that were given a 100% confidence
judgment, no hypercorrection effect emerged. In fact, as Figure 4
shows, if anything, a hypocorrection effect seems to appear after 2
days because a somewhat lower proportion of high-confidence
incorrect responses were corrected relative to all other confidence
levels; however, this difference was not significant (.62 vs. .73),
t(29) = 1.7, SEM = .065, p = .10. We address this failure to find
the hypercorrection effect and the possibility of a hypocorrection
effect below.
Figure 4. Proportion of correct responses on the final cued-recall test in
Experiment 2 as a function of response confidence and learning condition
for responses that were incorrect (left side) and correct (right side) on the
initial multiple-choice (MC) test. Confidence responses have been sepa-
rated into four bins that roughly corresponded to the categories used in
Experiment 1 (.25 = guess, .26–.50 = low confidence, .51–.75 = medium
confidence, and .76–1.00 = high confidence).

When a response was answered correctly on the initial multiple-
choice test and feedback was not given, the pattern of performance
depended on the level of initial response confidence. As the level
of confidence increased, a greater proportion of initially correct
responses were maintained on the final test. In contrast, when
feedback was provided after a correct response, that response
tended to be maintained regardless of initial response confidence.
Thus, the conditionalized data for initially correct responses ex-
hibited the same sort of interaction as in Experiment 1. To confirm
these observations, we conducted a 4 × 2 repeated-measures
ANOVA. The ANOVA confirmed significant main effects of
initial learning condition, F(1, 29) = 57.4, MSE = .044, η²p = .66,
and response confidence, F(3, 87) = 29.1, MSE = .051, η²p = .50,
as well as a significant interaction between learning condition and
response confidence, F(3, 87) = 6.3, MSE = .064, η²p = .18.
The effect of feedback on the relation between confidence and
memory performance. The final set of analyses examined the
effects of feedback on (a) the accuracy of initial confidence judg-
ments relative to initial multiple-choice performance, (b) the rela-
tion between initial confidence and final test performance, and (c)
the accuracy of confidence judgments made during the final cued-
recall test.
As in Experiment 1, we first examined the relationship between
initial confidence judgment and initial multiple-choice test perfor-
mance (see Figure 5, Panel A). On the initial multiple-choice test,
as expected, the mean gamma correlation (or resolution) was
roughly equivalent in the test with no feedback and test with
feedback conditions (.55 vs. .59), t(29) = 0.6, SEM = .063, p =
.54. Subjects exhibited overconfidence in both the test with no
feedback (M confidence = .61, M correct = .53) and the test with
feedback (M confidence = .59, M correct = .51) conditions, but
there was no difference in overconfidence between these two
conditions. A 2 × 2 ANOVA confirmed that absolute confidence
judgments were significantly higher than multiple-choice accuracy
(.60 vs. .52), F(1, 29) = 28.6, MSE = .007, η²p = .50, but there
was no interaction (F < 1, MSE = .005, p = .82).
Gamma correlations were again computed to investigate the
relationship between initial confidence judgments and final cued-
recall test performance (see Figure 5, Panel B). As in Experiment
1, the no-feedback condition produced a significantly higher
gamma correlation than did the feedback condition (.57 vs. .38),
t(25) = 1.86, SEM = .104, d = 0.47, p_rep = .89. (Four subjects
were excluded from this analysis because a gamma correlation
could not be calculated for one of the two feedback conditions.)
However, on the final cued-recall test (see Figure 5, Panel C), a
different pattern emerged: Resolution was significantly better in
the test with feedback condition relative to both the test with no
feedback (.94 vs. .67), t(28) = 5.1, SEM = .054, d = 0.92,
p_rep = .99, and no-test (.94 vs. .69), t(28) = 6.3, SEM = .041,
d = 0.98, p_rep = 1.00, conditions. (One subject was excluded from
this analysis because a gamma correlation could not be calculated
for one of the two feedback conditions.)
On the final cued-recall test, there was a
difference among the conditions: Subjects in the test with no
feedback (M confidence = .56, M correct = .47) and the no-test
(M confidence = .38, M correct = .33) conditions both showed
overconfidence, whereas those in the test with feedback condition
(M confidence = .83, M correct = .82) were almost perfectly
calibrated. A 3 × 2 ANOVA revealed a significant interaction,
F(2, 58) = 10.1, MSE = .003, η²p = .26.
Discussion
Overall, Experiment 2 replicated and extended the main results
of Experiment 1 after a 2-day retention interval, providing gener-
ality to the feedback effect along this dimension. Taking a prior
test led to better performance on the final test relative to the no-test
control, and feedback increased the benefit of prior testing. Again,
the benefit of feedback stemmed from both the correction of
erroneous responses and the confirmation of low-confidence cor-
rect responses. Experiment 2 also showed that when feedback was
provided on the initial test, subjects were better able to discrimi-
nate between correct and incorrect responses on the final test. The
improvement in the accuracy of metacognitive judgments in the
feedback condition supports the idea that feedback helps to elim-
inate the discrepancy between perceived and actual correctness for
low-confidence correct responses.
General Discussion
In summary, the results of both experiments show that taking an
initial multiple-choice test produced superior performance on a
subsequent cued-recall test and that this benefit of prior testing was
further enhanced when feedback was provided. When an incorrect
response was given on the initial multiple-choice test, it was
unlikely to be corrected spontaneously on the final cued-recall test
without the presentation of feedback. Thus, consistent with con-
siderable prior research, we found that feedback helps learners
correct memory errors. Of more importance, the results of the
present experiments demonstrate that correct responses benefited
from feedback, and this positive effect of feedback was greatest for
low-confidence correct responses. Thus, feedback also helps learn-
ers correct the metacognitive error that occurs when they are
correct on an initial test but lack confidence in their response,
resulting in enhanced retention of low-confidence correct re-
sponses. Experiment 2 also showed that feedback produced a
metacognitive benefit on the final test by improving resolution and
calibration of confidence judgments made on the final test.
Figure 5. Mean gamma correlations between initial confidence judg-
ments and initial multiple-choice performance (A), initial confidence judg-
ments and final cued-recall performance (B), and final confidence judg-
ments and final cued-recall performance (C). All results are from
Experiment 2. Error bars indicate the standard error of the mean.
Overall, the results obtained in these two experiments confirm
several points uncovered in previous research. Many studies have
found that taking a prior test improves performance on a future test
(e.g., Carpenter & DeLosh, 2006; Carpenter & Pashler, 2007;
Roediger & Karpicke, 2006b; Wheeler & Roediger, 1992; for a
review, see Roediger & Karpicke, 2006a) and that providing
feedback after an initial test enhances subsequent test performance
(e.g., Butler, Karpicke, & Roediger, 2007; Butler & Roediger,
2008; Karpicke & Roediger, 2007; McDaniel & Fisher, 1991;
Pashler et al., 2005). In addition, several studies have found that
providing feedback after incorrect responses is critical to correct-
ing errors (e.g., Lhyle & Kulhavy, 1987; see Bangert-Drowns et
al., 1991), especially those errors committed with high confidence
(Butterfield & Metcalfe, 2001, 2006). This previous research has
largely emphasized the role of feedback in correcting erroneous
responses.
The present research provides two novel findings. First, provid-
ing feedback for low-confidence correct responses on an initial
multiple-choice test improved recall on a final test given either a few
minutes or 2 days later. As explained in the introduction, this
outcome differs from the findings reported in previous studies
(e.g., Pashler et al., 2005), which have led to the conclusion that
providing feedback after correct responses has no effect. There are
many methodological differences between previous studies and the
present research, but we argue that the key factor is whether
subjects are required to respond to every item on an initial test
(forced responding). When the initial test is free report, as was the
case in many prior studies, subjects are likely to withhold low-
confidence responses, even if they are correct (cf. Barnes et al.,
1999; Koriat & Goldsmith, 1996). Therefore, when subjects are
free to withhold low-confidence responses, no effect of feedback
should be observed. In addition to forced responding, our proce-
dure included many other features that increased the number of
low-confidence correct responses. For example, the four-
alternative multiple-choice format gave subjects at least a 25%
chance of guessing the correct response. This procedure succeeded
in producing a relatively large proportion of low-confidence cor-
rect responses (at least one third of all responses in each experi-
ment). The finding that low-confidence correct responses benefit
from feedback is also consistent with studies in which feedback
study time is allowed to vary. For example, test takers generally
view feedback on low-confidence correct responses for more time
than they do after high-confidence correct responses (e.g., Kulhavy
et al., 1976, 1979; Webb, Pridemore, Stock, Kulhavy, & Henning,
1997). Taken as a whole, these findings show that learners utilize
feedback after low-confidence correct responses to improve sub-
sequent retention.
Why does feedback enhance retention for low-confidence cor-
rect responses? As we briefly described in the introduction, pro-
viding feedback after low-confidence correct responses might en-
hance retention in two ways: (a) strengthening the association
between the cue and response and (b) inhibiting competing re-
sponses. To understand why this might be true, it helps to consider
why low-confidence correct responses are produced. Some re-
searchers consider all low-confidence responses, whether correct
or incorrect, to represent a situation in which the subject has
insufficient knowledge of the material and thus would benefit
more from further instruction than from feedback (Kulhavy, 1977;
Kulhavy & Stock, 1989). Certainly, low-confidence correct re-
sponses can be lucky guesses, especially on a multiple-choice test
where the chance of selecting the correct response is often 20% or
greater. However, there are at least two other possible causes.
First, subjects might produce a correct response based on partial
knowledge and/or familiarity but not be confident that it is the
correct response. In this situation, the correct response is already
known, and therefore what is needed is for the association between
the response and the cue to be strengthened. Second, subjects
might give a correct response a low-confidence judgment because
they had trouble choosing between two equally attractive re-
sponses but happened to volunteer the correct one. In this situation,
the association between the cue and correct response must be
strengthened, but the competing response must also be inhibited.
In both situations, feedback is critical because it first corrects the
metacognitive error and then enables the subject to engage the
appropriate mechanisms to enhance retention.
The second novel finding from our experiments was that pro-
viding feedback after the initial multiple-choice test enhanced the
accuracy of confidence judgments on the final test. Subjects were
better able to discriminate between correct and incorrect responses
on the final test if they had been given feedback on the prior test.
This improvement in metacognitive monitoring was evident in
both global (overall mean proportion of correct responses and
confidence judgments) and relative (item-by-item correspondence
between confidence judgments and the proportion correct) assess-
ments of metacognitive accuracy. Previous studies that have in-
vestigated the effect of feedback on subsequent confidence judg-
ments have found that feedback can improve both calibration
(Lichtenstein & Fischhoff, 1980) and resolution (Sharp, Cutler, &
Penrod, 1988). However, these studies differ from ours in that
feedback was provided in the form of a global assessment of
performance (e.g., overall proportion correct) rather than for each
item. Presumably, such global feedback might lead subjects to
change their overall pattern of responding on future tests (e.g.,
being more conservative in their confidence judgments). Although
feedback on individual responses might have produced overall bias
in future metacognitive judgments in our study, it seems more
likely that the improvement in metacognitive accuracy is the result
of eliminating any discrepancy between perceived and actual cor-
rectness of responses.
Finally, it is interesting to note that in Experiment 1, we ob-
served the hypercorrection effect (Butterfield & Metcalfe, 2001,
2006), wherein high-confidence errors were more likely to be
corrected on a final test than low-confidence errors. However, we
did not observe the effect in Experiment 2, and there was a
numerical trend toward a hypocorrection effect in which high-
confidence errors were less likely to be corrected relative to
low-confidence errors. This result suggests that the hypercorrec-
tion effect may be relatively transient. All the studies that have
reported a hypercorrection effect have used brief retention inter-
vals (e.g., 5 min; Butterfield & Metcalfe, 2001). Studies that have
used longer retention intervals (e.g., 1 week; Pashler et al., 2005)
have failed to find the effect, as did we after a 2-day retention
interval (but see Kulhavy et al., 1976). The transience of the
hypercorrection effect may be the result of the gradual recovery of
the original error response as the retention interval increases,
similar to the recovery of the A–B pair (and extinction of the A–C
pair) in the classic retroactive interference paradigm (Briggs, 1954).
Nevertheless, relatively few high-confidence errors were produced
in the current experiment and thus this finding should be inter-
preted with caution. Of course, in both experiments we showed
that feedback increased retention of low-confidence correct re-
sponses, regardless of the retention interval before the final test.
The present results have implications for the importance of
providing feedback in educational settings. Many of the methods
commonly used in education undermine the potential benefits of
testing. A prime example is the variable sorts of feedback provided
after classroom tests. As shown in many studies, feedback is a
critical aspect to learning, but instructors’ policies in providing it
vary considerably, ranging from comprehensive feedback after
each testing occasion to little or no feedback at all. The latter
situation is increasingly prevalent in university settings, where
large class sizes and repeated teaching responsibilities lead edu-
cators to retain completed examinations to guard their test banks.
Nevertheless, when feedback is given (i.e., other than a grade or
numerical score), the focus is generally on incorrect answers.
Students tend to look for the red ink and concentrate on figuring
out why they got the answer wrong. The current research suggests
that educators should try to give comprehensive feedback (i.e., on
both correct and incorrect answers) whenever possible. Such feed-
back may be particularly important after multiple-choice tests that
expose test takers to incorrect information in the form of lures.
Attractive alternatives can lead test takers to change their response
on a later test (Higham & Gerrard, 2005), and endorsing a lure on
an initial test often results in its being produced on a subsequent
test (Butler, Marsh, Goode, & Roediger, 2006; Butler & Roediger,
2008; Roediger & Marsh, 2005).
Finally, it is worth noting that the tests used in most educational
settings are essentially forced report, regardless of test format (e.g.,
multiple-choice, short answer, and so forth). There is usually no
penalty for an incorrect response, which encourages students to
provide an answer to every question, even if they have to guess, to
maximize their score on the test. (An exception is standardized
tests, like the SAT, which penalize students for incorrect responses
by deducting points.) The use of such a strategy by students is
likely to be even more prevalent in multiple-choice testing, where
there is a good chance of guessing the correct answer. Thus, one
could argue that the use of an initial forced-report test in the
present research is more consistent with the methods used in
education than the initial free-report tests used in most of the
previous studies that have investigated the effect of feedback on
initially correct responses.
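To make the incentive to guess concrete, consider a brief worked example (an illustrative sketch, not part of the original experiments, using the classic five-alternative SAT formula-scoring rule of 1 point per correct answer and a 1/4-point deduction per error). The expected score from blindly guessing on a single item is

E[\text{guess}] = \frac{1}{5}(1) + \frac{4}{5}\left(-\frac{1}{4}\right) = 0,

whereas under no-penalty scoring it is \frac{1}{5}(1) = .20 points. Formula scoring thus makes random guessing score-neutral in expectation, but on the forced-report, no-penalty tests typical of classrooms, answering every item, guessing when necessary, strictly increases the expected score.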
In conclusion, the current experiments provide clear evidence
that low-confidence correct responses do benefit from feedback
and that feedback improves students’ metacognitive judgments
about their knowledge. Taken together, the two novel findings
support the idea that a low-confidence correct response represents
an error in metacognitive monitoring that can be corrected through
feedback. Providing feedback after low-confidence correct re-
sponses enables learners to eliminate the discrepancy between
perceived and actual correctness of the response. Feedback after
both correct and incorrect responses on tests is a critical aspect of
learning.
References
American Psychological Association. (2002). Ethical principles of psychologists and code of conduct. Retrieved June 1, 2003, from http://www.apa.org/ethics/code2002.html
Anderson, R. C., Kulhavy, R. W., & Andre, T. (1971). Feedback procedures in programmed instruction. Journal of Educational Psychology, 62, 148–156.
Azevedo, R., & Bernard, R. M. (1995). A meta-analysis of the effects of feedback in computer-based instruction. Journal of Educational Computing Research, 13, 111–127.
Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213–238.
Barnes, A. E., Nelson, T. O., Dunlosky, J., Mazzoni, G., & Narens, L. (1999). An integrative system of metamemory components involved in retrieval. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII: Cognitive regulation of performance: Interaction of theory and application (pp. 287–313). Cambridge, MA: MIT Press.
Brackbill, Y., & Kappy, M. S. (1962). Delay of reinforcement and retention. Journal of Comparative and Physiological Psychology, 55, 14–18.
Briggs, G. E. (1954). Acquisition, extinction, and recovery functions in retroactive inhibition. Journal of Experimental Psychology, 47, 285–293.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III. (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13, 273–281.
Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III. (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941–956.
Butler, A. C., & Roediger, H. L., III. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514–527.
Butler, A. C., & Roediger, H. L., III. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604–616.
Butterfield, B., & Metcalfe, J. (2001). Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1491–1494.
Butterfield, B., & Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacognition and Learning, 1, 69–84.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268–276.
Carpenter, S. K., & Pashler, H. (2007). Testing beyond words: Using tests to enhance visuospatial map learning. Psychonomic Bulletin & Review, 14, 474–478.
Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation: Initially nontested material can benefit from prior testing. Journal of Experimental Psychology: General, 135, 553–571.
Geisser, S., & Greenhouse, S. W. (1958). An extension of Box’s results on the use of the F distribution in multivariate analysis. Annals of Mathematical Statistics, 29, 885–891.
Guthrie, J. T. (1971). Feedback and sentence learning. Journal of Verbal Learning and Verbal Behavior, 10, 23–28.
Higham, P. A., & Gerrard, C. (2005). Not all errors are created equal: Metacognition and changing answers on multiple-choice tests. Canadian Journal of Experimental Psychology, 59, 28–34.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008, February 15). The critical importance of retrieval for learning. Science, 319, 966–968.
Kelley, C. M., & Sahakyan, L. (2003). Memory, monitoring, and control in the attainment of memory accuracy. Journal of Memory and Language, 48, 704–721.
Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.
Koriat, A., & Goldsmith, M. (1994). Memory in naturalistic and laboratory contexts: Distinguishing the accuracy-oriented and quantity-oriented approaches to memory assessment. Journal of Experimental Psychology: General, 123, 297–315.
Koriat, A., & Goldsmith, M. (1996). Monitoring and control processes in the strategic regulation of memory accuracy. Psychological Review, 103, 490–517.
Kulhavy, R. W. (1977). Feedback in written instruction. Review of Educational Research, 47, 211–232.
Kulhavy, R. W., & Anderson, R. C. (1972). Delay-retention effect with multiple-choice tests. Journal of Educational Psychology, 63, 505–512.
Kulhavy, R. W., & Stock, W. A. (1989). Feedback in written instruction: The place of response certitude. Educational Psychology Review, 1, 279–308.
Kulhavy, R. W., Yekovich, F. R., & Dyer, J. W. (1976). Feedback and response confidence. Journal of Educational Psychology, 68, 522–528.
Kulhavy, R. W., Yekovich, F. R., & Dyer, J. W. (1979). Feedback and content review in programmed instruction. Contemporary Educational Psychology, 4, 91–98.
Kulik, J. A., & Kulik, C. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79–97.
Lhyle, K. G., & Kulhavy, R. W. (1987). Feedback processing and error correction. Journal of Educational Psychology, 79, 320–322.
Lichtenstein, S., & Fischhoff, B. (1980). Training for calibration. Organizational Behavior and Human Performance, 26, 149–171.
McDaniel, M. A., & Fisher, R. P. (1991). Tests and test feedback as learning sources. Contemporary Educational Psychology, 16, 192–201.
Meyer, L. A. (1986). Strategies for correcting students’ wrong responses. The Elementary School Journal, 87, 227–241.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133.
Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect.” Psychological Science, 5, 207–213.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–141). New York: Academic Press.
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 3–8.
Pashler, H., Rohrer, D., Cepeda, N. J., & Carpenter, S. K. (2007). Enhancing learning and retarding forgetting: Choices and consequences. Psychonomic Bulletin & Review, 14, 187–193.
Roediger, H. L., III, & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., III, & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159.
Roediger, H. L., III, & Payne, D. G. (1985). Recall criterion does not affect recall level or hypermnesia: A puzzle for generate/recognize theories. Memory & Cognition, 13, 1–7.
Roediger, H. L., Wheeler, M. A., & Rajaram, S. (1993). Remembering, knowing and reconstructing the past. In D. L. Medin (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 30, pp. 97–134). New York: Academic Press.
Sassenrath, J. M., & Yonge, G. D. (1968). Delayed information feedback, feedback cues, retention set, and delayed retention. Journal of Educational Psychology, 59, 69–73.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime reference guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Sharp, G. L., Cutler, B. L., & Penrod, S. D. (1988). Performance feedback improves the resolution of confidence judgments. Organizational Behavior and Human Performance, 42, 271–283.
Skinner, B. F. (1954). The science of learning and the art of teaching. Harvard Educational Review, 24, 86–97.
Surber, J. R., & Anderson, R. C. (1975). Delay-retention effect in natural classroom settings. Journal of Educational Psychology, 67, 170–173.
Webb, J. M., Pridemore, D. R., Stock, W. A., Kulhavy, R. W., & Henning, J. E. (1997). Remembering responses and cognitive estimates of knowing: The effects of instructions, retrieval sequences, and feedback. Contemporary Educational Psychology, 22, 147–164.
Wheeler, M. A., & Roediger, H. L., III. (1992). Disparate effects of repeated testing: Reconciling Ballard’s (1913) and Bartlett’s (1932) results. Psychological Science, 3, 240–245.
World Book, Inc. (2002). The 2002 world book encyclopedia (Vols. 1–25). Chicago: Author.
Received November 16, 2007
Revision received February 25, 2008
Accepted February 29, 2008