Journal of Experimental Psychology: Learning, Memory, and Cognition
2001, Vol. 27, No. 6, 1491-1494
Copyright 2001 by the American Psychological Association, Inc. 0278-7393/01/$5.00
DOI: 10.1037//0278-7393.27.6.1491
Errors Committed With High Confidence Are Hypercorrected
Brady Butterfield and Janet Metcalfe
Columbia University
The relation between people's confidence in the accuracy of an erroneous response and their later performance was investigated. Most models of human memory suggest that the higher a person's confidence, the stronger the item (in the context of the eliciting cue) that is retrieved from memory. In recall, stronger associates to a cue interfere with competing associates more than do weaker associates. This state of affairs implies that errors endorsed with high, rather than low, confidence should be more difficult to correct by learning the correct response feedback. In contrast to the authors' expectations, highly confident errors were the most likely to be corrected in a subsequent retest. Participants nearly always endorsed the correct response in cases in which both the correct response and the original erroneous response were generated at retest, suggesting that people possess a refined metacognitive ability to know what is correct and incorrect.
Given the pervasiveness of mistakes and errors in all realms of human cognition, it is surprising how inadequate our understanding is of how people are able to overcome them. Indeed, the question of how people learn hinges critically on how they are able to replace misinformation with correct information, and a particularly important kind of misinformation is that which is self-generated. The issue that is addressed in this article, then, is both simple and foundational for any theory of learning that claims to be comprehensive: How do people correct their mistakes?
There is a long tradition in which it is considered that the best learning strategy may be the complete avoidance of errors, and there is a large literature on how ordinary incorrect and intentionally misleading information worsen memory (Ayers & Reder, 1998; Loftus, 1979; Wright & Loftus, 1998). If self-generated mistakes are like experimenter-presented misinformation (and it seems reasonable to so assume), then we are in a position to make some data-based predictions about (a) how errors might impact on later performance and (b) which kinds of errors (for example, strong, highly confident errors as opposed to, say, errors about which the participant voices considerable uncertainty) will impact on later performance.

Koriat (1998) argued that when a person retrieves an item from memory, he or she does not have any direct way of knowing whether that item is actually correct or incorrect and that "illusions of knowing occur when the accessibility of information is not diagnostic of its accuracy" (p. 16). The participant may assign confidence ratings or other metacognitive assessments inferentially, on the basis of factors such as the amount of information that comes to mind and the ease with which that information came to mind, but Koriat assumed that normally, if an item comes to mind, people take it to be correct. According to Koriat, this is all the information the person has available (because he or she has no privileged access to either the truth or to what the experimenter thinks is the truth, i.e., the scoring system used in the experiment). He argued, however, that this information is nearly always correct.

Brady Butterfield and Janet Metcalfe, Department of Psychology, Columbia University.
This research was supported by a CSEP grant from the James S. McDonnell Foundation and National Institute of Mental Health Grant MH8066. We thank Nate Kornell, Jason Kruk, Jennifer Mangels, Jasia Pietrzak, Jacqui Rick, and Lisa Son for their help. We also thank Asher Koriat and an anonymous reviewer for their comments on an earlier version of this article.
Correspondence concerning this article should be addressed to Brady Butterfield, Department of Psychology, Schermerhorn Hall, Columbia University, New York, New York 10027. Electronic mail may be sent to butterfi@paradox.columbia.edu.

Thus, allowing for the moment that Koriat (1998) is correct that at the time a mistake is generated the person does not know that it is a mistake, then all of the memory-enhancement processes that strengthen correctly generated items on their retrieval should be in force. The self-generated mistake should be strongly registered in memory as a result of its very generation (Slamecka & Graf, 1978; but see also Caroll & Nelson, 1993).
There is a considerable research literature, going back to Müller and Schumann (1894), Webb (1917), Melton and Irwin (1940), McGeoch (1942), Osgood (1949), and Barnes and Underwood (1959) through Loftus (1979) and J. R. Anderson and Reder (1999), showing that competing information results in interference. There are a variety of theories about why and how competing information produces ostensible decrements in memory (e.g., J. A. Anderson, 1973; J. R. Anderson & Bower, 1972; Atkinson & Shiffrin, 1968; Eich, 1982; Gillund & Shiffrin, 1984; Hintzman, 1984; Metcalfe, 1990; Raaijmakers & Shiffrin, 1981), but there is little doubt that such interference phenomena exist. Thus, when a person makes a mistake answering a question such as "What is the last name of the Union general who defeated the Confederate army at the Civil War battle of Gettysburg?" by generating the response "Grant," that self-generated misinformation should harm the remembrance of the correct answer (i.e., Meade). Presumably, the incorrect information is stronger than the correct information or it would not have been produced overtly. Furthermore, it is made stronger yet by the memory enhancement processes that accompany its generation. The original strength of the original mistake augmented by the memory strength gained from its own generation should impair the probability of producing the correct answer.
Finally, there is general agreement that, except under very circumscribed circumstances, responses about which a person is highly confident tend to be items that are the strongest and most fluent in memory, though confidence does not appear to be scaled directly from perceived familiarity (see Van Zandt, 2000). We use strength as a shorthand, without subscribing to either a unidimensional view of items or a particular scaling assumption. High-confidence items are, presumably, the dominant and easily retrievable items, even if objectively they are mistakes. Items about which the person exhibits low confidence are presumably less strong. Indeed, there is good evidence that retrieval fluency is a causative factor in confidence ratings (e.g., Kelley & Lindsay, 1993) and in other memory judgments (e.g., Jacoby, Woloshyn, & Kelley, 1989). By this reasoning, it should be most difficult to overcome highly confident errors because the reasons for the high confidence in the errors are presumably the very factors that make those errors most likely to interfere with the learning of the correct response.
To evaluate this hypothesis, we asked participants to answer general information questions (taken from Nelson & Narens, 1980). We used general information questions as stimuli because they are a good way to elicit high-confidence errors from information acquired outside of a laboratory setting. After each response, the participant was asked to rate his or her confidence that the response was correct. Then, if the answer was incorrect, the participant was shown the correct answer. At a later time, we retested the participants on some of the questions that elicited errors as well as some that elicited correct answers.
Our core question concerned the relationship between initial confidence in an error and the likelihood of answering the same question correctly at retest. Our first hypothesis was that errors endorsed with higher confidence would be less likely to be corrected at retest than would lower confidence errors. Because we were also interested in seeing whether these high-confidence errors were stronger, we asked participants to give three retest responses, write them down, and put a star beside the correct one. Our second hypothesis was that higher confidence errors, as stronger errors, would be more likely to appear in this list of three responses than would lower confidence errors. Thus, we were interested not only in whether a participant could think of a response (the correct one, say) but also in whether a participant could not help but think of a response (the original mistake, in particular).
Method
Participants
Participants were 19 Columbia undergraduates (10 women and 9 men, mean age 19.1 years) who participated to partially satisfy a requirement of an introductory psychology course. Participants were treated in accordance with the "Ethical Principles of Psychologists and Code of Conduct" (American Psychological Association, 1992).
Materials
We used 150 general information questions taken from the set published
by Nelson and Narens (1980). A sample question is "What poison did
Socrates take at his execution?" Each question and its correct answer were
printed on separate sides of an index card.
Ten questions were removed from the original set and replaced with 10
other questions randomly selected from the same source. Stimuli were
discarded if world events had unduly compromised 1980 normative data
(e.g., a capital of Iraq question) or had either nullified or changed the
correct answer (e.g., a capital of Czechoslovakia question). Questions for
which there seemed to be more than one correct answer (e.g., astrolabe and
sextant, twister and tornado, tsunami and tidal wave) were also discarded.
Finally, the question concerning the last name of the famous Civil War
photographer was discarded after some pilot work because its correct
answer was the same as one experimenter's first name. This was generally
distracting and easily recalled at retest.
Procedure
At the beginning of each trial, the experimenter showed the participant the question side of one index card. The participant read the question out loud and then wrote down an answer to that question. The participant then rated his or her confidence in that answer's correctness on a 7-point scale: -3 (sure wrong) to 0 (unsure) to +3 (sure correct). Participants had to give an answer and a confidence rating for every presented question. If the answer was correct, the experimenter said "You're right" and proceeded to the next trial. If the answer was incorrect, the participant was shown the correct answer (which was on the other side of the index card) for 2 s. The experimenter simultaneously said "Actually, the answer is [correct answer]" and then proceeded to the next trial. This continued until at least 15 correct and 15 incorrect responses had been given. At no time were participants warned of a retest.
After the initial test, participants were given an interpolated task—a long
and complicated logic problem. Meanwhile, because there were usually
more than 15 questions in either the correct or incorrect stack from the first
part of the experiment, the experimenter randomly selected questions from
the larger stack to come up with exactly 15 answered correctly and 15
answered incorrectly. These cards were then shuffled together for retest.
After 5 min, the experimenter asked the participant to stop working on the
logic problem and started the retest.
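The card-selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the experimenter worked with physical index cards, so the function name `build_retest_deck`, the placeholder card labels, and the seeded pseudorandom generator are all hypothetical.

```python
import random

def build_retest_deck(correct_stack, incorrect_stack, per_stack=15, seed=None):
    """Draw per_stack cards at random from each stack, then shuffle them together."""
    if len(correct_stack) < per_stack or len(incorrect_stack) < per_stack:
        raise ValueError("each stack needs at least per_stack cards")
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    deck = rng.sample(correct_stack, per_stack) + rng.sample(incorrect_stack, per_stack)
    rng.shuffle(deck)  # interleave correct and incorrect items for retest
    return deck

# Hypothetical stacks left over from the initial test (labels are placeholders).
correct_stack = [f"correct_q{i}" for i in range(22)]
incorrect_stack = [f"incorrect_q{i}" for i in range(15)]
deck = build_retest_deck(correct_stack, incorrect_stack, seed=0)
print(len(deck))
```

Sampling without replacement from the larger stack keeps the retest balanced at 15 and 15 regardless of how many questions the initial test required.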
In the retest, after reading each question to themselves, participants were
asked to say the first three responses that came to mind. Participants wrote
these three responses down and rated their confidence in each answer's
correctness as before. Participants had to put a star by one of the three
responses to indicate their final answer. A retest question was considered
to have been answered correctly if the correct answer was starred. We
retested questions that were answered correctly on the first test as a check
on reliability and also to countermand a potential exclusion strategy
whereby a participant could automatically eliminate as correct any answer
that they remembered having given on the first test. Feedback was given in
the same manner as in the initial test. At the end of the experiment,
participants were debriefed and thanked for their participation.
Results
We used a criterion of p < .05 for significance. Each participant's response probability was weighted equally in the calculation of the mean probabilities displayed in Table 1. These values were almost indistinguishable from those that weighted each trial equally. However, because no participant gave errors of every level of confidence, the within-subject gammas might be considered more trustworthy.
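The difference between the two weighting schemes can be illustrated with a short sketch. The data, participant labels, and function names below are hypothetical and are not taken from the experiment; the point is only that a participant-weighted mean averages per-participant proportions, whereas a trial-weighted mean pools all trials.

```python
from collections import defaultdict

def participant_weighted_mean(trials):
    """Average the per-participant proportions, so each participant counts once."""
    by_participant = defaultdict(list)
    for participant, outcome in trials:
        by_participant[participant].append(outcome)
    proportions = [sum(v) / len(v) for v in by_participant.values()]
    return sum(proportions) / len(proportions)

def trial_weighted_mean(trials):
    """Pool all trials, so participants with more trials count more."""
    outcomes = [outcome for _, outcome in trials]
    return sum(outcomes) / len(outcomes)

# Hypothetical trials: (participant id, 1 if the error was corrected at retest, else 0).
trials = [("p1", 1), ("p1", 1), ("p1", 0), ("p1", 1), ("p2", 0), ("p2", 1)]
print(participant_weighted_mean(trials))
print(trial_weighted_mean(trials))
```

In this toy example the two means differ (.625 vs. about .667) because the participants contribute unequal numbers of trials; when trial counts are similar across participants, as the text suggests, the two are nearly identical.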
Basic Data
Participants answered .45 of the questions correctly at the initial test and .82 of the questions correctly at retest. At retest, the conditional probability of correctly answering a question that had been answered correctly at first test was .99. At retest, the conditional probability of correctly answering a question that had been answered incorrectly on the initial test was .64.

Table 1
Probabilities of Response Types by Confidence Rating

                          Confidence rating
Probability      -3     -2     -1      0      1      2      3
P(C1)
  M             .02    .14    .24    .40    .76    .92    .97
  SE            .01    .05    .10    .10    .09    .03    .01
P(C2|W1)
  M             .60    .79    .65    .84    .67    .60   1.00
  SE            .05    .09    .02    .07    .21    .25    .00
P(pe2|W1)
  M             .45    .57    .54    .64   1.00    .80    .67
  SE            .06    .10    .15    .13    .00    .20    .33

Note. P = probability; C1 = correct at first test; C2 = correct at retest; W1 = wrong at first test; pe2 = the presence of the original error at retest.

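A quick consistency check on these figures, not a computation reported in the article: because each retest deck contained 15 initially correct and 15 initially incorrect items, overall retest accuracy should be close to the equally weighted average of the two conditional probabilities. The variable names below are illustrative.

```python
# Retest decks held 15 initially correct and 15 initially incorrect items.
n_correct_first, n_wrong_first = 15, 15
p_c2_given_c1 = 0.99  # reported: correct at retest given correct at first test
p_c2_given_w1 = 0.64  # reported: correct at retest given wrong at first test

# Law of total probability over the two kinds of retest item.
total = n_correct_first + n_wrong_first
p_c2 = (n_correct_first / total) * p_c2_given_c1 + (n_wrong_first / total) * p_c2_given_w1
print(f"{p_c2:.3f}")
```

The result, about .815, agrees with the reported retest accuracy of .82 to within rounding.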
The first row of Table 1 shows that there was an extremely systematic and well-behaved relation between confidence and proportion correct at initial test, such that higher confidence generally yielded higher accuracy. This result suggests that confidence, in this experiment as in many others in the literature, was an indication of something that might be ascribed the shorthand term memory strength (though, as mentioned above, cf. Van Zandt, 2000).
Relation of Initial Confidence in Error to Later Performance
The middle row of Table 1 shows the relation between confidence in an error and the likelihood of error correction at retest. In contrast to our major hypothesis (that high-confidence errors should be especially difficult to correct), high confidence in the initial error predicted that the correct response would be both produced and subscribed to (i.e., starred) on the second test. The gamma correlation between confidence in the original error and correctness on the second test was significantly positive (mean G = .36, SEM = .116), t(18) = 3.07. We had predicted that this would be a negative correlation. In this experiment, questions on which people had initially made high-confidence errors were more (not less) likely to be answered correctly at retest than were low-confidence errors.
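The article does not spell out how gamma was computed. As a minimal sketch, assuming the standard Goodman-Kruskal definition (concordant minus discordant pairs, divided by their sum, with tied pairs ignored), one participant's gamma could be calculated as follows. The function name and the data are hypothetical.

```python
from itertools import combinations

def goodman_kruskal_gamma(xs, ys):
    """Gamma = (C - D) / (C + D) over all pairs of observations; ties are skipped."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied on at least one variable
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical data for one participant's initially wrong items:
# confidence in the error (-3..+3) and whether it was corrected at retest (1/0).
confidence = [-3, -2, 0, 1, 2]
corrected = [0, 1, 0, 1, 1]
gamma = goodman_kruskal_gamma(confidence, corrected)
print(gamma)
```

A positive gamma means that, within a participant, higher confidence errors tended to be the ones corrected. The per-participant gammas are then averaged and tested against zero.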
Relation of Initial Confidence in Error to Its Production at Retest
This ability to correct highly confident mistakes was not due to those mistakes' absence from memory. The bottom row of Table 1 shows data consistent with the idea that confidence reflects memory strength, as higher confidence errors were more likely than lower confidence errors to show up at retest. The number of trials (and participants) represented is especially low in the +3 confidence column, however, so that mean is less reliable than the standard error of the mean would suggest. A gamma correlation between confidence rating on the initial test and the presence of the original error as one of the retest responses given to that question was calculated for each participant. The average gamma was significantly greater than zero (G = .39, SEM = .124), t(18) = 3.15.
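The test of the average gamma against zero is a one-sample t test, in which t is the mean divided by its standard error with n - 1 degrees of freedom. The sketch below uses hypothetical per-participant gammas (the individual values are not reported in the article) and then checks the reported summary statistics by dividing the reported mean by the reported SEM.

```python
import math
import statistics

def one_sample_t(values):
    """Return (mean, SEM, t, df) for a one-sample t test against zero."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample SD over sqrt(n)
    return mean, sem, mean / sem, n - 1

# Hypothetical per-participant gammas, for illustration only.
gammas = [0.2, 0.5, 0.1, 0.6, 0.4]
mean, sem, t, df = one_sample_t(gammas)
print(mean, sem, t, df)

# Check on the reported summary statistics: G = .39, SEM = .124.
reported_t = 0.39 / 0.124
print(f"{reported_t:.2f}")
```

Dividing the reported mean by the reported SEM gives about 3.15, matching the reported t value. With 19 participants the test has 18 degrees of freedom, for which the two-tailed .05 critical value is about 2.10.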
Contingency in Error and Correct Answer Production at Retest
Confidence in an error, then, was positively correlated with both
correct performance at retest and the presence of the error at retest.
This pattern of results suggests that there might be some relation
between the presence of the correct answer and the original error
at retest. If such a contingency existed, it would provide support
for the idea that participants were remembering correct answers by
associating them with their original errors (i.e., mediating).
No such relation was found, however. Among questions answered incorrectly at first test, the presence of the original error in the list of three retest responses did not predict the presence of the correct answer in the same list. The mean conditional probability that the correct answer was present in the list of three retest responses when the original error was present was .64, and the mean conditional probability that the correct answer was present when the original error was absent was .63. These two probabilities were not significantly different from one another (t < 1).
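The contingency analysis compares two conditional probabilities computed over initially wrong items. A minimal sketch for one participant is shown below, with hypothetical records and field names; in the reported analysis such per-participant values would then be averaged across participants.

```python
def conditional_probability(records, error_present):
    """P(correct answer in retest list | original error present or absent)."""
    subset = [r for r in records if r["error_present"] == error_present]
    if not subset:
        return None  # undefined if the participant has no such items
    return sum(r["correct_present"] for r in subset) / len(subset)

# Hypothetical retest records for one participant's initially wrong items.
records = [
    {"error_present": True, "correct_present": True},
    {"error_present": True, "correct_present": False},
    {"error_present": True, "correct_present": True},
    {"error_present": False, "correct_present": True},
    {"error_present": False, "correct_present": True},
    {"error_present": False, "correct_present": False},
]
p_given_error = conditional_probability(records, True)
p_given_no_error = conditional_probability(records, False)
print(p_given_error, p_given_no_error)
```

Equal values in the two conditions, as in this toy example, are what one would expect if producing the original error neither helps nor hinders producing the correct answer.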
Discussion
The data suggest that high-confidence errors are indeed stronger than are low-confidence errors, as they are more likely to show up as one of the three retest responses. Though our intuition was that this strength would interfere with the learning of the correct answer, these high-confidence errors were actually more likely to be corrected than were low-confidence errors. Although we found our results to be counterintuitive, and certainly they go against most theories of memory, on searching the literature, we came across one group of researchers who found that errors endorsed with high confidence were more likely to be corrected in a subsequent retest (e.g., Kulhavy, Stock, Hancock, Swindell, & Hammrich, 1990; Kulhavy, Yekovich, & Dyer, 1976). This effect was found in a task different from the one we used: a multiple-choice recognition task, with feedback, in which study time was left free to vary and participants were warned of the retest. Kulhavy and his colleagues (Kulhavy et al., 1976, 1990) attributed their findings to an increase in postfeedback study time for items eliciting high-confidence errors. Though the Kulhavy et al. (1990) paradigm was different from ours, it would be of considerable interest to see whether people in our task, when given unlimited, self-paced time after presentation of the feedback, take longer after a high-confidence error. This possibility, however, appears not to be the whole explanation of the high-confidence hypercorrection effect, insofar as the postfeedback time was held constant in our study.
Other research has found that when people are in a tip-of-the-tongue state and make an error of commission, they are more likely to get the correct response eventually than when they make errors of omission (Schwartz, Travis, Castro, & Smith, 2000). One possible explanation may be that a participant's familiarity with the domain of the question influences both the type of error made and the likelihood of subsequent error correction. It is possible that in our paradigm, as well, domain familiarity had a similar mediating influence. Finally, it is possible that corrective feedback to a high-confidence error is more arousing than corrective feedback to a low-confidence error (because people may be more surprised that they are wrong) and that this increased arousal facilitates the encoding or storage of the correct answer (see Butterfield & Mangels, 2000, 2001).
References
American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611.
Anderson, J. A. (1973). A theory for the recognition of items from short memorized lists. Psychological Review, 80, 417-438.
Anderson, J. R., & Bower, G. H. (1972). Recognition and retrieval processes in free recall. Psychological Review, 79, 97-123.
Anderson, J. R., & Reder, L. M. (1999). The fan effect: New results and new theories. Journal of Experimental Psychology: General, 128, 186-197.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89-195). New York: Academic Press.
Ayers, M. S., & Reder, L. M. (1998). A theoretical review of the misinformation effect: Predictions from an activation-based memory model. Psychonomic Bulletin & Review, 5, 1-21.
Barnes, J. M., & Underwood, B. J. (1959). "Fate" of first-list associations in transfer theory. Journal of Experimental Psychology, 58, 97-105.
Butterfield, B., & Mangels, J. A. (2000, October). Neural correlates of metamemory mismatch and subsequent error correction in a semantic retrieval task. Poster session presented at the annual meeting of the Society for Psychophysiological Research, San Diego, CA.
Butterfield, B., & Mangels, J. A. (2001). Neural correlates of metamemory mismatch and error correction in a semantic retrieval task. Manuscript in preparation.
Caroll, M., & Nelson, T. O. (1993). Failure to obtain a generation effect during naturalistic learning. Memory & Cognition, 21, 361-366.
Eich, J. M. (1982). A composite holographic associative recall model. Psychological Review, 89, 627-661.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.
Hintzman, D. L. (1984). MINERVA 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers, 16, 96-101.
Jacoby, L. L., Woloshyn, V., & Kelley, C. (1989). Becoming famous without being recognized: Unconscious influences of memory produced by dividing attention. Journal of Experimental Psychology: General, 118, 115-125.
Kelley, C. M., & Lindsay, S. D. (1993). Remembering mistaken for knowing: Ease of retrieval as a basis for confidence in answers to general knowledge questions. Journal of Memory and Language, 34, 1-24.
Koriat, A. (1998). Illusions of knowing: The link between knowledge and metaknowledge. In V. Y. Yzerbyt & G. Lories (Eds.), Metacognition: Cognitive and social dimensions (pp. 16-34). Thousand Oaks, CA: Sage.
Kulhavy, R. W., Stock, W. A., Hancock, T. E., Swindell, L. K., & Hammrich, P. L. (1990). Written feedback: Response certitude and durability. Contemporary Educational Psychology, 15, 319-332.
Kulhavy, R. W., Yekovich, F. R., & Dyer, J. W. (1976). Feedback and response confidence. Journal of Educational Psychology, 68, 522-528.
Loftus, E. F. (1979). Eyewitness testimony. Cambridge, MA: Harvard University Press.
McGeoch, J. A. (1942). The psychology of human learning. New York: Longmans.
Melton, A. W., & Irwin, J. McQ. (1940). The influence of degree of interpolated learning on retroactive inhibition and the overt transfer of specific responses. American Journal of Psychology, 53, 173-203.
Metcalfe, J. (1990). Composite Holographic Associative Recall Model (CHARM) and blended memories in eyewitness testimony. Journal of Experimental Psychology: General, 119, 145-160.
Müller, G. E., & Schumann, F. (1894). Experimentelle Beiträge zur Untersuchung des Gedächtnisses [Experimental contributions to the investigation of memory]. Zeitschrift für Psychologie, 6, 81-190, 257-339.
Nelson, T. O., & Narens, L. (1980). Norms of 300 general-information questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning & Verbal Behavior, 19, 338-368.
Osgood, C. E. (1949). The similarity paradox in human learning: A resolution. Psychological Review, 56, 132-143.
Raaijmakers, J. G., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93-134.
Schwartz, B. L., Travis, D. M., Castro, A. M., & Smith, S. M. (2000). The phenomenology of real and illusory tip-of-the-tongue states. Memory & Cognition, 28, 18-27.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning & Memory, 4, 592-604.
Van Zandt, T. (2000). ROC curves and confidence judgments in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 582-600.
Webb, L. W. (1917). Transfer of training and retroaction: A comparative study. Psychological Monographs, 24(2), 1-90.
Wright, D. B., & Loftus, E. F. (1998). How misinformation alters memories. Journal of Experimental Child Psychology, 71, 155-164.
Received March 14, 2001
Revision received May 21, 2001
Accepted May 21, 2001