Journal of Experimental Psychology: Applied
2000, Vol. 6, No. 2, 130-147
Copyright 2000 by the American Psychological Association, Inc.
1076-898X/00/$5.00
DOI: 10.1037//1076-898X.6.2.130
The Effects of Averaging Subjective Probability Estimates Between and Within Judges
Dan Ariely, Massachusetts Institute of Technology
Wing Tung Au, The Chinese University of Hong Kong
Randall H. Bender, Research Triangle Institute
David V. Budescu, University of Illinois at Urbana-Champaign
Christiane B. Dietz, Hongbin Gu, and Thomas S. Wallsten, University of North Carolina
Gal Zauberman, Duke University
The average probability estimate of J > 1 judges is generally better than its components. Two studies test 3 predictions regarding averaging that follow from theorems based on a cognitive model of the judges and idealizations of the judgment situation. Prediction 1 is that the average of conditionally pairwise independent estimates will be highly diagnostic, and Prediction 2 is that the average of dependent estimates (differing only by independent error terms) may be well calibrated. Prediction 3 contrasts between- and within-subject averaging. Results demonstrate the predictions' robustness by showing the extent to which they hold as the information conditions depart from the ideal and as J increases. Practical consequences are that (a) substantial improvement can be obtained with as few as 2-6 judges and (b) the decision maker can estimate the nature of the expected improvement by considering the information conditions.
On many occasions, experts are required to provide decision makers or policymakers with subjective probability estimates of uncertain events (Morgan & Henrion, 1990). The extensive literature (e.g., Harvey, 1997; McClelland & Bolger, 1994) on the topic shows that in general, but with clear exceptions, subjective probability estimates are too extreme, implying overconfidence on the part of the judges. The theoretical challenge is to understand the conditions and the cognitive processes that lead to this overconfidence. The applied challenge is to figure out ways to obtain more realistic and useful estimates. The theoretical developments of Wallsten, Budescu, Erev, and Diederich (1997) provide one route to the applied goals, and they are the focus of this article.
Dan Ariely, School of Management, Massachusetts Institute of Technology, Boston, Massachusetts; Wing Tung Au, Department of Psychology, The Chinese University of Hong Kong, Hong Kong, China; Randall H. Bender, Statistics Research Division, Research Triangle Institute, Research Triangle Park, North Carolina; David V. Budescu, Department of Psychology, University of Illinois at Urbana-Champaign; Christiane B. Dietz, Hongbin Gu, and Thomas S. Wallsten, Department of Psychology, University of North Carolina; Gal Zauberman, Fuqua School of Business, Duke University.

The authorship is intentionally in alphabetical order; all authors contributed equally to this article. This research was supported by National Science Foundation Grants SBR-9632448 and SBR-9601281. We thank Peter Juslin and Anders Winman for generously sharing their data with us and Neil Bearden for comments on an earlier version of the article.

Correspondence concerning this article should be addressed to Thomas S. Wallsten, Department of Psychology, University of North Carolina, Chapel Hill, North Carolina 27599-3270. Electronic mail may be sent to tom.wallsten@unc.edu.
Specifically, this research tests three predictions regarding the consequences of averaging multiple estimates that an event will occur or is true. The predictions follow from two theorems proposed by Wallsten et al. (1997) and proved rigorously by Wallsten and Diederich (in press). They are based on idealizations that are unlikely to hold in the real world. If, however, the conditions are approximated or if the predicted results are robust to departures from them, then the theorems are of considerable practical use. We next provide a brief overview of background material and then develop the predictions in more detail. We test them by reanalyzing data collected for other purposes and with an original experiment. We defer discussion of the practical and theoretical consequences to the final section.
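To make the averaging logic concrete, the following is a minimal simulation sketch, not the authors' formal model. It assumes, purely for illustration, that each judge's report is the logistic transform of a common log-odds signal perturbed by independent Gaussian error (the dependent-estimates situation of Prediction 2), and it compares a single judge's Brier score with that of the J-judge average. All function names, distributions, and parameter values here are hypothetical choices, not taken from the article.

```python
# Illustrative sketch only: judges share a true log-odds signal and
# differ by independent Gaussian error; averaging J reports tends to
# pull extreme estimates back toward better calibrated values.
import numpy as np

rng = np.random.default_rng(0)

def simulate_estimates(n_events=10_000, n_judges=6, error_sd=1.0):
    # True event probabilities, drawn uniformly (an arbitrary choice).
    p_true = rng.uniform(0.01, 0.99, n_events)
    log_odds = np.log(p_true / (1 - p_true))
    # Each judge sees the same signal plus independent error.
    noise = rng.normal(0.0, error_sd, (n_judges, n_events))
    estimates = 1 / (1 + np.exp(-(log_odds + noise)))
    outcomes = rng.random(n_events) < p_true
    return estimates, outcomes

estimates, outcomes = simulate_estimates()
single = estimates[0]               # one judge's estimates
averaged = estimates.mean(axis=0)   # the 6-judge average

# Mean squared (Brier) error: lower is better.
for name, est in [("single judge", single), ("average of 6", averaged)]:
    print(f"{name}: Brier score = {np.mean((est - outcomes) ** 2):.4f}")
```

Under these assumptions the averaged estimates reliably beat any single judge's, which is the intuition the theorems formalize and the studies below probe under less ideal information conditions.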
Researchers have studied subjective probability estimation in two types of tasks. In the no-choice full-scale task, respondents provide an estimate from 0 to 1 (or from 0% to 100%) that statements or forecasts are or will be true. In the other, perhaps more common task, choice half-scale, respondents select one of two answers to a question and then give confidence estimates from 0.5 to 1.0 (or 50% to 100%) that they are correct. Instructions in both the choice and nonchoice paradigms generally limit respondents to categorical probability estimates in multiples of 0.1 (or of 10). When judges are not restricted to categorical responses, the estimates generally are gathered for purposes of analysis into categories corresponding to such multiples. The graph of fraction correct in choice half-scale tasks or of statements that are true in no-choice full-scale tasks as a function of subjective probability category is called a calibration curve. The most common finding in general-knowledge or forecasting domains is that probability estimates are too extreme, which is interpreted as indicating overconfidence on the part of the judge.
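The tabulation just described is simple to implement. Below is a minimal Python sketch for the choice half-scale case; the function name and example data are illustrative assumptions, not from the article. Confidence estimates are gathered into the usual categories (multiples of 0.1) and the fraction of correct choices is computed within each category.

```python
# Illustrative calibration-curve tabulation for a half-scale task.
import numpy as np

def calibration_curve(confidence, correct):
    """confidence: estimates in [0.5, 1.0];
    correct: booleans, True when the chosen answer was right."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Gather each estimate into the nearest multiple-of-0.1 category.
    category = np.round(confidence * 10) / 10
    # Fraction correct within each occupied category.
    return {c: correct[category == c].mean() for c in np.unique(category)}

# Perfect calibration would give curve[c] close to c; overconfidence
# shows up as fraction correct below c at the high categories.
conf = [0.5, 0.6, 0.6, 0.7, 0.9, 0.9, 1.0]
hits = [True, False, True, True, True, False, True]
print(calibration_curve(conf, hits))
```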