Why Expertise Matters: A Response to the Challenges

Abstract

Five different scientific communities are challenging the abilities of experts and even the very concept of expertise: the decision research community, the sociology community, the heuristics and biases community, the evidence-based practices community, and the computer science community (including the fields of artificial intelligence, automation, and big data). Although each of these communities has made important contributions, the challenges they pose are misguided. This essay describes the problems with each challenge and encourages researchers in the five communities to explore ways of moving forward to improve the capabilities of experts.
IEEE Intelligent Systems, November/December 2017. Published by the IEEE Computer Society.
HUMAN-CENTERED COMPUTING
Editor: Robert R. Hoffman, Institute for Human and Machine Cognition, rhoffman@ihmc.us
Why Expertise Matters: A Response to the Challenges
Gary Klein, MacroCognition LLC
Ben Shneiderman, University of Maryland
Robert R. Hoffman and Kenneth M. Ford, Institute for Human and Machine Cognition
We dedicate this article to our colleague Robert Wears, who tragically died in July 2017 just before we started to work on this article.
Overwhelming scientific evidence demonstrates that experts' judgments can be highly accurate and reliable. As defined in the scientific literature [1], experts

• employ more effective strategies than others, and do so with less effort;
• perceive meaning in patterns that others do not notice;
• form rich mental models of situations to support sensemaking and anticipatory thinking;
• have extensive and highly organized domain knowledge; and
• are intrinsically motivated to work on hard problems that stretch their capabilities.

Our society depends on experts for mission-critical, complex technical guidance for high-stakes decision making because they can make decisions despite incomplete, incorrect, and contradictory information when established routines no longer apply [2]. Experts are the people the team turns to when faced with difficult tasks.
Despite this empirical base, we witness a number of challenges to the concept of expertise. Tom Nichols' The Death of Expertise presents a strong defense of expertise [3], a defense to which we are adding in this article. We address the attempts made by five communities to diminish the credibility and value of experts (see Figure 1). These challenges come from

• decision researchers who show that simple linear models can outperform expert judgment;
• heuristics and biases researchers who have claimed that experts are as biased as anyone else;
• sociologists who see expertise as just a social attribution;
• practice-oriented researchers seeking to replace professional judgments with data-based prescriptions and checklists; and
• technophiles who believe that it is only a matter of time before artificial intelligence (AI) surpasses experts.

Each of these communities has questioned the value of expertise, using different arguments, perspectives, and paradigms.

Society needs experts, even though they are fallible. Although we are expert-advocates, eager to highlight the strengths of experts, we acknowledge that experts are not perfect and never will be. Our purpose is to correct the misleading claims and impressions being spread by the expertise-deniers. Then we hope to engage productively with each of these communities to improve human performance through better training, workflows, and technology.

We begin with the challenge from the decision research community because this body of work can be traced back the furthest, to the mid-1960s, and echoes to this day.
The Challenge from the Decision Research Community

Research by some experimental psychologists shows that in judgment tasks, simple linear models will be more consistent in their performance than human judges. Examples are faculty ratings of graduate students versus a model based on grades and test scores, or physicians' ratings of cancer biopsy results versus a model based on survival statistics [4, 5]:

"There are some, but only a few, truly remarkable judges, whereas there are many so-called experts who are no better than complete novices … the picture of the expert painted in broad brush strokes by this research is relatively unflattering … whenever possible, human judges should be replaced by linear models." [6]
A few aspects of this literature are noteworthy:

• The linear models are derived in the first place from the advice of experts about what the key variables are—the variables that experts themselves use in making judgments.
• The decision research tends to reduce expertise to single measures, such as judgment hit rate, ignoring more qualitative contributions to performance.
• For many of the studies, it is not obvious that the particular judgment task presented to the participants is actually the same task that the experts routinely perform, and therefore it might not be the task at which they are proficient.
• For many of the studies, there is scant evidence that the participants who are called experts actually qualify for that designation, apart from their having had a certain number of years of experience.
• Advocates of this view go beyond their empirical base by generalizing from studies using college students as judges to argue for the fallibility of all judges.
Although linear models are consistent, when they fail, they fail miserably. One problem involves "broken leg cues." A linear model might do a decent job of predicting whether a given person is likely to go to the movies this weekend, but will fail because it is blind to the fact that the person in question just broke a leg [7]. Experts will perform better than the model if they have information to which the model is insensitive [8].
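To make the broken-leg point concrete, here is a minimal sketch in Python (ours, not taken from the cited studies) of a unit-weight linear judgment model in the spirit of Dawes's improper linear models [4]. The cue names and values are illustrative assumptions; the point is only that the model cannot use a variable it was never given, whereas an expert who learns the decisive fact can override the prediction.

# Minimal sketch: an "improper" unit-weight linear model built from
# expert-chosen cues, and its blindness to a broken-leg cue.
# All cue names and values are illustrative assumptions.

def moviegoing_score(likes_movies: float, free_time: float, disposable_income: float) -> float:
    """Unit-weight linear model: the sum of standardized, expert-chosen cues."""
    return likes_movies + free_time + disposable_income

person_a = dict(likes_movies=1.0, free_time=1.0, disposable_income=0.5)
person_b = dict(likes_movies=1.0, free_time=1.0, disposable_income=0.5)
print(moviegoing_score(**person_a), moviegoing_score(**person_b))  # 2.5 2.5 -- identical predictions

# person_b just broke a leg. That cue is not among the model's inputs,
# so the model cannot react to it; an expert who knows the fact can.
broke_leg = {"person_a": False, "person_b": True}

def expert_adjusted_score(name: str, cues: dict) -> float:
    score = moviegoing_score(**cues)
    return 0.0 if broke_leg[name] else score  # override on a rare, decisive cue

print(expert_adjusted_score("person_a", person_a))  # 2.5
print(expert_adjusted_score("person_b", person_b))  # 0.0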
Who are the experts? In some studies, the linear models were compared to college students, sometimes called "judges." And even in studies in which the judges were professionals, perhaps the linear models should be compared to the best of the professionals rather than to the average.
Even when the linear models outperformed the experts, it is a mistake to infer that the linear models got it right and the experts failed miserably. The linear models had their greatest edge in domains [6] and prediction tasks involving human activity (that is, clinical psychologists, psychiatrists, counselors, admissions officers, parole officers, bank loan officers, and so on). However, the linear models weren't very accurate—it was just that the experts were even worse. In one often-cited study of cancer diagnosis, the linear model performed better than oncologists at predicting patient longevity, but a closer look shows that the model only accounted for 18 percent of the variance in the judgment data [9]. The clearest conclusion from this and other studies on linear modeling is that some things are of intrinsically low predictability.
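As a rough aid to interpreting that figure (our arithmetic, not a claim from the cited study): on the usual reading of R² as the proportion of variance explained, 18 percent corresponds to a correlation of roughly 0.42 between predictions and outcomes, leaving more than four-fifths of the variation unexplained.

$R^2 = 0.18 \;\Rightarrow\; R = \sqrt{0.18} \approx 0.42, \qquad 1 - R^2 = 0.82$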
Next is the challenge from the heuristics and biases (HB) community, which can be traced to the early 1970s.
The Challenge from the Heuristics and Biases Community

Led by Daniel Kahneman and Amos Tversky [10], the HB community has called into question assumptions about rationality by demonstrating that people fall prey to a wide variety of biases in judgment and probabilistic reckoning, and that even experts sometimes show these biases. This finding helped create a mindset that experts are not to be trusted. The proliferation of HB research in academic psychology departments has strengthened the impression that expert judgments are not accurate.
The HB paradigm typically uses participants who are not experts (college students) and gives them artificial laboratory tasks that require little training and have little or no ecological validity. The tasks, conveniently enough, can be performed in a college class period. Bias effects found using this "paradigm of convenience" diminish or disappear when researchers add context [11] or when researchers have genuine experts engage in their familiar environments rather than work on artificial puzzles and probability-juggling tasks. Variations in the materials, instructions, procedures, or experimental design can cause bias effects to diminish or disappear [12].

[Figure 1. The five communities that have challenged the concept of expertise: decision research, heuristics and biases, sociology, evidence-based practices, and computer science.]
Although some studies have shown that bias can occur in expert reasoning [13], several studies show that bias effects in experts are much smaller than those found with college students [14]. There is mixed evidence for the claim that experts tend to be overconfident, and what evidence there is stems from narrow methods for measuring confidence. Experts such as weather forecasters and firefighters are careful to keep their judgments within their core specialty and to use experience and accurate feedback to attain reasonable levels of confidence in their judgments. Weather forecasters search for evidence that confirms a hypothesis; it would be irrational not to. On the other hand, weather forecasters deliberately and deliberatively look for evidence that their hypotheses might be wrong.
The HB researchers' antipathy toward experts opened the way for the emergence of the naturalistic decision-making movement [15], which regards heuristics as strengths acquired through experience, rather than weaknesses. Tversky and Kahneman were careful to state that, "In general these heuristics are quite useful, but sometimes they lead to severe and systematic errors." [16] However, the HB field usually ignores this caveat and emphasizes the downside of heuristics.
We now come to the challenge from sociology, which began in the 1970s and emerged most forcefully in the 1980s.
The Challenge from Sociology

Sociological analysis of the consequences of occupational specialization has considered the value of professions to society [17]. Given its close association to the concept of professions, the concept of expertise was also assessed from the sociological perspective, referred to as "science and technology studies." Ethnographers, sociologists, and philosophers of science researched expertise in domains including astronomy, physics, and endocrinology [18–20]. Their resonant paradigms have been referred to as "situated cognition," "distributed cognition," and the "sociology of scientific knowledge" [21–24]. Some individuals have defined their paradigm, in part, as a reaction to cognitive psychology:

"If one relegates all of cognition to internal mental processes, then one is required to pack all the explanatory machinery of cognition into the individual mind as well, leading to misidentification of the boundaries of the cognitive system, and the over-attribution to the individual mind alone all of the processes that give rise to intelligent behavior." [25]
Proponents of the situated cognition approach offer many good examples of why one should define "the cognitive system" as persons acting in coordination with a social group to conduct activities using tools and practices that have evolved within a culture [25]. The core claim is that expertise and cognition reside in the interaction between the individual and the team, community, or organization. The strongest view is that expertise is a social attribution or role, a matter of prestige and authority. A moderate view is that individual cognition is an enabling condition for expertise, which just happens to be a condition that is not of particular interest in a sociological analysis.
One of the valuable aspects of this perspective is to sensitize us to the importance of external resources and community relationships for the acquisition, expression, and valuation of expertise. Thus, we respect these researchers and their contributions. The importance of context has been recognized in cognitive psychology for decades [26], and in computer science as well [27]. We agree completely that resources for cognition are in the world. We agree that teamwork and organizational issues are an important part of naturalistic decision making. Indeed, the notion of "macrocognition" [28] refers to coordinating and maintaining common ground as primary functions. However, cognitive scientists are disappointed by any approach that takes the strong stance that expertise is merely a social attribution, a stance that discounts the importance and value of individual cognition, knowledge, and expertise [27].
There is overwhelming empirical evidence that individual knowledge is crucial in expert reasoning and problem solving. If you plug experts and nonexperts into the same work settings, you will find huge differences in the quality of the outputs of the groups/teams. The claims derived from a dismissive reaction to cognitive-individualist views move the pendulum too far. Sociologists including Harry Collins [29] and Harald Mieg [30] have taken the balanced view, which we advocate, describing the importance of individual expertise along with social and contextual factors that can be essential for developing and maintaining expertise. We remain hopeful that over time, this balanced view will predominate.
We now come to challenges that have emerged most recently.
The Challenge from the Evidence-Based Practices Community

The evidence-based practices community argues that professionals need to find the best scientific evidence, derive prescriptive procedures for decisions, and adhere to these procedures rather than rely on their own judgments [31]. This approach has been advocated in healthcare, where it is referred to as evidence-based medicine. This community argues against trusting experts because they rely on anecdotal practices and out-of-date and ineffective remedies. This takeaway message seeks to replace reliance on experts with faith in defined scientific studies. Clearly, empirical evaluation studies have great value, but we do not believe that such studies deserve uncritical acceptance. Witness how the evidence seems to change every two years. Witness also the difficulty of sorting out the evidence base for a patient with multiple medical problems. Clinicians must consider the individual patient, who may differ from the criteria on which the evidence is based. We seek a balance between scientific evidence and broad experience [32].
One way that evidence is compiled is through checklists. These are valuable safety tools to prevent decision makers from omitting important steps in a process, but they are not decision-support tools. We believe that reducing complex judgments to simple checklists often misses essential aspects of decision making [33]. Checklists work for stable, well-defined tasks, and have to be carefully crafted with a manageable number of steps. If the checklist is sequential, each step must lead to a clear outcome that serves as the trigger for the next step. However, in complex and ambiguous situations, the antecedent conditions for each step are likely to be murky; expert decision makers must determine when to initiate the next step or whether to initiate it at all. Although checklists can be helpful, it is risky to have individuals use checklists for complex tasks that depend on considerable tacit knowledge to judge when a step is appropriate, how to modify a step, and how to decide whether the checklist is working [34]. Experts must decide what to do when the best practices conflict with their own judgments. They must revise plans that do not seem to be working. It is one thing to hold physicians to task for relying on ineffective remedies and ignoring scientific evidence that procedures they were once taught have since been shown ineffective, but it is another thing to compel physicians to rely on scientific evidence by proceduralizing clinical judgment in a checklist and penalizing them for not following the steps.
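To make the sequential-trigger structure concrete, here is a minimal sketch in Python (ours, not the article's) of a checklist as a chain of antecedent conditions and actions. The step names and conditions are illustrative assumptions, not a real clinical protocol; the point is that when an antecedent is murky rather than clearly true or false, the procedure cannot decide on its own and expert judgment has to take over.

# Minimal sketch: a sequential checklist as (action, antecedent) steps.
# Each antecedent returns True, False, or None (murky).
# All names are illustrative assumptions.
from typing import Callable, List, Optional, Tuple

Observation = dict
Antecedent = Callable[[Observation], Optional[bool]]

CHECKLIST: List[Tuple[str, Antecedent]] = [
    ("start treatment",   lambda obs: obs.get("diagnosis_confirmed")),
    ("escalate dosage",   lambda obs: obs.get("no_adverse_reaction")),
    ("discharge patient", lambda obs: obs.get("symptoms_resolved")),
]

def run_checklist(obs: Observation) -> None:
    for action, antecedent in CHECKLIST:
        status = antecedent(obs)
        if status is True:
            print(f"Trigger clear -> do: {action}")
        elif status is False:
            print(f"Trigger clearly absent -> skip: {action}")
        else:
            # The antecedent is murky; the procedure cannot resolve it by itself.
            print(f"Trigger murky -> expert judgment needed before: {action}")
            break

# Stable, well-defined case: every antecedent is crisply true or false.
run_checklist({"diagnosis_confirmed": True, "no_adverse_reaction": True, "symptoms_resolved": False})

# Ambiguous case: the second antecedent is unknown, so the checklist stalls.
run_checklist({"diagnosis_confirmed": True})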
Guidelines, rules, and checklists raise the floor by preventing silly errors—mistakes that even a first-year medical student might recognize as an error. But they also lower the ceiling, making it easy to shift to an unthinking, uncritical mode that misses subtle warning signs and does not serve the needs of patients.
Finally, we come to the challenge from within computer science itself.
The Challenge from Computer Science

This challenge has been presented on three fronts: AI, big data, and automation. It is claimed that these technologies are smarter and more reliable than any human. Since experts are the gold standard of performance, demonstrations of smart technology win big when they beat out an expert.

AI successes have been widely publicized. IBM's Deep Blue beat Garry Kasparov, the reigning chess champion at the time. IBM's Watson beat a panel of experts at the game of Jeopardy. AlphaGo trounced one of the most highly regarded Go masters. These achievements have been interpreted as showing that AI can outperform humans at any cognitively challenging task. But the successes involve games that are well-structured, with unambiguous referents and definitive correct answers. In contrast, most decision makers face wicked problems with unclear goals in ambiguous and dynamic situations.

Roger Schank, an AI pioneer, stated flatly that "Watson is a fraud." [35] He objected to IBM's claims that Watson could outthink humans and find insights within large datasets. Although Watson excels at keyword searches, it does not consider the context of the passages it is searching, and as a result is insensitive to underlying messages in the material. Schank's position is that counting words is not the same as inferring insightful conclusions. Our experience is that AI developers have much greater appreciation for human expertise than the AI popularizers.
A good example of the challenge to expertise comes from the weather forecasting domain. Articles with titles such as "All Hail the Computer!" [36] promulgate the myth that if more memory and faster processing speeds could be thrown at the problem, the need for humans would evaporate. Starting in the late 1980s, as more computer models were introduced into operational forecasting, prognostications were made that computer models would outperform humans within the next 10 years—for example, "[The] human's advantage over the computer may eventually be swamped by the vastly increased number crunching ability of the computer ... as the computer driven models will simply get bigger and better." [37] Articles in the scientific literature as well as the popular press continue to present the stance of human versus machine, asking whether "machines are taking over." [36] This stance conveys a counterproductive attitude of competition in which the experts cannot beat the computers.

A more productive approach would be to design technologies that enhance human performance. The evidence clearly shows that the expert weather forecaster adds value to the outputs of the computer models. Furthermore, "numerical prediction models do not produce a weather forecast. They produce a form of guidance that can help a human being decide upon a forecast of the weather." [38, 39]
Next, we turn to the denigration of expertise that has been expressed by advocates of big data analytics. Despite their widely publicized successes, a closer look often tells a different story. For instance, Google's Flu Trends project initially seemed successful at predicting flu outbreaks, but over time it misled public health planners [40]. Advocates of big data claim that the algorithms can detect trends, spot problems, and generate inferences and insights; no human, no matter how expert, could possibly sift through all of the available sensor data; and no human can hope to interpret even a fraction of these data sources. These statements are all true. But the big data community wants to reduce our trust in domain experts so decision makers become comfortable using automated big data analyses. Here is a typical and dangerous claim: "The big target here isn't advertising, though. It's science … faced with massive data, this approach to science—hypothesize, model, test—is becoming obsolete … Petabytes allow us to say: Correlation is enough. We can stop looking for models." [41]
A balanced view recognizes that big data analytics can identify patterns where none exist. Big data algorithms can follow historical trends but might miss departures from these trends, as in the broken leg cues, cues that have implications that are clear to experts but aren't part of the algorithms. Further, experts can use expectancies to spot missing events that may be highly significant. In contrast, big data approaches, which crunch the signals received from a variety of sources, are unaware of the absence of data and events.
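To illustrate how analytics can find patterns where none exist, here is a minimal sketch in Python (ours, not the article's): when enough purely random signals are screened against a target that is itself pure noise, some of them will correlate strongly by chance alone. The variable names and sample sizes are illustrative assumptions.

# Minimal sketch: with enough candidate predictors, chance alone produces
# seemingly strong correlations with a target that is pure noise.
import random
import statistics

N_WEEKS = 52        # one year of weekly observations (assumed)
N_SIGNALS = 2000    # number of candidate "big data" signals screened (assumed)

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

# Target series (e.g., weekly case counts) generated as pure noise.
target = [random.gauss(0, 1) for _ in range(N_WEEKS)]

# Screen thousands of unrelated random signals against the target.
best = max(
    (correlation([random.gauss(0, 1) for _ in range(N_WEEKS)], target)
     for _ in range(N_SIGNALS)),
    key=abs,
)

print(f"Strongest correlation found among {N_SIGNALS} random signals: {best:+.2f}")
# Typically prints a value around +/-0.5 -- a seemingly strong "pattern"
# even though every signal was generated independently of the target.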
Finally, we consider the challenge offered by proponents of automation. Some researchers in the automation community have promulgated the myth that more automation can obviate the need for humans, including experts. The enthusiasm for technologies is often extreme [42]. Too many technologists believe that automation can compensate for human limitations and substitute for humans. They also believe the myth that tasks can be cleanly allocated to either the human or the machine. These misleading beliefs have been questioned by cognitive systems engineers for more than 35 years, yet the debunking has to be periodically refreshed in the minds of researchers and program managers [43]. The misleading beliefs persist because of the promissory note that more automation means fewer people, fewer people means fewer errors, and (especially) fewer people means reduced costs [44].
Nearly every funding program that calls for more automation is premised on the claim that the introduction of automation will entail a need for fewer expert operators at potentially lower cost to the organization. But the facts are in plain view: the introduction often requires more experts. Case studies [45] show that automation creates new kinds of cognitive work for the operator, often at the wrong times. Automation often requires people to do more, to do it faster, or to do it in more complex ways. The explosion of features, options, and modes often creates new demands, new types of errors, and new paths toward failure. Ironically, as these facts become apparent, decision makers seek additional automation to compensate for the problems triggered by the automation [44].
We see technology—AI, big data, and automation—continuing to improve, which will make computers ever more valuable tools. But in the spirit of human-centered computing, we define intelligent systems as human-machine systems that amplify and extend human abilities [46]. The technology in such work systems is designed to improve human performance and accelerate the achievement of expertise. We hope that expertise-deniers can get past the mindset of trying to build systems to replace the experts and instead seek to build useful technologies that empower experts.
If the challenges to expertise hold sway, the result might be degradation of the decision making and resilience of organizations and agencies. Once such organizations accept the expertise-deniers' arguments, they may sideline domain experts in favor of statistical analysts and ever more automation. They are likely to divert funding from training programs that might produce experts into technology that makes decisions without human intervention or responsible action. Shifting cognitive work over to automation may deskill workers, erode the expertise that is crucial for adaptability, and lead to a downward spiral of diminishing expertise.

Experts are certainly not perfect, so the challenges can be useful for increasing our understanding of the boundary conditions of expertise. We do not want to return to an era where medicine was governed by anecdote rather than data—we think it essential to draw from evidence and from expertise. We appreciate the discoveries of the heuristics and biases researchers—the heuristics they have uncovered can have great value for fostering speculative thinking. We respect the judgment and decision research community—we want to take advantage of their efforts to improve the way we handle evidence and deploy our intuitions. We want to productively move forward with improved information technology—we want these tools to be designed to help us gain and enhance expertise. We value the social aspects of work settings—we want to design work settings and team arrangements that magnify expertise. Our hope is to encourage a balance that respects expertise while designing new ways to strengthen it.
We regard the design of cognitive work systems as the design of human-machine interdependencies, guided by the desire to make the machines comprehensible, predictable, and controllable. This course of action seems best suited to promote human welfare and enable greater achievements [47, 48].
Acknowledgments

We thank Bonnie Dorr, Hal Daume, Jonathan Lazar, Jim Hendler, Mark Smith, and Jenny Preece for their comments on a draft of this article; and Jan Maarten Schraagen and Paul Ward for their helpful comments and suggestions and for their patience and encouragement. This essay was adapted from a longer and more in-depth account, "The War on Experts," that will appear in the Oxford Handbook of Expertise [49].
References

1. K.A. Ericsson et al., Cambridge Handbook of Expertise and Expert Performance, 2nd ed., Cambridge Univ. Press, 2017.
2. B. Shneiderman and G. Klein, "Tools That Aid Expert Decision Making: Supporting Frontier Thinking, Social Engagement and Responsibility," blog, Psychology Today, Mar. 2017; www.psychologytoday.com/blog/seeing-what-others-dont/201703/tools-aid-expert-decision-making-rather-degrade-it.
3. T. Nichols, The Death of Expertise, Oxford Univ. Press, 2017.
4. R. Dawes, "The Robust Beauty of Improper Linear Models," American Psychologist, vol. 34, no. 7, 1979, pp. 571–582.
5. P.E. Meehl, "Seer Over Sign: The First Good Example," J. Experimental Research in Personality, vol. 1, no. 1, 1965, pp. 27–32.
6. R. Hastie and R. Dawes, Rational Choice in an Uncertain World, Sage Publications, 2001.
7. K. Salzinger, "Clinical, Statistical, and Broken-Leg Predictions," Behavior and Philosophy, vol. 33, 2005, pp. 91–99.
8. R. Johnston, Analytic Culture in the US Intelligence Community: An Ethnographic Study, Center for the Study of Intelligence, Washington, DC, 2005.
9. H.J. Einhorn and R.M. Hogarth, "Judging Probable Cause," Psychological Bull., vol. 99, no. 1, 1978, pp. 3–19.
10. D. Kahneman and A. Tversky, "Prospect Theory: An Analysis of Decision under Risk," Econometrica, vol. 47, no. 2, 1979, pp. 263–291.
11. D.W. Cheng et al., "Pragmatic Versus Syntactic Approaches to Training Deductive Reasoning," Cognitive Psychology, vol. 18, no. 3, 1986, pp. 293–328.
12. R. Hertwig and G. Gigerenzer, "The 'Conjunction Fallacy' Revisited: How Intelligent Inferences Look Like Reasoning Errors," J. Behavioral Decision Making, vol. 12, no. 2, 1999, pp. 275–305.
13. B. Fischhoff, "Eliciting Knowledge for Analytical Representation," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 3, 1989, pp. 448–461.
14. M.D. Shields, I. Solomon, and W.S. Waller, "Effects of Alternative Sample Space Representations on the Accuracy of Auditors' Uncertainty Judgments," Accounting, Organizations, and Society, vol. 12, no. 4, 1987, pp. 375–385.
15. G. Klein, R. Calderwood, and A. Clinton-Cirocco, "Rapid Decision Making on the Fire Ground," Proc. Human Factors and Ergonomics Soc. Ann. Meeting, vol. 30, no. 6, 1986, pp. 576–580.
16. A. Tversky and D. Kahneman, "Judgment under Uncertainty: Heuristics and Biases," Science, vol. 185, Sept. 1974, pp. 1124–1131.
17. J. Evetts, "Professionalism: Value and Ideology," Current Sociology, vol. 61, no. 5–6, 2013, pp. 778–779.
18. H.M. Collins, Changing Order: Replication and Induction in Scientific Practice, 2nd ed., Univ. of Chicago Press, 1992.
19. B. Latour and S. Woolgar, Laboratory Life: The Social Construction of Scientific Facts, Sage Publications, 1979.
20. M. Lynch, Scientific Practice and Ordinary Action, Cambridge Univ. Press, 1993.
21. K.D. Knorr-Cetina, The Manufacture of Knowledge, Pergamon Press, 1981.
22. J. Lave, "Situating Learning in Communities of Practice," Perspectives on Socially Shared Cognition, L.B. Resnick, J.M. Levine, and S.D. Teasley, eds., American Psychological Assoc., 1993, pp. 63–82.
23. L. Suchman, Plans and Situated Actions: The Problem of Human-Machine Communication, Cambridge Univ. Press, 1987.
24. E. Wenger, Communities of Practice: Learning, Meaning, & Identity, Cambridge Univ. Press, 1998.
25. M.S. Weldon, "Remembering as a Social Process," The Psychology of Learning and Motivation, vol. 40, no. 1, 2000, pp. 67–120.
26. G.A. Miller, "Dismembering Cognition," One Hundred Years of Psychological Research in America, Johns Hopkins Univ. Press, 1986, pp. 277–298.
27. N.M. Agnew, K.M. Ford, and P.J. Hayes, "Expertise in Context: Personally Constructed, Socially Selected and Reality-Relevant?" Int'l J. Expert Systems, vol. 7, no. 1, 1994, pp. 65–88.
28. G. Klein et al., "Macrocognition," IEEE Intelligent Systems, vol. 18, no. 3, 2003, pp. 81–85.
29. H.M. Collins, "A Sociological/Philosophical Perspective on Expertise: The Acquisition of Expertise Through Socialization," Cambridge Handbook of Expertise and Expert Performance, 2nd ed., K.A. Ericsson et al., Cambridge Univ. Press, 2017.
30. H.A. Mieg, "Social and Sociological Factors in the Development of Expertise," Cambridge Handbook of Expertise and Expert Performance, K.A. Ericsson et al., Cambridge Univ. Press, 2006, pp. 743–760.
31. A.R. Roberts and K.R. Yeager, eds., Evidence-Based Practice Manual: Research and Outcome Measures in Health and Human Services, Oxford Univ. Press, 2004.
32. E. Barends, D.M. Rousseau, and R.B. Briner, Evidence-Based Management: The Basic Principles, Center for Evidence-Based Management, Amsterdam, 2014.
33. R.L. Wears and G. Klein, "The Rush from Judgment," Annals of Emergency Medicine, forthcoming, 2017.
34. D.E. Klein et al., "Can We Trust Best Practices? Six Cognitive Challenges of Evidence-Based Approaches," J. Cognitive Eng. and Decision Making, vol. 10, no. 3, 2016, pp. 244–254.
35. R. Schank, "The Fraudulent Claims Made by IBM about Watson and AI," 2015; www.rogerschank.com/fraudulent-claims-made-by-IBM-about-Watson-and-AI.
36. R.A. Kerr, "Weather Forecasts Slowly Clearing Up," Science, vol. 338, 2012, pp. 734–737.
37. P.S. Targett, "Predicting the Future of the Meteorologist: A Forecaster's View," Bull. Australian Meteorological and Oceanographic Soc., vol. 7, no. 1, 1994, pp. 46–52.
38. H.E. Brooks, C.A. Doswell, and R.A. Maddox, "On the Use of Mesoscale and Cloud-Scale Models in Operational Forecasting," Weather and Forecasting, vol. 7, Mar. 1992, pp. 120–132.
39. R.R. Hoffman et al., Minding the Weather: How Expert Forecasters Think, MIT Press, 2017.
40. D. Lazer et al., "The Parable of Google Flu: Traps in Big Data Analysis," Science, vol. 343, 14 Mar. 2014, pp. 1203–1205.
41. C. Anderson, "The End of Theory: The Big Data Deluge Makes the Scientific Method Obsolete," Wired, 23 June 2008; www.wired.com/2008/06/pb-theory.
42. E. Brynjolfsson and A. McAfee, The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, W.W. Norton & Co., 2014.
43. J.M. Bradshaw et al., "The Seven Deadly Myths of 'Autonomous Systems'," IEEE Intelligent Systems, vol. 28, no. 3, 2013, pp. 54–61.
44. "Technical Assessment: Autonomy," report from the Office of the Assistant Secretary of Defense for Research and Engineering, Office of Technical Intelligence, US Department of Defense, 2015.
45. R.R. Hoffman, T.M. Cullen, and J.K. Hawley, "Rhetoric and Reality of Autonomous Weapons: Getting a Grip on the Myths and Costs of Automation," Bull. Atomic Scientists, vol. 72, no. 4, 2016; doi:10.1080/00963402.2016.1194619.
46. J.M. Johnson et al., "Beyond Cooperative Robotics: The Central Role of Interdependence in Coactive Design," IEEE Intelligent Systems, vol. 26, no. 3, 2011, pp. 81–88.
47. M. Johnson et al., "Seven Cardinal Virtues of Human-Machine Teamwork," IEEE Intelligent Systems, vol. 29, no. 6, 2014, pp. 74–79.
48. G. Klein et al., "Ten Challenges for Making Automation a 'Team Player' in Joint Human-Agent Activity," IEEE Intelligent Systems, vol. 19, no. 6, 2004, pp. 91–95.
49. P. Ward et al., eds., Oxford Handbook of Expertise, Oxford Univ. Press, forthcoming.
Gary Klein is a senior scientist at MacroCognition LLC. His research interests include naturalistic decision making. Klein received his PhD in experimental psychology from the University of Pittsburgh. Contact him at gary@macrocognition.com.

Ben Shneiderman is a distinguished university professor in the Department of Computer Science at the University of Maryland. His research interests include human-computer interaction, user experience design, and information visualization. Shneiderman has a PhD in computer science from SUNY-Stony Brook. Contact him at ben@cs.umd.edu.

Robert R. Hoffman is a senior research scientist at the Institute for Human and Machine Cognition. His research interests include macrocognition and complex cognitive systems. Hoffman has a PhD in experimental psychology from the University of Cincinnati. He is a fellow of the Association for Psychological Science and the Human Factors and Ergonomics Society and a senior member of IEEE. Contact him at rhoffman@ihmc.us.

Kenneth M. Ford is director of the Institute for Human and Machine Cognition. His research interests include artificial intelligence and human-centered computing. Ford has a PhD in computer science from Tulane University. Contact him at kford@ihmc.us.