Performance-based assessment of expertise: How to decide if someone is an expert or not
James Shanteau a,*, David J. Weiss b, Rickey P. Thomas a, Julia C. Pounds c
a Department of Psychology, Kansas State University, Bluemont Hall 492, 1100 Mid-Campus Drive, Manhattan, KS 66506-5302, USA
b California State University, Los Angeles, CA, USA
c Federal Aviation Administration, Oklahoma City, OK, USA
Received 15 December 1999; accepted 15 April 2000
Abstract
The identification of an expert is vital to any study or application involving expertise. If an external criterion (a "gold standard") exists, then identification is straightforward: simply compare people against the standard and select whoever is closest. However, such criteria are seldom available for domains where experts work; that's why experts are needed in the first place. The purpose here is to explore various methods for identifying experts in the absence of a gold standard. One particularly promising approach (labeled CWS for Cochran–Weiss–Shanteau) is explored in detail. We illustrate CWS through reanalyses of three previous studies of experts. In each case, CWS provided new insights into identifying experts. When applied to auditors, CWS correctly detected group differences in expertise. For agricultural judges, CWS revealed subtle distinctions between subspecialties of experts. In personnel selection, CWS showed that irrelevant attributes were more informative than relevant attributes. We believe CWS provides a valuable tool for identification and evaluation of experts. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Psychology; Expert systems; Auditing; Management; Agriculture
1. Introduction
Although experts have been studied for over a century (Shanteau, 1999), a critical question remains unanswered: how can we determine who is, and who is not, an expert? If there is an external criterion (a "gold standard"), the answer is straightforward. All we have to do is compare a would-be expert's judgments to the correct answer. If a person's answers are close to correct, then he or she is an "expert." If not, not.
This validity-based approach is compelling in its simplicity. Unfortunately, it is problematic in application. The difficulty is that experts are needed precisely in domains where correct answers seldom exist (Gigerenzer et al., 1999; Shanteau, 1995). Indeed, if we could compute (or look up) correct answers, why would we need an expert at all?
European Journal of Operational Research 136 (2002) 253–263
www.elsevier.com/locate/dsw

* Corresponding author. Tel.: +1-785-532-0618; fax: +1-785-532-5401. E-mail address: shanteau@ksu.edu (J. Shanteau).

0377-2217/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0377-2217(01)00113-8
The purpose of this paper is to explore the application of a new measure of expertise (labeled CWS for Cochran–Weiss–Shanteau) for identifying expertise in the absence of external criteria. The measure is based on the behavior of would-be experts, namely their performance in the domain. In effect, this is a bootstrap approach in which the individual's own decisions are used to validate (or invalidate) his or her claim to expertise.
The remainder of this paper is organized into five sections: In Section 2, we review approaches used in prior research to identify would-be experts. In Section 3, we introduce our proposed approach to identification of expertise. In Section 4, we apply this approach to several previously conducted studies of experts. In Section 5, we discuss caveats and restrictions that apply when using CWS. Finally, we offer our conclusions about the future of CWS.
2. Prior approaches
Many approaches have been used by previous
investigators to identify experts. Nine of these
traditional approaches will be summarized here.
We also consider the advantages and, more im-
portantly, the disadvantages of each approach.
2.1. Experience
In many studies, the number of years of job-relevant experience is used as a surrogate for expertise. Participants with many years of experience are classified as "experts," while others with little experience are labeled "novices." On the surface, this approach appears convincing. After all, no one can function as an "expert" for any length of time if they are totally incompetent.
Although the argument can be made that ex-
perts almost always have considerable experience,
the converse does not necessarily follow. There are
many examples of professionals with considerable
experience who never become experts. Such indi-
viduals may even work with top experts, but they
seldom rise to the performance levels required for
true expertise.
In a study of grain judges, for instance, Trumbo et al. (1962) found that number of years of experience did not correlate with accuracy of wheat grading. Instead, their results showed a different trend: judges with more experience systematically overrated grain quality (an interesting form of "grade inflation"). Similarly, Goldberg (1968) asked clinical psychologists with varying degrees of experience to diagnose psychiatric patients. He found no relation between experience and accuracy of the diagnoses; however, the confidence of clinicians in their diagnoses did increase with experience.
Although there are undoubtedly instances where a positive relationship exists between experience and performance, there is little reason to expect this to apply universally. At best, experience is an uncertain predictor of degree of expertise. At worst, experience reflects seniority, and little more.
2.2. Certification
In many professions, individuals receive some form of accreditation or title as a reflection of their skill. For instance, doctors may be "board certified" and university faculty may be "full professor." Generally, it is safe to say that a certified individual is more likely to be an expert than someone who is uncertified.
The problem with certification is that it is more often tied to years on the job than it is to professional performance. This can be particularly true in bureaucracies. In military photo interpretation, for instance, the rank of the individuals can vary from Sergeant to Major. Yet performance is unrelated to rank (Tod Levitt, personal communication).
Another example occurs in the Israeli Air Force, where the lead pilot in a battle is identified by skill rather than rank; that means a General may follow a Captain. This has been cited as one reason for superiority of the Israelis in air combat against Arab Air Forces (where lead pilots are usually determined by rank). The Israelis recognized that talent is not always reflected by formal certification (R. Lipshitz, personal communication).
Another problem with certification is the "ratchet up effect": people generally move up the
certification ladder, but seldom down. Once certified, the recipient is accredited for life. Even if the skill level of the individual suffers a serious decline, the title or rank remains. (Just ask students about the teaching ability of some senior professors.)
2.3. Social acclamation
One method used by many researchers (including the present authors) has been to rely on identification of experts by people working in the field. That is, professionals are asked whom they consider to be an expert. When there is some agreement about the identification of such an individual, that person is then labeled an expert by "social acclamation."
In her analysis of livestock judges, for example, Phelps (1977) asked professionals in agriculture whom they considered the best. From their answers, she identified four top livestock judges to be the experts in her investigation (for further details on this study, see below).
Absent other means of identifying experts, acclamation is a reasonable strategy to follow. It is unlikely that multiple professionals working in a field would identify the same unqualified person as an expert. If they agree, it seems safe to assume that the agreed-upon person is an expert. The problem with this approach is a "popularity effect": someone better known by his or her peers is more likely to be identified as an expert. Meanwhile, another person outside the peer group is unlikely to be seen as an expert, even though that person may be on the cutting edge of new knowledge. Indeed, those who make new discoveries in a field are frequently unpopular in the eyes of their peers at the time of their breakthroughs.
2.4. Consistency (within) reliability
Einhorn (1972, 1974) argued that intra-person
(within) reliability is a necessary condition for ex-
pertise. That is, an expert's judgments should be
internally consistent. Conversely, inconsistency
would be prima facie evidence that the person is
not an expert.
Table 1 lists within-person consistency values from eight prior studies of experts. The four vertical categories correspond to a classification of task difficulty proposed by Shanteau (1999). There are two domains listed for each category, with internal consistency correlations. For example, the average consistency for weather forecasters (a decision-aided task) is quite high at 0.98. For stockbrokers (an unaided task), the average consistency is less than 0.40.
As might be expected, aided tasks produce higher internal consistency values than unaided tasks. To a first approximation, therefore, it appears that intra-person reliability corresponds closely to the performance level of experts in different domains.
The diculty with this approach is that some-
one can be consistent by following some simple,
but incorrect rule. As long as the rule is followed
routinely, the person's behavior will exhibit high
consistency. For example, by always answering
``yes'' and ``no'' to alternate questions, one can be
perfectly repeatable. But such answers would gen-
erally be inappropriate. Thus, internal consistency
Table 1
Reliability (consistency) values across levels of expert performancea
Highest levels of performance Lowest levels of performance
Aided decisions Competent Restricted Unaided decisions
Weather forecasters Livestock judges Clinical psychologists Stockbrokers
r0:98 r0:96 r0:44 r<0.40
Auditors Grain inspectors Pathologists Polygraphers
r0:90 r0:62 r0:50 r0:91
a
The values cited in this table (left±right and top±bottom) were drawn from the following: Stewart et al. (1997), Phelps and Shanteau
(1978), Goldberg and Werts (1966), Slovic (1969), Kida (1980), Trumbo et al. (1962), Einhorn (1974), and Raskin and Podlesny (1979).
J. Shanteau et al. / European Journal of Operational Research 136 (2002) 253±263 255
is a necessary condition ± an expert could
hardly behave randomly ± but not sucient for
expertise.
2.5. Consensus (between) reliability
Einhorn (1972, 1974) argued that agreement between individuals is a necessary condition for expertise. That is, he believed that experts in a given field should agree with each other (also see Ashton, 1985). If there is disagreement, then it suggests that one, some, or all of the would-be experts are not really what they claim to be.
Table 2 lists average between-expert correla-
tions for the same studies listed in Table 1. For
instance, the consensus correlations for weather
forecasters and stockbrokers are 0.95 and 0.32,
respectively. Except for pathologists, the consen-
sus values are similar to, but lower than, the cor-
responding consistency values in Table 1.
Livestock judges and polygraphers display quite different consistency and consensus results. Further analysis reveals that there are several schools of thought in these domains about how to make decisions. Thus, experts from each school may be internally consistent, but show sizable disagreement with experts from another school. This could explain the discrepancy between the high consistency values and the low consensus values in these two domains.
On the surface, consensus appears to be a
compelling property for experts. After all, we feel
quite uncomfortable when two or more experts
(such as doctors) argue about which is the correct
procedure to follow. When the experts agree, on
the other hand, we feel more comfortable with the
mutually agreed-upon course of action.
The problem with consensus is that agreement can result from premature closure, e.g., groupthink (Janis, 1972). There are many illustrations where the best answer was not the one identified by a group of experts because they focused initially on an inferior alternative and thus became blind to better options. Therefore, many experts may agree, but they may all be wrong (Shanteau, in press; Weiss and Shanteau, in press).
2.6. Discrimination ability
Hammond (1996) and others have pointed out that the ability to make fine discriminations between similar, but not equivalent, cases is a defining skill of experts. That is, an expert must be able to perceive and act on subtle differences that a non-expert may often overlook. In the study of livestock judges by Phelps described below, the researchers were able to develop quantitative models of the experts' judgments. However, it proved impossible for these researchers to apply the models to actual livestock due to the difficulty of perceiving the appropriate characteristics of animals. Thus, knowing how to combine information is of no value without knowing what information to combine.
Although it seems clear that discrimination is a necessary condition for expertise, there is a catch. A non-expert may well differentiate between cases using some easily identifiable, but irrelevant, attribute. For instance, it is easy to distinguish between livestock based on the length or curliness of their tails. However, tail characteristics play no role in the meat quality of farm animals (Bill Able, personal communication). Thus, discrimination ability is a necessary, but not sufficient, condition for identifying experts.

Table 2
Reliability (consensus) values across levels of expert performance (a)
Categories run from the highest levels of performance (left) to the lowest (right).

Aided decisions        Competent            Restricted               Unaided decisions
Weather forecasters    Livestock judges     Clinical psychologists   Stockbrokers
r = 0.95               r = 0.50             r = 0.40                 r = 0.32
Auditors               Grain inspectors     Pathologists             Polygraphers
r = 0.76               r = 0.60             r = 0.55                 r = 0.33

(a) The values cited in this table (left–right and top–bottom) were drawn from the following: Stewart et al. (1997), Phelps and Shanteau (1978), Goldberg and Werts (1966), Slovic (1969), Kida (1980), Trumbo et al. (1962), Einhorn (1974), and Lykken (1979).
2.7. Behavioral characteristics
Research by Abdolmohammadi and Shanteau (1992; also see Shanteau, 1989) found that expert auditors share many common behavioral characteristics. Some examples are self-confidence, creativity, perceptiveness, communication skills, and stress tolerance. A complete list of characteristics (along with their definitions) appears in the original paper.
Because many experts exhibit such traits, Abdolmohammadi and Shanteau proposed that behavioral characteristics might be used to develop a "trait profile" of experts. If appropriate tests can be identified or constructed, then would-be experts would take such tests. Those who score closest to the profile of established experts would then become potential experts.
Although this approach has considerable potential, there are three critical problems. First, the required tests for several of these characteristics do not exist, e.g., Communication Skills or Tolerance of Stress. Second, even if they did, the tests would have to be normalized for a domain (e.g., auditors). Third, the extent to which non-experts may also share these same characteristics is unclear. Thus, although this approach holds promise, more work is needed before experts can be identified using their behavioral characteristics.
2.8. Knowledge tests
In studies of problem solving or game playing, experts are often identified based on tests of factual knowledge. For example, Chi (1978) used knowledge about dinosaurs to separate children into experts and novices.
Knowledge of relevant facts is clearly a prerequisite for expertise. Someone who knows nothing about a domain will be unable to make competent decisions. Yet, knowledge alone is not sufficient to establish that someone is an expert. In the Chi study, for example, knowledge about different types of dinosaurs is not enough to know what they ate, where they lived, how long they survived, or why they died out.
The problem is that it takes more than knowl-
edge of facts for expertise. It is also necessary to
see which facts to apply in a given situation. In
most domains, that is the hard part.
2.9. Creation of experts
In certain contexts, it is possible for experts to be "created" through extensive training by researchers. This approach has significant advantages, including the fact that development of expertise can be studied longitudinally. Moreover, the skills learned are under direct control of researchers.
One notable example of this approach is a student who worked with William Chase at Carnegie-Mellon University to enhance his short-term memory span (Chase and Ericsson, 1981). Because the student was a track athlete, he learned to translate groups of digits into times for various running distances. When asked to retrieve the digits, he recalled the times in clusters tied to running. Using this strategy, the student broke the old record for short-term memory span of 18 digits established by a German mathematician. The new record: over 80! (Other students since have extended the record beyond 100.)
Experts can be created in this way for certain
narrow tasks, e.g., to play computer games or
work in a simulated microworld environment. In
most realms of expertise, however, a broad range
of skills is required based on years of training and
experience. For instance, becoming a medical
doctor can take a dozen years just to get started.
Obviously, training students for a few months
cannot simulate such expertise.
3. A new approach
As the preceding survey shows, many approaches have been advanced for identifying experts. Each of these approaches, however, has one or more serious flaws. No generally acceptable approach exists at the present time. To fill this gap, the two senior authors (Weiss and Shanteau, submitted) proposed a new approach for defining expertise. They combined two necessary, but not sufficient, measures into a single index.
First, they agreed with Hammond (1996) that discrimination is critical for an expert. The ability to differentiate between similar, but not identical, cases is a hallmark of expertise. That is, experts perceive and act on subtle distinctions that others miss. Second, they followed Einhorn's (1974) suggestion that consistency, or within-person reliability, is necessary in an expert. If someone cannot repeat their judgment in a similar situation, then they are unlikely to be an expert.
Discrimination refers to a judge's dieren-
tial evaluation of dierent stimulus cases. Con-
sistency refers to a judge's evaluation of the
same stimuli over time; inconsistency is its
complement.
3.1. CWS ratio
As shown in Eq. (1), Weiss and Shanteau
combine discrimination and consistency into a
ratio. The CWS ratio will be large when a judge
discriminates consistently, but will be small if the
judge either discriminates less or has lower con-
sistency.
$$\mathrm{CWS} = \frac{\text{Discrimination}}{\text{Inconsistency}} \qquad (1)$$
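A minimal Python sketch of Eq. (1), under the simplifying assumption that every case was judged at least twice. Discrimination is taken here as the variance of the judge's mean ratings across cases, and inconsistency as the average variance among repeated ratings of the same case. The function name `cws` and the dictionary layout are illustrative choices of ours, not part of the original method.

```python
from statistics import mean, variance


def cws(responses: dict[str, list[float]]) -> float:
    """Sketch of Eq. (1): CWS = discrimination / inconsistency.

    responses maps each stimulus case to the judge's repeated ratings of it
    (at least two ratings per case).
    """
    # Discrimination: how much the judge's average rating differs across cases.
    discrimination = variance([mean(r) for r in responses.values()])
    if discrimination == 0:
        return 0.0
    # Inconsistency: how much repeated ratings of the same case differ.
    inconsistency = mean(variance(r) for r in responses.values())
    if inconsistency == 0:
        return float("inf")  # discriminates with perfect repeatability
    return discrimination / inconsistency


print(cws({"A": [2, 3], "B": [6, 7], "C": [10, 11]}))  # 32.0
```

Spreading the three case means further apart raises the ratio; letting the two ratings of each case drift further apart lowers it.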
Our construction of this index parallels Cochran's (1943) suggestion to use a ratio of variances to assess the quality of a response instrument. (Another reason for using variance ratios is that they are asymptotically efficient (I.R. Goodman, personal communication).) Cochran argued that an effective instrument should allow participants to express perceived differences among stimuli in a consistent way. We view an effective expert in the same way. We acknowledge our intellectual debt to Cochran by referring to our performance-based index as CWS.
The intuition underlying the index is that a good measuring tool necessarily has a high CWS ratio. That is, a proper instrument yields different measures for different objects, and gives the same measure whenever it is applied to the same object. A ruler, for example, discriminates among objects of varying length, and produces identical scores for the same objects. Thus, a proper measuring instrument will produce a high CWS value as defined in Eq. (1).
Similarly, an expert must be both discriminating and consistent. It is easy to display one or the other, but hard to do both. One can show discrimination by generating a wide variety of responses over stimuli; one can exhibit consistency by repeating the same response to all stimuli. But adopting either of these strategies alone means that the other property will be lost. To display both properties simultaneously requires careful assessment of the stimuli, the essence of expert judgment.
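The trade-off can be illustrated with a small simulation. Three hypothetical judges each rate ten cases twice: one gives the same rating to everything, one scatters ratings at random, and one tracks an assumed "true" value with slight noise. The data-generating choices (uniform and Gaussian noise, seed 1) are illustrative assumptions, and `cws` is the same variance-ratio reading of Eq. (1) used for exposition, not a published implementation.

```python
import random
from statistics import mean, variance


def cws(responses):
    """CWS = variance of case means / mean within-case variance (sketch of Eq. 1)."""
    discrimination = variance([mean(r) for r in responses])
    if discrimination == 0:
        return 0.0  # identical ratings for every case: nothing discriminated
    inconsistency = mean(variance(r) for r in responses)
    return float("inf") if inconsistency == 0 else discrimination / inconsistency


rng = random.Random(1)
true_values = range(1, 11)  # hypothetical "true" standing of ten cases

# Each judge rates every case twice: one [first, second] pair per case.
constant = [[5.0, 5.0] for _ in true_values]                                     # consistent only
scattered = [[rng.uniform(1, 10), rng.uniform(1, 10)] for _ in true_values]      # varied, not repeatable
careful = [[v + rng.gauss(0, 0.2), v + rng.gauss(0, 0.2)] for v in true_values]  # both

for label, judge in [("constant", constant), ("scattered", scattered), ("careful", careful)]:
    print(f"{label:>9}: CWS = {cws(judge):.2f}")
```

Only the careful judge, who is both discriminating and repeatable, earns a large ratio; each single-strategy judge is held down by the property it gives up.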
3.2. Using CWS
CWS can be estimated by asking would-be ex-
perts to make judgments of a series of stimulus
cases; this allows for assessment of their discrimi-
nation ability. In addition, at least some of the
cases should be repeated; this allows for assess-
ment of their consistency.
Discrimination and inconsistency values can be estimated using a variety of analytic procedures, such as analysis of variance or multiple regression. It is important to emphasize that the use of ratios is descriptive, not inferential. That is, CWS is more of a qualitative tool than a quantitative tool. There are no comparisons to statistical tables and no determinations of significance. Rather, CWS is used to establish that someone behaves more (high value) or less (low value) like an expert.
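One way to carry out the analysis-of-variance route is sketched below; the specific decomposition is our assumption. Stimulus cases are treated as the single factor, the between-cases mean square serves as discrimination, and the within-cases (error) mean square serves as inconsistency. Only repeated cases can inform the latter. In keeping with the descriptive use of CWS, the ratio is simply reported and no F table is consulted.

```python
from statistics import mean


def cws_anova(ratings: dict[str, list[float]]) -> float:
    """One-way ANOVA sketch: MS_between(cases) / MS_within(cases).

    Cases rated once add to discrimination but carry no information about
    consistency; only repeated cases contribute to MS_within.
    """
    all_ratings = [y for ys in ratings.values() for y in ys]
    grand = mean(all_ratings)
    k, n_total = len(ratings), len(all_ratings)
    if k < 2:
        raise ValueError("need at least two different cases")
    df_within = n_total - k
    if df_within == 0:
        raise ValueError("at least one case must be repeated")
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in ratings.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in ratings.values() for y in ys)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / df_within
    return float("inf") if ms_within == 0 else ms_between / ms_within


# Hypothetical judge: cases c1 and c3 were repeated, c2 and c4 seen once.
print(round(cws_anova({"c1": [4, 6], "c2": [10], "c3": [1, 3], "c4": [6]}), 2))  # 7.33
```

Here the between-cases mean square is 44/3 and the within-cases mean square is 2, giving a CWS of about 7.33.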
To rank-order two (or more) would-be experts,
CWS ratios can be compared using a procedure
developed by Schumann and Bradley (1959). This
allows the researcher to determine whether one
individual is performing better than another
(Weiss, 1985).
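We do not reproduce the Schumann and Bradley (1959) procedure here. As a plainly different, swapped-in alternative, the sketch below uses a paired bootstrap over cases: it resamples the stimulus cases with replacement and counts how often one judge's CWS exceeds the other's. The function names, the 500 resamples, the seed, and the two example judges are illustrative assumptions.

```python
import random
from statistics import mean, variance


def cws(responses):
    """CWS = variance of case means / mean within-case variance (sketch of Eq. 1)."""
    discrimination = variance([mean(r) for r in responses])
    if discrimination == 0:
        return 0.0
    inconsistency = mean(variance(r) for r in responses)
    return float("inf") if inconsistency == 0 else discrimination / inconsistency


def share_a_beats_b(judge_a, judge_b, n_boot=500, seed=0):
    """Fraction of paired bootstrap resamples in which judge A's CWS exceeds judge B's.

    judge_a[i] and judge_b[i] are the two judges' repeated ratings of the same case i,
    so each resample draws the same cases for both judges.
    """
    rng = random.Random(seed)
    n = len(judge_a)
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        wins += cws([judge_a[i] for i in idx]) > cws([judge_b[i] for i in idx])
    return wins / n_boot


steady = [[v, v + 0.1] for v in range(1, 11)]  # hypothetical: nearly repeatable
shaky = [[v, v + 3] for v in range(1, 11)]     # hypothetical: same spread, poor repeatability
print(share_a_beats_b(steady, shaky))
```

A share near 1 suggests the first judge reliably outperforms the second on these cases; values near 0.5 suggest the two cannot be ordered.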
4. Reanalyses of prior studies
In this section, we apply CWS to three previous
studies of experts. By reanalyzing these results, we
hope to show the utility of CWS in a variety of
contexts.
4.1. Audit judgment
Ettenson (1984) asked two groups of auditors to evaluate 24 financial cases described by a common set of cues. One group of 15 expert auditors was recruited from Big Six accounting firms in Omaha, Nebraska. The expert group included audit seniors and partners, with 4–25 years of audit experience. For comparison, 15 novice accounting students were obtained from two large Midwestern universities.
Every financial case was described using 16 cues, each of which was given either a high or low value. For example, net income was set at either a high or low number. For each case, participants were asked to make a going concern assessment. A fractional factorial design was used to generate 16 cases. Eight of these cases were then replicated to produce a total of 24 stimuli; participants were not told that some cases were identical. The order of presentation of cases was randomized.
Based on feedback from an auditor collaborator, the cues were classified as either "diagnostic" (e.g., net income), "partially diagnostic" (e.g., aging of receivables), or "non-diagnostic" (e.g., prior audit results). From analysis of the fractional design, discrimination was estimated from the mean square values for each cue: high variance implies high discrimination. Inconsistency was estimated from the average of within-cell variances: low variance implies high consistency. The ratio of discrimination variance to inconsistency variance was computed to form separate CWS values for diagnostic, partially diagnostic, and non-diagnostic cues.
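A per-cue version of this computation can be sketched as follows. This is a minimal sketch, assuming cue levels coded +1 (high) and -1 (low), each cue balanced across the unique cases, and the one-degree-of-freedom main-effect mean square computed on the case means, which is our simplification. The 2 × 2 example with `net_income` and `prior_audit_results` borrows two cue names from the text, but the ratings are invented.

```python
from statistics import mean, variance


def cue_cws(cases, cue_names):
    """Per-cue CWS for a two-level design with cue levels coded +1 (high) / -1 (low).

    cases maps a tuple of cue levels to the judge's ratings of that case;
    replicated cases carry two or more ratings. Assumes each cue is balanced
    across the unique cases, as in a (fractional) factorial design.
    """
    case_means = {levels: mean(r) for levels, r in cases.items()}
    n_cases = len(case_means)
    # Inconsistency: average within-cell variance over the replicated cases only.
    inconsistency = mean(variance(r) for r in cases.values() if len(r) > 1)
    result = {}
    for j, name in enumerate(cue_names):
        high = mean(m for lv, m in case_means.items() if lv[j] == 1)
        low = mean(m for lv, m in case_means.items() if lv[j] == -1)
        # 1-df main-effect mean square computed on the case means.
        result[name] = n_cases * (high - low) ** 2 / 4 / inconsistency
    return result


# Hypothetical 2 x 2 example: the judge relies on net income and ignores prior audit results.
example = {
    (1, 1): [8, 9], (1, -1): [9],
    (-1, 1): [2, 3], (-1, -1): [3],
}
print(cue_cws(example, ["net_income", "prior_audit_results"]))
# {'net_income': 72.0, 'prior_audit_results': 0.5}
```

Category-level values like those in Table 3 would then be averages of these per-cue ratios within each diagnosticity class.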
The results in Table 3 show that average CWS values decline systematically as the diagnosticity of the cues declines. For the expert group (first row in Table 3), the differences are notable, especially between diagnostic and partially diagnostic cues. For the novice group (second row in the table), there is a similar but less pronounced decline. More important, there is a sizable difference between experts and novices for diagnostic cues. The size of this difference is less for partially diagnostic cues, and non-existent for non-diagnostic cues.
For diagnostic cues, CWS clearly distinguishes between experts and novices. Moreover, the size of the difference between the groups declines for less diagnostic cues. These results show that CWS can distinguish between expert and novice groups.
4.2. Livestock judgment
Phelps (1977) had four professional livestock judges evaluate 27 drawings of gilts (female pigs). These drawings were created by an artist to yield a 3 × 3 × 3 (size × breeding × meat quality) factorial design. The judges independently evaluated each gilt for breeding quality (how good the animal is for reproduction) and slaughter quality (how good the meat from the animal is). All stimuli were presented three times, although judges were not told that they were being shown the same drawings.
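The 3 × 3 × 3 design with three presentations per drawing can be sketched in Python as follows. How the per-cue CWS values in Table 4 were computed is not spelled out here, so the main-effects decomposition below (each factor's mean square divided by the pooled within-cell mean square) is our assumption. The judge's ratings are invented, and the shuffle seed is arbitrary.

```python
import itertools
import random
from statistics import mean

FACTORS = ("size", "breeding", "meat")  # each drawn at three levels (0, 1, 2)

# The 27 drawings of the 3 x 3 x 3 design, each shown three times in a shuffled order.
cells = list(itertools.product(range(3), repeat=3))
presentation = cells * 3
random.Random(0).shuffle(presentation)


def factor_cws(ratings):
    """Per-factor CWS for one judge.

    ratings maps a (size, breeding, meat) cell to that judge's repeated ratings.
    Discrimination for a factor: its main-effect mean square.
    Inconsistency: the pooled within-cell mean square.
    """
    obs = [y for ys in ratings.values() for y in ys]
    grand = mean(obs)
    ss_within = sum((y - mean(ys)) ** 2 for ys in ratings.values() for y in ys)
    ms_within = ss_within / (len(obs) - len(ratings))
    result = {}
    for j, name in enumerate(FACTORS):
        levels = sorted({cell[j] for cell in ratings})
        ss_factor = 0.0
        for level in levels:
            at_level = [y for cell, ys in ratings.items() if cell[j] == level for y in ys]
            ss_factor += len(at_level) * (mean(at_level) - grand) ** 2
        result[name] = (ss_factor / (len(levels) - 1)) / ms_within
    return result


# Hypothetical judge who attends only to meat quality, with small replicate-to-replicate wobble.
demo = {cell: [2 * cell[2] + d for d in (-0.5, 0.0, 0.5)] for cell in cells}
print(factor_cws(demo))  # {'size': 0.0, 'breeding': 0.0, 'meat': 432.0}
```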
Two of the judges were nationally recognized
experts in assessment of swine and were very fa-
miliar with gilts of the sort shown in the drawings.
The other two were nationally recognized experts
as cattle judges; although they were knowledgeable
about swine judging, they lacked day-to-day fa-
miliarity and experience.
For breeding judgments (upper panel in Table 4), swine experts produced large CWS values for both the breeding and meat cues. In comparison, cattle experts produced a large CWS value only for the meat cue. This apparently reflects the cattle judges' unfamiliarity with the breeding characteristics of swine; meat quality characteristics, however, were readily emphasized by cattle judges.

Table 3
Average CWS values for two groups of auditors with three categories of cues (a)

           Diagnostic   Partially diagnostic   Non-diagnostic
Experts    13.10        6.42                   3.32
Novices    8.08         5.13                   3.03

(a) Results based on a reanalysis of Ettenson (1984).
For slaughter judgments (lower panel in Table 4), the meat cue dominates for both swine and cattle judges. However, the CWS for meat is more than twice as large for swine judges as for cattle judges (212.7 vs. 98.0). CWS values for the breeding and size dimensions were small for both types of judges.
Interestingly, for cattle judges, there is little difference in CWS between breeding and slaughter judgments. For swine judges, however, there is a considerable difference between breeding and slaughter judgments, especially for the breeding cue. Thus, it appears that swine judges are more sensitive to changes in the task. In all, CWS provides a revealing picture of the difference between these two highly skilled types of experts. This study also highlights the role that specific tasks play in expertise.
4.3. Personnel hiring
Nagy (1981) used summary descriptions of job candidates for the position of computer programmer at a large company in the state of Washington. She asked four professional personnel selectors (experts) and 20 management students (novices) to evaluate these candidates. Each candidate was described by legally relevant attributes (recommendations from prior employers and amount of job-relevant experience) and legally irrelevant attributes (age, gender, and physical attractiveness). Filler information from local phone books was used to supply background information, such as phone number and home address, on the application summaries.
Each participant evaluated 32 applicants (generated from a 2 × 2 × 2 × 2 × 2 factorial design) twice. Before the evaluations, participants were reminded about the legal requirements for hiring, i.e., what information should and should not be used. The importance of the five attributes was determined for each participant on a 0–100 normalized scale; average CWS values are reported for each group.
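The 2^5 applicant design and a per-attribute CWS can be sketched as below. The two simulated judges, their attribute weights, and the fixed one-point shift between passes are invented for illustration. The sketch computes raw CWS ratios and does not reproduce the 0–100 normalization mentioned above. It only shows why a judge who truly ignores an attribute gets a CWS of zero for it.

```python
import itertools
from statistics import mean, variance

ATTRIBUTES = ("recommendations", "experience", "age", "gender", "attractiveness")
PROFILES = list(itertools.product((-1, 1), repeat=len(ATTRIBUTES)))  # 2^5 = 32 applicants


def attribute_cws(first_pass, second_pass):
    """Per-attribute CWS from two evaluations of the same 32 profiles.

    Discrimination: 1-df main-effect mean square of the attribute (on profile means).
    Inconsistency: average variance between the two evaluations of each profile.
    """
    profile_means = [mean(pair) for pair in zip(first_pass, second_pass)]
    inconsistency = mean(variance(pair) for pair in zip(first_pass, second_pass))
    result = {}
    for j, name in enumerate(ATTRIBUTES):
        high = mean(m for p, m in zip(PROFILES, profile_means) if p[j] == 1)
        low = mean(m for p, m in zip(PROFILES, profile_means) if p[j] == -1)
        result[name] = len(PROFILES) * (high - low) ** 2 / 4 / inconsistency
    return result


def simulate(weights):
    """Hypothetical judge: weighted sum of attributes, rated 1 point lower, then 1 point higher."""
    base = [50 + sum(w * x for w, x in zip(weights, p)) for p in PROFILES]
    return [b - 1 for b in base], [b + 1 for b in base]


professional = attribute_cws(*simulate((10, 8, 0, 0, 0)))  # ignores irrelevant attributes
student = attribute_cws(*simulate((10, 8, 5, 0, 3)))       # lets age and attractiveness creep in
print(professional["age"], student["age"])  # 0.0 400.0
```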
As can be seen for the relevant attributes (upper panel in Table 5), average CWS values are nearly identical for the two groups. This is not surprising given that participants were reminded about the hiring guidelines immediately before the study. In contrast, CWS values for irrelevant attributes (lower panel) reveal a different pattern. For professionals, CWS approaches zero (as it should). For students, however, CWS values are considerably larger. Despite being reminded that age, gender, and attractiveness may not legally be used, the management students had sizable CWS values for these irrelevant attributes. Certainly, it is not easy to ignore something as obvious as age or gender, although that is what the legal guidelines require. Experts, however, apparently have developed strategies to do just that. Thus, there are tasks where CWS values for irrelevant attributes may be more diagnostic of expertise than those for relevant attributes.

Table 5
Average CWS values for two groups of personnel selectors (a)

Relevant attributes      Recommendations   Experience
  Professionals          88.25             86.17
  Students               88.81             86.88

Irrelevant attributes    Age               Attractiveness   Gender
  Professionals          0.99              1.58             0.00
  Students               28.12             25.19            13.32

(a) Results based on reanalysis of Nagy (1981).

Table 4
Average CWS values for swine judgments for two types of livestock experts (a)

                       Size    Breeding   Meat
Breeding judgments
  Swine experts        15.9    53.8       65.6
  Cattle experts       <1.0    3.4        79.2
Slaughter judgments
  Swine experts        <1.0    3.2        212.7
  Cattle experts       <1.0    7.5        98.0

(a) Results based on a reanalysis of Phelps (1977).
5. Caveats
There are five caveats and precautions that deserve mention. First, the application of CWS to these three prior studies is encouraging as far as it goes. However, more evidence is needed before CWS can be used by itself to identify experts. For now, CWS can serve as a useful supplement to other approaches, e.g., social acclamation.
Second, the stimuli used in these studies were abstractions of real-world problems. Specifically, cases were presented in static (non-changing) environments, with no feedback or dynamic/temporal changes. We are now applying CWS in complex, real-time environments.
Third, CWS was applied here to individuals whose results were combined to produce group averages. However, most experts work in teams. If teams are treated as a decision-making unit, then it is possible to apply CWS in the same way as with individuals. Preliminary efforts to apply CWS to team decision making have been encouraging.
Fourth, CWS assumes that there are real differences in the stimuli to be judged. If the stimuli are not different, then there is nothing to discriminate. If multiple patients have the same disease, for instance, then there will be no differential diagnoses. Therefore, the stimuli must span a range before CWS can be used to identify experts.
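This caveat can be made concrete with a small calculation. The sketch below assumes one simple way of writing the CWS ratio: discrimination is the variance of a judge's mean ratings across stimuli, and inconsistency is the average variance of repeated ratings of the same stimulus. The function name `cws_index` and all ratings are hypothetical, made up for illustration rather than taken from the studies discussed here. When every "different" stimulus is in fact the same case, the judge's means cannot differ systematically, so discrimination, and with it CWS, collapses toward zero.

```python
from statistics import mean, variance


def cws_index(ratings):
    """Hypothetical CWS sketch: discrimination divided by inconsistency.

    ratings[i] holds the replicate ratings one judge gave to stimulus i
    (assumes at least two stimuli and at least two replicates each).
    """
    stimulus_means = [mean(reps) for reps in ratings]
    discrimination = variance(stimulus_means)  # spread across different stimuli
    inconsistency = mean(variance(reps) for reps in ratings)  # spread across repeats
    if inconsistency == 0:
        # Perfectly repeatable judge: ratio is unbounded (or undefined if no spread).
        return float("inf") if discrimination > 0 else float("nan")
    return discrimination / inconsistency


# Made-up data: the same case shown three times vs. three genuinely different cases.
identical_cases = [[5, 6], [6, 5], [5, 6]]
distinct_cases = [[2, 3], [5, 6], [8, 9]]

print(cws_index(identical_cases))  # 0.0: nothing to discriminate
print(cws_index(distinct_cases))   # 18.0
```

Returning `inf` or `nan` when a judge shows no replicate variability is a convention chosen for this sketch; other conventions are possible.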
Finally, it is possible for CWS to yield high values for non-experts who use a consistent but incorrect rule. Suppose all job candidates with short names (e.g., Ann) get high recommendations, while all job candidates with long names (e.g., Georgette) get low recommendations. Because it is applied with high consistency, such an inappropriate rule would produce high CWS values. One way around this “catch” is to ask judges to evaluate the same cases in different contexts, e.g., recommendations for a different job. If the judgments are the same as before, then the participant is not likely to be an expert, despite having a high CWS value.
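The “catch” and the proposed context check can be sketched the same way. This again assumes the simple variance-ratio form of CWS; the `rate` rule, the candidate names beyond Ann and Georgette, the job labels, and the `profile_shift` helper are all hypothetical. A judge who rates purely by name length earns a high CWS, but the ratings do not move when the job changes, which is the signal the text suggests looking for.

```python
from statistics import mean, variance


def cws_index(ratings):
    """Assumed CWS form: variance of per-stimulus means / mean replicate variance."""
    discrimination = variance([mean(reps) for reps in ratings])
    inconsistency = mean(variance(reps) for reps in ratings)
    return discrimination / inconsistency  # assumes some replicate variability


def rate(name, job):
    """Hypothetical non-expert rule: short names high, long names low.

    Returns two replicate ratings; `job` is deliberately ignored.
    """
    return [9, 8] if len(name) <= 4 else [2, 3]


def profile_shift(ratings_a, ratings_b):
    """Mean absolute change in each candidate's mean rating between two contexts."""
    return mean(abs(mean(a) - mean(b)) for a, b in zip(ratings_a, ratings_b))


candidates = ["Ann", "Bob", "Georgette", "Maximilian"]
job_a = [rate(name, "accountant") for name in candidates]
job_b = [rate(name, "engineer") for name in candidates]

print(cws_index(job_a))             # 24.0: high, because the rule is applied consistently
print(profile_shift(job_a, job_b))  # 0.0: judgments ignore the change of context
```

A shift near zero flags a judge whose high CWS may reflect a rigid rule rather than expertise; how large a shift to expect from a genuine expert would depend on how much the two jobs actually differ.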
6. Conclusions
The present application of CWS leads to five conclusions: First, in the analyses above, CWS proved superior to any previously proposed approach for identifying experts. If CWS continues to be successful, it may provide an answer to the long-standing question of how to identify expertise in the absence of external criteria.
Second, the success of CWS across dierent
domains is noteworthy. In addition to auditing,
livestock judging, and personnel selection, we have
applied CWS to wine judging, medical decision
making, soil judging, microworld simulations,
sensory food evaluations, and air trac control.
Thus far, CWS has worked well in every do-
main.
Third, in addition to identifying experts, CWS has provided new insights into the interpretation of previous research. In the Phelps study of livestock judges, for example, CWS clarified a long-standing question about how to distinguish between experts from closely related specialty areas.
Fourth, by focusing on discrimination and consistency, CWS may have important implications for the selection and training of novices to become experts. It is unclear, for example, whether discrimination and consistency can be learned, or whether novices should be preselected for these skills. Either way, CWS offers new perspectives on what it means to be an expert.
Finally, we are now applying CWS to data sets
where there is no prior information about the
relevance of attributes. The question is whether CWS can identify experts in the absence of any knowledge of what is relevant and what is irrelevant. In preliminary analyses, the differences do not appear to be as large as those shown in the present tables. However, CWS does consistently separate experts from non-experts. In all, the future for CWS looks hopeful.
Acknowledgements
Preparation of this manuscript was supported, in part, by grant 96-12126 from the National Science Foundation and by grant 98-G-026 from the Federal Aviation Administration in the Department of Transportation (in the USA).
References
Abdolmohammadi, M.J., Shanteau, J., 1992. Personal characteristics of expert auditors. Organizational Behavior and Human Decision Processes 58, 158–172.
Ashton, A.H., 1985. Does consensus imply accuracy in accounting studies of decision making? Accounting Review 60, 173–185.
Chase, W.G., Ericsson, K.A., 1981. Skilled memory. In: Anderson, J.R. (Ed.), Cognitive Skills and Their Acquisition. Erlbaum Associates, Hillsdale, NJ, pp. 141–189.
Chi, M.T.H., 1978. Knowledge structures and memory development. In: Siegler, R.S. (Ed.), Children's Thinking: What Develops? Erlbaum Associates, Hillsdale, NJ, pp. 73–96.
Cochran, W.G., 1943. The comparison of different scales of measurement for experimental results. Annals of Mathematical Statistics 14, 205–216.
Einhorn, H.J., 1972. Expert measurement and mechanical combination. Organizational Behavior and Human Performance 7, 86–106.
Einhorn, H.J., 1974. Expert judgment: Some necessary conditions and an example. Journal of Applied Psychology 59, 562–571.
Ettenson, R., 1984. A schematic approach to the examination of the search for and use of information in expert decision making. Unpublished Doctoral Dissertation, Kansas State University, Manhattan, KS.
Gigerenzer, G., Todd, P., the ABC group, 1999. Simple Heuristics that Make Us Smart. Oxford University Press, London.
Goldberg, L.R., 1968. Simple models or simple processes: Some research on clinical judgments. American Psychologist 23, 482–496.
Goldberg, L.R., Werts, C.E., 1966. The reliability of clinicians' judgments: A multitrait–multimethod approach. Journal of Consulting Psychology 30, 199–206.
Hammond, K.R., 1996. Human Judgment and Social Policy. Oxford University Press, New York.
Janis, I.L., 1972. Victims of Groupthink. Houghton Mifflin, Boston.
Kida, T., 1980. An investigation into auditor's continuity and related qualification judgments. Journal of Accounting Research 22, 145–152.
Lykken, D.T., 1979. The detection of deception. Psychological Bulletin 80, 47–53.
Nagy, G.F., 1981. How are personnel selection decisions made? An analysis of decision strategies in a simulated personnel selection. Unpublished Doctoral Dissertation, Kansas State University, Manhattan, KS.
Phelps, R.H., 1977. Expert livestock judgment: A descriptive analysis of the development of expertise. Unpublished Doctoral Dissertation, Kansas State University, Manhattan, KS.
Phelps, R.H., Shanteau, J., 1978. Livestock judges: How much information can an expert use? Organizational Behavior and Human Performance 21, 209–219.
Raskin, D.C., Podlesny, J.A., 1979. Truth and deception: A reply to Lykken. Psychological Bulletin 86, 54–59.
Schumann, D.E.W., Bradley, R.A., 1959. The comparison of the sensitivities of similar experiments: Model II of the analysis of variance. Biometrics 15, 405–416.
Shanteau, J., 1989. Psychological characteristics and strategies of expert decision makers. In: Rohrmann, B., Beach, L.R., Vlek, C., Watson, S.R. (Eds.), Advances in Decision Research. North-Holland, Amsterdam, pp. 203–215.
Shanteau, J., 1995. Expert judgment and financial decision making. In: Green, B. (Ed.), Risky Business. University of Stockholm School of Business, Stockholm, pp. 16–32.
Shanteau, J., 1999. Decision making by experts: The GNAHM effect. In: Shanteau, J., Mellers, B.A., Schum, D.A. (Eds.), Decision Science and Technology: Reflections on the Contributions of Ward Edwards. Kluwer Academic Publishers, Boston, pp. 105–130.
Shanteau, J., in press. What does it mean when experts disagree? In: Salas, E., Klein, G. (Eds.), Linking Expertise and Naturalistic Decision Making. Erlbaum Associates, Hillsdale, NJ.
Slovic, P., 1969. Analyzing the expert judge: A descriptive study of a stockbroker's decision processes. Journal of Applied Psychology 53, 255–263.
Stewart, T.R., Roebber, P.J., Bosart, L.F., 1997. The importance of the task in analyzing expert judgment. Organizational Behavior and Human Decision Processes 69, 205–219.
Trumbo, D., Adams, C., Milner, M., Schipper, L., 1962. Reliability and accuracy in the inspection of hard red winter wheat. Cereal Science Today 7.
Weiss, D.J., 1985. SCHUBRAD: The comparison of the sensitivities of similar experiments. Behavior Research Methods, Instruments, & Computers 17, 572.
Weiss, D.J., Shanteau, J., in press. The vice of consensus and the virtue of consistency. In: Shanteau, J., Johnson, P., Smith, C. (Eds.), Psychological Explorations of Competent Decision Making. Cambridge University Press, New York.
Weiss, D.J., Shanteau, J., submitted. Empirical assessment of expertise.
... Similarly, these studies show that when used to take a decision, models often yield more precise and desirable outcomes than expert judgments (Meehl, 1954;Goldberg, 1968;Shanteau and Stewart, 1992;Kahneman and Klein, 2009;Logg et al., 2019). Yet, in parallel, the "Naturalistic Decision-Making" literature tries to show that experts are competent and make accurate decisions in their specialized domains (Libby, 1975;Shanteau, 1992;Shanteau and Stewart, 1992;Devine and Heckman, 1996;Shanteau et al., 2002). In addition, a growing but still small literature examines the performance of targeting mechanisms based on expert's judgments for picking winners in BPCs. ...
... Similarly, these studies show that when used to take a decision, models often yield more precise and desirable outcomes than expert judgments in several cases (Meehl, 1954;Goldberg, 1968;Shanteau and Stewart, 1992;Kahneman and Klein, 2009;Logg et al., 2019). Yet, in parallel, the "Naturalistic Decision-Making" literature tries to show that experts are competent and make accurate decisions in their specialized domains (Libby, 1975;Shanteau, 1992;Shanteau and Stewart, 1992;Devine and Heckman, 1996;Shanteau et al., 2002). ...
... Åstebro and Koehler 2004;Åstebro and Elhedhli, 2006), expert judges relying on simple decision heuristics performed better than a model in picking successful ventures. 71 Moreover, modeling poses the problem of the pre-identification of variables that characterize the entrepreneurs and their business plans, since these variables are necessary to calculate ranking scores (Freel 1998;Shanteau et al., 2002;Grover et al., 2019). Such variables may be drawn from the literature, though there is no guarantee that important variables may still not be omitted (see Hillebrecht et al., 2020). ...
Thesis
Full-text available
Most Sub-Saharan African (SSA) countries experienced sound economic growth and a declining rate of poverty over the last two decades. Though, by far, the SSA region remains the poorest in the world and faces tremendous political, social, and economic challenges. Moreover, due to the COVID-19 pandemic, SSA entered into a recession with a GDP growth rate of minus 5% in 2020 as ever recorded over 25 years. This has also induced an increase in poverty in the region, which adds up to the structural challenges and further highlight the need of sound policies to address economic growth, governance, jobs, and poverty for the region to meet the Sustainable Development Goals (SDGs) in 2030 and beyond. This thesis examines the effects of institutional quality, political instability, and a government targeted entrepreneurship program on the accumulation of human, physical, and financial capital by households and firms. In the literature, these factors are identified as the key determinants of economic growth and job creation, yet this thesis contributes to a knowledge gap, especially at the microeconomic level, on how households and firms accumulate these factors in the presence of weak institutional quality, political instability, and government targeted entrepreneurship programs. In particular, this thesis investigates heterogeneity as well as a single country study of the effects of institutional quality and political instability; it also employs a randomized controlled trial (RCT) to assess the impacts of two different targeted entrepreneurship support programs; and finally, it taps on data from this field experiment to assess the performance of two different targeting mechanisms for selecting growth-oriented entrepreneurs. Each paper is self-contained and three among the four papers were written with co-authors. 
The first paper assesses the effects of institutional quality and political instability on household assets and human capital accumulation in 19 Sub-Saharan African countries for the period 2003-16. In this paper, the concept of instability is enlarged to include factual instability as measured by the number of political violence and civil unrest events, perceived instability as measured by the perceptions of the quality of institutions by households, and the interplay between factual and perceived instability. Contrary to most previous analyses, this paper takes into account household wealth distribution to show how the effects of political instability differ for poor vs. rich households. For identification, I exploit the variation of factual and perceived instability across 185 administrative regions in the 19 countries. My regressions control for a large range of confounding factors measured at the levels of households, regions, and countries. Overall, factual and perceived instability are associated with higher investments in assets, and factual instability is also associated with more investment in house improvements, yet it is negatively associated with the ownership of financial accounts. With regard to the heterogeneous effects, increased factual or perceived instability is associated with more investments in physical capital but less investments in financial and human capital among rich households, and with less investments in physical, financial and human capital among poor households. These findings suggest that political instability might enhance the accumulation of wealth by rich households and reduce that of poor households, implying that the detrimental effects of political instability have lasting consequences for poor households, especially when poor households are exposed to an actual or even just perceived deteriorating quality of the country’s institutions. 
The second paper, written with Nicolas Büttner and Michael Grimm, analyzes households’ investments in assets and their consumption, and education and health expenditures when exposed to actual instability as measured by the number of political violence and protest events in Burkina Faso. There is a large, rather macroeconomic, literature that shows that political instability and social conflict are associated with poor economic outcomes including lower investment and reduced economic growth. However, there is only very little research on the impact of instability on households’ behavior, in particular their saving and investment decisions. This paper merges six rounds of household survey data and a geo-referenced time series of politically motivated events and fatalities from the Armed Conflict Location and Event Data project (ACLED) to analyze households’ decisions when exposed to instability in Burkina Faso. For identification, the paper exploits variation in the intensity of political instability across time and space while controlling for time-effects and municipality fixed effects as well as rainfall and nighttime light intensity, and many other potential confounders. The results show a negative effect of political instability on financial savings, the accumulation of durables, investment in house improvements, as well as on investment in education and health. Instability seems, in particular, to lead to a reshuffling from investment expenditures to increased food consumption, implying lower growth prospects in the future. With respect to economic growth, the sizable education and health effects seem to be particularly worrisome. The third paper, written with Michael Grimm and Michael Weber, employs a randomized controlled trial (RCT) to assess the short-term effects of a government support program targeted at already existing and new firms located in a semi-urban area in Burkina Faso. 
Most support programs targeted at small firms in low- and middle-income countries fail to generate transformative effects and employment at a larger scale. Bad targeting, too little flexibility and the limited size of the support are some of the factors that are often seen as important constraints. This paper assesses the short-term effects of a randomized targeted government support program to a pool of small and medium-sized firms that have been selected based on a rigorous business plan competition (BPC). One group received large cash grants of up to US$8,000, flexible in use. A second group received cash grants of an equally important size, but earmarked to business development services (BDSs) and thus less flexible and with a required own contribution of 20%. A third group serves as a control group. All firms operate in agri-business or related activities in a semi-urban area in the Centre-Est and Centre-Sud regions of Burkina Faso. An assessment of the short-term impacts shows that beneficiaries of cash grants engage in better business practices, such as formalization and bookkeeping. They also invest more, though, this does not translate into higher profits and employment yet. Beneficiaries of cash grants and BDSs show a higher ability to innovate. The results also show that cash grants cushioned the adverse effects of the COVID-19 pandemic for the beneficiaries. More generally, this study adds to the thin literature on support programs implemented in a fragile-state context. The fourth paper, written with Michael Weber, examines the selection of entrepreneurs based on expert judgments for a BPC in Burkina Faso. To support job creation in developing countries, governments allocate significant funds to a typically small number of new or already existing micro, small, and medium-sized enterprises (MSMEs) that are growth oriented. Increasingly, these enterprises are picked through BPCs where thematic experts are asked to make the selection. 
So far, there exists contrasting and limited evidence on the effectiveness and efficiency of these expert judgments for screening growth-oriented entrepreneurs among contestants in BPCs. Alternative or complementary approaches such as evaluation and selection algorithms are discussed in the literature but evidence on their performance is thin. This paper uses a principal component analysis (PCA) to build a metric for comparing the performance of these alternative mechanisms for targeting entrepreneurs with high potential to grow. The results show expert subjectivity bias in judging contestant entrepreneurs. The paper finds that the scores from the expert judgment and those from the algorithm perform similarly well for picking the top-ranked or talented entrepreneurs. It also finds that both types of scores have predictive power, i.e. have statistically significantly associated with 17 firm performance outcomes measured 10 or 34 months after the BPC started. Yet, the predictive power, as measured by the magnitude of the regression coefficients, is higher for the algorithm metric, even when it is considered jointly with expert judgment scores. Despite the statistical superiority of the algorithm, expert assessments at least through pitches of entrepreneurs have proved useful in many settings where free-riding or misuse of public funds may occur. Hence, efficiency and precision could be achieved by relying on a reasoned combination of expert judgments and an algorithm for targeting growth-oriented entrepreneurs. These four papers bring new insights on the relationship between weak institutions, political instability, and targeted government support to entrepreneurship for increasing the accumulation of financial, physical, and human capital, and productivity. And these are the key factors for spurring economic growth and creating jobs in SSA. 
These findings suggest that efficient institutions building in SSA countries would enhance citizen perceptions of good governance which would reduce political instability and enable households including the poor to accumulate productive assets, increase their productivity and reduce poverty. The findings also suggest that targeted government entrepreneurship support programs, e.g. in the forms of cash grants with monitored disbursements yet flexible in use, can enhance firms’ human capital, productive assets, and innovations, even in the short term. Moreover, the targeting mechanism of such programs could be made more effective and efficient by relying on a combinaison of expert judgments and an algorithm for picking growth-oriented entrepreneurs.
... Although "experts" have been studied by researchers over a long period, there appears to be a limited consensus on what exactly constitutes expertise (Baker et al., 2006;Shanteau et al., 2002). The following components of expertise can be commonly observed in the work of various researchers (Baker et al., 2006;Shanteau et al., 2002;Swanson and Holton, 2001): ...
... Although "experts" have been studied by researchers over a long period, there appears to be a limited consensus on what exactly constitutes expertise (Baker et al., 2006;Shanteau et al., 2002). The following components of expertise can be commonly observed in the work of various researchers (Baker et al., 2006;Shanteau et al., 2002;Swanson and Holton, 2001): ...
Article
(Purpose) : Manufacturing supply chains (SCs) across the world have become increasingly vulnerable to disruptions due to the increasing fragmentation of business functions and tasks across many firms located within the country and abroad. Despite the numerous instances of SC disruptions being reported in the literature, the study of SC vulnerability lacks adequate conceptual and empirical support. This study aims to address this research gap. (Design/methodology/approach) : The concept of SC vulnerability was examined considering the outcome and contextual models of vulnerability, which are well established in extant multi-disciplinary vulnerability literature. An exploratory Delphi study was then conducted to understand the extent of vulnerability of various manufacturing SCs in India, drivers of this vulnerability and the key hazards exploiting this vulnerability. (Findings) : The study confirms the increasing vulnerability of manufacturing SCs in India. It also highlights the lack of top management commitment to risk mitigation as the key vulnerability driver and frequent changes in government laws and regulations as the key hazard being faced by the manufacturing SCs in India. (Originality/value) : This study highlights the utility of outcome and contextual models of vulnerability as conceptual frameworks for understanding SC vulnerability. These conceptual insights along with the key manufacturing SC vulnerability drivers and hazards identified in the study should provide a basis for SC redesign for vulnerability reduction and the selection of SC risk mitigation strategies.
... The role of academic knowledge Academia is a particularly active site of knowledge generation on radicalisation, with one review finding that 71% of publications were in academic journals (Neumann and Kleinmann, 2013, 369). Certified, credentialised knowledge is often seen more broadly as constitutive of expertise (Shanteau et al., 2002), and Jackson (2012, 18) writes that 'social scientific credentials' are one way by which terrorism experts are 'authorised' to speak. Peer reviewed publications, qualifications and position were seen as important by some interviewees as ways to judge the expertise of themselves and others. ...
Article
Full-text available
Radicalisation has become a highly influential idea in British policy making. It underpins and justifies Prevent, a core part of the UK's counter-terrorism strategy. Experts have theorised the radicalisation process, often beset by a weak evidence base and mired in fundamental contestation on definitions and explanatory factors. Experiential experts have been active contributors to these debates, presenting a challenge to the low-ranking role often given to experiential knowledge in evidence hierarchies and a contrast to policy areas in which it remains poorly valued. This paper draws on interviews with radicalisation experts to examine the dynamics of this pluralisation in practice. With a focus on credibility contests, it explains how experiential experts can claim authoritative knowledge and the challenges they face from those who prioritise theory-driven empirical data as the basis for contributions to knowledge. The paper draws out the implications for understandings of expertise of this newly conceptualised, evidence poor and highly applied topic area.
... A quantitative description of the empirical accuracy of different signs of abuse in children with disabilities Age: Seniority in career and age is highly linked, with increased age implying increased seniority for researchers (Over, 1988) (Shanteau et al., 2002). 1-5 years: 4 (10%) 6-10 y: 7 (18%) 11-20 y: 10 (26%) More than 20 y: 15 (38%) ...
Article
Background Children with intellectual disabilities are at risk of becoming victims of abuse. However, persons working with this population often lack knowledge on how to interpret signs of abuse. The purpose of this study was to identify and socially validate signs of abuse in children with disabilities. Method The study employed a mixed-method sequential design. The first phase consisted of a rapid review of publications that described signs of abuse in children with disabilities (n = 23). The second phase included social validation using an online survey. The participants were professionals working with disability and/or child abuse (n = 39). Results A significant difference between the 10 highest rated signs of abuse compared to the 10 lowest rated signs was found. Group comparisons between participants showed significant differences in the ratings of eight signs. Conclusions The results from the study can provide guidance to the accuracy of signs of abuse in children with disabilities.
Article
In humans and other gregarious animals, collective decision-making is a robust behavioural feature of groups. Pooling individual information is also fundamental for modern societies, in which digital technologies have exponentially increased the interdependence of individual group members. In this Review, we selectively discuss the recent human and animal literature, focusing on cognitive and behavioural mechanisms that can yield collective intelligence beyond the wisdom of crowds. We distinguish between two group decision-making situations: consensus decision-making, in which a group consensus is required, and combined decision-making, in which a group consensus is not required. We show that in both group decision-making situations, cognitive and behavioural algorithms that capitalize on individual heterogeneity are the key for collective intelligence to emerge. These algorithms include accuracy or expertise-weighted aggregation of individual inputs and implicit or explicit coordination of cognition and behaviour towards division of labour. These mechanisms can be implemented either as ‘cognitive algebra’, executed mainly within the mind of an individual or by some arbitrating system, or as a dynamic behavioural aggregation through social interaction of individual group members. Finally, we discuss implications for collective decision-making in modern societies characterized by a fluid but auto-correlated flow of information and outline some future directions. Collective intelligence emerges in group decision-making, whether it requires a consensus or not. In this Review, Kameda et al. describe cognitive and behavioural algorithms that capitalize on individual heterogeneity to yield gains in decision-making accuracy beyond the wisdom of crowds. View-only file is available. https://rdcu.be/cL3QB
Article
Inconsistency in real‐world judgments can cause random unfairness, injustice and misallocation of resources. In their recent monograph analyse judgment inconsistency or “Noise”, examine its sources and propose remedies. In this commentary on Kahneman et al., we reflect on the major concepts (such as “judgment”, “noise”, “error”, “bias”) used in analysing inconsistency. We place this work in the broader context of applied cognitive psychology, relating it to error typologies and to dual‐systems views of thinking. We also compare Kahneman et al.’s heuristics based approach to the linear combination of attributes based approach of Social Judgment Theory (SJT), with particular reference to judgment noise. We conclude that the main contributions of Kahneman et al.’s book are (a) to raise awareness of the pervasiveness of judgment noise across a range of important real‐world areas, (b) to provide a taxonomy of types of noise in terms of system noise v. occasion noise, and level noise v. pattern noise, and (c) to outline useful ways of reducing noise, and thus overall levels of error. This article is protected by copyright. All rights reserved.
Chapter
The lack of and need for theoretical foundations to systems engineering have been recognized by multiple researchers in recent years. The lack of a foundation extends to the positions of systems engineers in organizations. Presently, organizational architectures for systems engineers are based on heuristics. Since systems engineering is required to alleviate certain challenges associated with the development of complex systems, a strong theoretical foundation to the establishment of organizational architectures for systems engineers is imperative. Such a theoretical foundation will ensure that the contribution made by systems engineers to organizational value can be improved. The goal of this paper is to provide a basis for creating a mathematical framework for organizational architectures for systems engineers. A literature review spanning multiple disciplines is conducted to identify elements pertaining to the organizational architectures. These elements are then used in a directed graph to visually represent the relationships between the elements and the mapping between systems engineers and organizational value.
Article
Purpose The work of internal auditors is relevant to their host entities' reporting processes; however, few researchers have examined how internal auditors’ competency and objectivity affect their resistance to pressure from host entities regarding their reports. Thus, the main objective of this study is to examine the influence of internal audit functions' (IAF) quality factors on chief audit executives' (CAEs) ability not to modify internal audit report. Design/methodology/approach This study uses data from the Global Internal Audit Common Body of Knowledge to investigate the relationship between IAF quality and auditor resistance to pressure related to changes in internal audit reports. IAF quality is calculated using a composite measure comprising four IAF quality components. Auditors' resistance is measured using the extent to which internal auditors experienced a situation wherein they were directed to modify a valid audit finding in a report. Findings The analyses provide evidence that CAEs experience, certification, training and objectivity were all significantly associated with resistance to pressure. In other words, a greater quality of IAF leads to a greater ability to resist pressure to change their reports. Research limitations/implications Despite the statistically significant results that confirm the impact of IAF competence and objectivity on the resistance of CAEs to pressure, some other factors should be considered simultaneously in future research. In addition, the study sample contains 2,193 CAEs from different regions, environments, sectors and business areas. Focussing on a particular environment, sector or organisation size may generate different results. Practical implications The following practical implications are proposed: First, internal audit regulators will find this study helpful in formulating strategies for creating balanced relationships between CAEs and other authorities and users. 
Second, CAEs can be encouraged to undergo constant training and complete professional development (as required by the Institute of Internal Auditors [IIA] standard). Finally, it would be interesting to apply this study to a particular environment, sector and size. Originality/value This study builds on the limited research that investigates the relationship between IAFs’ quality and the resistance of CAEs to pressure. It extends Calven’s (2021) study that investigates the impact of adherence to the IIA's Core Principles on the likelihood of IAFs modifying valid audit findings. This study examines the influence of IAF quality factors on CAEs' ability not to modify internal audit report.
Article
The estimation of parameters and model structure for informing infectious disease response became a focal point during the recent pandemic. However, the pandemic also highlighted many remaining challenges in extracting information quickly and robustly from data and models to help inform policy. In this paper, we identify and discuss four broad challenges in the estimation paradigm relating to infectious disease modelling, namely the Uncertainty Quantification framework, data challenges in estimation, model-based inference and prediction, and expert judgement. We also propose priorities in estimation methodology to help prepare for future pandemics.
Book
Decision Science and Technology is a compilation of chapters written in honor of a remarkable man, Ward Edwards. Among Ward's many contributions are two significant accomplishments, either of which would have been enough for a very distinguished career. First, Ward is the founder of behavioral decision theory. This interdisciplinary field addresses the question of how people actually confront decisions, as opposed to how they should make decisions. Second, Ward laid the groundwork for sound normative systems by noticing which tasks humans can do well and which tasks computers should perform. This volume reflects those accomplishments and more. The book is divided into four sections: 'Behavioral Decision Theory' examines theoretical descriptions and empirical findings about human decision making; 'Decision Analysis' examines topics in decision analysis; 'Decision in Society' explores issues in societal decision making; and the final section, 'Historical Notes', provides some historical perspectives on the development of decision theory. Within these sections, major multidisciplinary scholars in decision theory have written chapters exploring some very bold themes in the field, as an examination of the book's contents will show. The main reason for the health of the Decision Analysis field is the close link between theory and application that has characterized it over the years. In this volume, the chapters by Barron and Barrett; Fishburn; Fryback; Keeney; Moreno, Pericchi, and Kadane; Howard; Phillips; Slovic and Gregory; Winkler; and, above all, von Winterfeldt focus on those links. Decision science originally developed out of concern with real decision problems, and applied work, such as is represented in this volume, will help the field to remain strong.
Chapter
Psychological studies involving experts date back to the earliest days of experimental psychology. Research on domain experts has also been a fundamental part of the history of judgment and decision making (JDM). The purpose of this chapter is to look at how domain experts have been viewed in the decision making literature. The focus will be on an unappreciated historical bias derived from a misinterpretation of the foundations of experimental psychology.
The expert can and should be used as a provider of input for a mechanical combining process since most studies show mechanical combination to be superior to clinical combination. However, even in expert measurement, the global judgment is itself a clinical combination of other judgmental components and as such it may not be as efficient as a mechanical combination of the components. The superiority of mechanically combining components as opposed to using the global judgment for predicting some external criterion is discussed. The use of components is extended to deal with multiple judges since specific judges may be differentially valid with respect to subsets of components for predicting the criterion. These ideas are illustrated by using the results of a study dealing with the prediction of survival on the basis of information contained in biopsies taken from patients having a certain type of cancer. Judgments were made by three highly trained pathologists. Implications and extensions for using expert measurement and mechanical combination are discussed.
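The abstract's central contrast, combining an expert's component judgments mechanically versus relying on the expert's own global judgment, can be sketched as follows. This is a minimal illustration, not the study's analysis: the three components, their weights, the noise levels and the continuous criterion are all hypothetical simulated values, and ordinary least squares fitted on one half of the cases and evaluated on the other half is one assumed choice of mechanical combination rule.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve the square linear system a @ w = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares weights, intercept first, via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


def predict(w, rows):
    """Apply intercept-first weights to each row of component scores."""
    return [w[0] + sum(wj * v for wj, v in zip(w[1:], r)) for r in rows]


def simulate(seed=0, n=200):
    """Return holdout correlations (mechanical, global) on hypothetical simulated data."""
    rng = random.Random(seed)
    true_w = [0.5, 0.3, 0.2]   # hypothetical validity of each component
    judge_w = [0.2, 0.2, 0.6]  # hypothetical weights the expert implicitly uses
    rows, criterion, global_j = [], [], []
    for _ in range(n):
        comps = [rng.gauss(0, 1) for _ in range(3)]
        rows.append(comps)
        criterion.append(sum(w * c for w, c in zip(true_w, comps)) + rng.gauss(0, 0.5))
        global_j.append(sum(w * c for w, c in zip(judge_w, comps)) + rng.gauss(0, 0.5))
    half = n // 2
    w = ols(rows[:half], criterion[:half])  # fit the combination rule on the first half
    mech = predict(w, rows[half:])          # apply it to the held-out half
    return pearson(mech, criterion[half:]), pearson(global_j[half:], criterion[half:])


if __name__ == "__main__":
    r_mech, r_glob = simulate()
    print(f"mechanical r = {r_mech:.2f}, global judgment r = {r_glob:.2f}")
```

With these assumed parameters the fitted rule should track the criterion more closely, because the simulated expert over-weights a weakly valid component; other assumed weights could narrow or reverse the gap.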
Article
The paper describes the ACT theory of learning. The theory is embodied as a computer simulation program that makes predictions about human learning of various cognitive skills, such as language fluency, study skills for social science texts, problem-solving skills in mathematics, and computer programming skills. The learning takes place within the ACT theory of the performance of such skills. This theory involves a propositional network representation of general factual knowledge and a production system representation of procedural knowledge. Skill learning mainly involves the addition and modification of productions. There are five mechanisms by which this takes place: designation, strengthening, generalization, discrimination, and composition. Each of these five learning mechanisms is discussed in detail and related to available data on procedural learning.
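The strengthening mechanism named in the abstract can be illustrated with a toy production system. This is a hypothetical sketch, not ACT's actual strength equations or matching process: the `Production` class, the fixed strength increment and the rule of firing the strongest matching production are simplifying assumptions, and the other four mechanisms are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Production:
    name: str
    condition: frozenset  # facts that must all be in working memory for a match
    action: str
    strength: float = 1.0


def select(productions, memory):
    """Toy conflict resolution: return the strongest production whose condition matches."""
    matching = [p for p in productions if p.condition <= memory]
    return max(matching, key=lambda p: p.strength, default=None)


def strengthen(production, increment=0.5):
    """Toy strengthening: each successful application adds a fixed (assumed) increment."""
    production.strength += increment


if __name__ == "__main__":
    memory = frozenset({"goal:add", "two-digit-numbers"})
    # Hypothetical productions: a general, initially stronger rule and a more
    # specific rule that the learner practises under guidance.
    general = Production("guess", frozenset({"goal:add"}), "guess an answer", strength=2.0)
    specific = Production("add-columns", frozenset({"goal:add", "two-digit-numbers"}),
                          "add units, carry, add tens")
    print(select([general, specific], memory).name)  # prints "guess"
    for _ in range(3):  # three guided, successful applications of the specific rule
        strengthen(specific)
    print(select([general, specific], memory).name)  # prints "add-columns"
```

The design choice here is deliberately minimal: practice changes only a production's strength, and strength alone decides which matching production fires.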