Individual Differences in Reasoning: Implications for the Rationality Debate

Keith E. Stanovich and Richard F. West
Abstract: Much research in the last two decades has demonstrated that human responses deviate from the performance deemed normative according to various models of decision making and rational judgment (e.g., the basic axioms of utility theory). This gap between the normative and the descriptive can be interpreted as indicating systematic irrationalities in human cognition. However, four alternative interpretations preserve the assumption that human behavior and cognition is largely rational. These posit that the gap is due to (1) performance errors, (2) computational limitations, (3) the wrong norm being applied by the experimenter, and (4) a different construal of the task by the subject. In the debates about the viability of these alternative explanations, attention has been focused too narrowly on the modal response. In a series of experiments involving most of the classic tasks in the heuristics and biases literature, we have examined the implications of individual differences in performance for each of the four explanations of the normative/descriptive gap. Performance errors are a minor factor in the gap; computational limitations underlie non-normative responding on several tasks, particularly those that involve some type of cognitive decontextualization. Unexpected patterns of covariance can suggest when the wrong norm is being applied to a task or when an alternative construal of the task should be considered appropriate.

Keywords: biases; descriptive models; heuristics; individual differences; normative models; rationality; reasoning
1. Introduction
A substantial research literature – one comprising literally
hundreds of empirical studies conducted over nearly three
decades – has firmly established that people’s responses of-
ten deviate from the performance considered normative on
many reasoning tasks. For example, people assess proba-
bilities incorrectly, they display confirmation bias, they test
hypotheses inefficiently, they violate the axioms of utility
theory, they do not properly calibrate degrees of belief, they
overproject their own opinions on others, they allow prior
knowledge to become implicated in deductive reasoning,
and they display numerous other information processing bi-
ases (for summaries of the large literature, see Baron 1994;
1998; Evans 1989; Evans & Over 1996; Kahneman et al.
1982; Newstead & Evans 1995; Nickerson 1998; Osherson
1995; Piattelli-Palmarini 1994; Plous 1993; Reyna et al., in
press; Shafir 1994; Shafir & Tversky 1995). Indeed, demon-
strating that descriptive accounts of human behavior di-
verged from normative models was a main theme of the so-
called heuristics and biases literature of the 1970s and early
1980s (see Arkes & Hammond 1986; Kahneman et al.
1982).
The interpretation of the gap between descriptive mod-
els and normative models in the human reasoning and de-
cision making literature has been the subject of con-
tentious debate for almost two decades (a substantial
portion of that debate appearing in this journal; for sum-
maries, see Baron 1994; Cohen 1981; 1983; Evans & Over
1996; Gigerenzer 1996a; Kahneman 1981; Kahneman &
Tversky 1983; 1996; Koehler 1996; Stein 1996).

BEHAVIORAL AND BRAIN SCIENCES (2000) 23, 645-726
Printed in the United States of America
© 2000 Cambridge University Press 0140-525X/00 $12.50

Keith E. Stanovich
Department of Human Development and Applied Psychology, University of Toronto, Toronto, Ontario, Canada M5S 1V6
kstanovich@oise.utoronto.ca

Richard F. West
School of Psychology, James Madison University, Harrisonburg, VA 22807
westrf@jmu.edu    falcon.jmu.edu/~westrf

Keith E. Stanovich is Professor of Human Development and Applied Psychology at the University of Toronto. He is the author of more than 125 scientific articles in the areas of literacy and reasoning, including Who Is Rational? Studies of Individual Differences in Reasoning (Erlbaum, 1999). He is a Fellow of the APA and APS and has received the Sylvia Scribner Award from the American Educational Research Association for contributions to research.

Richard F. West is a Professor in the School of Psychology at James Madison University, where he has been named a Madison Scholar. He received his Ph.D. in Psychology from the University of Michigan. The author of more than 50 publications, his main scientific interests are the study of rational thought, reasoning, decision making, the cognitive consequences of literacy, and cognitive processes of reading.

The debate has arisen because some investigators wished to interpret the gap between the descriptive and the normative as indicating that human cognition was characterized by
systematic irrationalities. Owing to the emphasis that these
theorists place on reforming human cognition, they were
labeled the Meliorists by Stanovich (1999). Disputing this
contention were numerous investigators (termed the Pan-
glossians; see Stanovich 1999) who argued that there were
other reasons why reasoning might not accord with nor-
mative theory (see Cohen 1981 and Stein 1996 for exten-
sive discussions of the various possibilities) – reasons that
prevent the ascription of irrationality to subjects. First, in-
stances of reasoning might depart from normative stan-
dards due to performance errors – temporary lapses of
attention, memory deactivation, and other sporadic infor-
mation processing mishaps. Second, there may be stable
and inherent computational limitations that prevent the
normative response (Cherniak 1986; Goldman 1978; Har-
man 1995; Oaksford & Chater 1993; 1995; 1998; Stich
1990). Third, in interpreting performance, we might be ap-
plying the wrong normative model to the task (Koehler
1996). Alternatively, we may be applying the correct nor-
mative model to the problem as set, but the subject might
have construed the problem differently and be providing
the normatively appropriate answer to a different problem
(Adler 1984; 1991; Berkeley & Humphreys 1982; Broome
1990; Hilton 1995; Schwarz 1996).
However, in referring to the various alternative explana-
tions (other than systematic irrationality) for the normative/
descriptive gap, Rips (1994) warns that “a determined skep-
tic can usually explain away any instance of what seems at
first to be a logical mistake” (p. 393). In an earlier criticism
of Henle’s (1978) Panglossian position, Johnson-Laird
(1983) made the same point: “There are no criteria inde-
pendent of controversy by which to make a fair assessment
of whether an error violates logic. It is not clear what would
count as crucial evidence, since it is always possible to pro-
vide an alternative explanation for an error” (p. 26). The
most humorous version of this argument was made by Kah-
neman (1981) in his dig at the Panglossians who seem to
have only two categories of errors, “pardonable errors by
subjects and unpardonable ones by psychologists” (p. 340).
Referring to the four classes of alternative explanations dis-
cussed above – performance errors, computational limita-
tions, alternative problem construal, and incorrect norm
application – Kahneman notes that Panglossians have “a
handy kit of defenses that may be used if [subjects are] ac-
cused of errors: temporary insanity, a difficult childhood,
entrapment, or judicial mistakes – one of them will surely
work, and will restore the presumption of rationality”
(p. 340).
These comments by Rips (1994), Johnson-Laird (1983),
and Kahneman (1981) highlight the need for principled
constraints on the alternative explanations of normative/de-
scriptive discrepancies. In this target article we describe a
research logic aimed at inferring such constraints from pat-
terns of individual differences that are revealed across a
wide range of tasks in the heuristics and biases literature.
We argue here – using selected examples of empirical re-
sults (Stanovich 1999; Stanovich & West 1998a; 1998b;
1998c; 1998d; 1999) – that these individual differences and
their patterns of covariance have implications for explanations of why human behavior often departs from normative models.1
2. Performance errors
Panglossian theorists who argue that discrepancies be-
tween actual responses and those dictated by normative
models are not indicative of human irrationality (e.g., Co-
hen 1981) sometimes attribute the discrepancies to perfor-
mance errors. Borrowing the idea of a competence/perfor-
mance distinction from linguists (see Stein 1996, pp. 8-9),
these theorists view performance errors as the failure to ap-
ply a rule, strategy, or algorithm that is part of a person’s
competence because of a momentary and fairly random
lapse in ancillary processes necessary to execute the strat-
egy (lack of attention, temporary memory deactivation, dis-
traction, etc.). Stein (1996) explains the idea of a perfor-
mance error by referring to a “mere mistake” – a more
colloquial notion that involves “a momentary lapse, a diver-
gence from some typical behavior. This is in contrast to at-
tributing a divergence from norm to reasoning in accor-
dance with principles that diverge from the normative
principles of reasoning. Behavior due to irrationality con-
notes a systematic divergence from the norm” (p. 8). Simi-
larly, in the heuristics and biases literature, the term bias is
reserved for systematic deviations from normative reason-
ing and does not refer to transitory processing errors (“a
bias is a source of error which is systematic rather than ran-
dom,” Evans 1984, p. 462).
Another way to think of the performance error explana-
tion is to conceive of it within the true score/measurement
error framework of classical test theory. Mean or modal
performance might be viewed as centered on the norma-
tive response – the response all people are trying to ap-
proximate. However, scores will vary around this central
tendency due to random performance factors (error vari-
ance).
It should be noted that Cohen (1981) and Stein (1996)
sometimes encompass computational limitations within
their notion of a performance error. In the present target
article, the two are distinguished even though both are
identified with the algorithmic level of analysis (see Ander-
son 1990; Marr 1982; and the discussion below on levels of
analysis in cognitive theory) because they have different im-
plications for covariance relationships across tasks. Here,
performance errors represent algorithmic-level problems
that are transitory in nature. Nontransitory problems at the
algorithmic level that would be expected to recur on a read-
ministration of the task are termed computational limita-
tions.
This notion of a performance error as a momentary at-
tention, memory, or processing lapse that causes responses
to appear nonnormative even when competence is fully
normative has implications for patterns of individual differ-
ences across reasoning tasks. For example, the strongest
possible form of this view is that all discrepancies from nor-
mative responses are due to performance errors. This
strong form of the hypothesis has the implication that there
should be virtually no correlations among nonnormative
processing biases across tasks. If each departure from nor-
mative responding represents a momentary processing
lapse due to distraction, carelessness, or temporary confu-
sion, then there is no reason to expect covariance among bi-
ases across tasks (or covariance among items within tasks,
for that matter) because error variances should be uncor-
related.
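This implication of uncorrelated error variances can be illustrated with a small simulation (a hypothetical sketch of the true score/error logic, not data from our studies): if every subject shares the same normative competence and deviations arise only from independent momentary lapses, deviations on one task should not predict deviations on another.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical sample size

# Under the strong performance-error view, every subject has the same
# normative competence (a common true score T); observed deviations
# from the normative response are pure, transitory error.
T = 0.0
err_task1 = rng.normal(0, 1, n)  # momentary lapses on task 1
err_task2 = rng.normal(0, 1, n)  # independent lapses on task 2

score1 = T + err_task1
score2 = T + err_task2

r = np.corrcoef(score1, score2)[0, 1]
print(round(r, 3))  # near zero: independent error variances cannot covary
```

Observed positive manifold across tasks is therefore evidence against this strong form of the hypothesis, since the simulation shows what the data would look like if it were true.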
In contrast, positive manifold (uniformly positive bivariate associations in a correlation matrix) among disparate
tasks in the heuristics and biases literature – and among
items within tasks – would call into question the notion that
all variability in responding can be attributable to perfor-
mance errors. This was essentially Rips and Conrad’s (1983)
argument when they examined individual differences in de-
ductive reasoning: “Subjects’ absolute scores on the propo-
sitional tests correlated with their performance on certain
other reasoning tests.... If the differences in propositional
reasoning were merely due to interference from other per-
formance factors, it would be difficult to explain why they
correlate with these tests” (pp. 282-83). In fact, a parallel
argument has been made in economics where, as in rea-
soning, models of perfect market rationality are protected
from refutation by positing the existence of local market
mistakes of a transitory nature (temporary information de-
ficiency, insufficient attention due to small stakes, distrac-
tions leading to missed arbitrage opportunities, etc.).
Advocates of perfect market rationality in economics ad-
mit that people make errors but defend their model of ide-
alized competence by claiming that the errors are essen-
tially random. The following defense of the rationality
assumption in economics is typical in the way it defines per-
formance errors as unsystematic: “In mainstream econom-
ics, to say that people are rational is not to assume that they
never make mistakes, as critics usually suppose. It is merely
to say that they do not make systematic mistakes – i.e., that
they do not keep making the same mistake over and over
again” (The Economist 1998, p. 80). Not surprisingly, others
have attempted to refute the view that the only mistakes in
economic behavior are unpredictable performance errors
by pointing to the systematic nature of some of the mis-
takes: “The problem is not just that we make random com-
putational mistakes; rather it is that our judgmental errors
are often systematic” (Frank 1990, p. 54). Likewise, Thaler
(1992) argues that “a defense in the same spirit as Fried-
man’s is to admit that of course people make mistakes, but
the mistakes are not a problem in explaining aggregate be-
havior as long as they tend to cancel out. Unfortunately, this
line of defense is also weak because many of the departures
from rational choice that have been observed are system-
atic” (pp. 4-5). Thus, in parallel to our application of an in-
dividual differences methodology to the tasks in the heuris-
tics and biases literature, Thaler argues that variance and
covariance patterns can potentially falsify some applications
of the performance error argument in the field of econom-
ics.
Thus, as in economics, we distinguish systematic from
unsystematic deviations from normative models. The latter
we label performance errors and view them as inoculating
against attributions of irrationality. Just as random, unsys-
tematic errors of economic behavior do not impeach the
model of perfect market rationality, transitory and random
errors in thinking on a heuristics and biases problem do not
impeach the Panglossian assumption of ideal rational com-
petence. Systematic and repeatable failures in algorithmic-
level functioning likewise do not impeach intentional-level
rationality, but they are classified as computational limita-
tions in our taxonomy and are discussed in section 3. Sys-
tematic mistakes not due to algorithmic-level failure do call
into question whether the intentional-level description of
behavior is consistent with the Panglossian assumption of
perfect rationality – provided the normative model being
applied is not inappropriate (see sect. 4) or that the subject
has not arrived at a different, intellectually defensible in-
terpretation of the task (see sect. 5).
In several studies, we have found very little evidence for
the strong version of the performance error view. With vir-
tually all of the tasks from the heuristics and biases litera-
ture that we have examined, there is considerable internal
consistency. Further, at least for certain classes of task,
there are significant cross-task correlations. For example, in
two different studies (Stanovich & West 1998c) we found
correlations in the range of .25 to .40 (considerably higher
when corrected for attenuation) among the following mea-
sures:
1. Nondeontic versions of Wason’s (1966) selection task:
The subject is shown four cards lying on a table showing two
letters and two numbers (A, D, 3, 7). They are told that each
card has a number on one side and a letter on the other and
that the experimenter has the following rule (of the if P,
then Q type) in mind with respect to the four cards: “If
there is an A on one side then there is a 3 on the other.” The
subject is then told that he/she must turn over whichever
cards are necessary to determine whether the experi-
menter’s rule is true or false. Only a small number of sub-
jects make the correct selections of the A card (P) and 7
card (not-Q) and, as a result, the task has generated a sub-
stantial literature (Evans et al. 1993; Johnson-Laird 1999;
Newstead & Evans 1995).
2. A syllogistic reasoning task in which logical validity
conflicted with the believability of the conclusion (see
Evans et al. 1983). An example item is: All mammals walk.
Whales are mammals. Conclusion: Whales walk.
3. Statistical reasoning problems of the type studied by
the Nisbett group (e.g., Fong et al. 1986) and inspired by the
finding that human judgment is overly influenced by vivid
but unrepresentative personal and case evidence and under-
influenced by more representative and diagnostic, but pallid,
statistical evidence. The quintessential problem involves
choosing between contradictory car purchase recommen-
dations – one from a large-sample survey of car buyers and
the other the heartfelt and emotional testimony of a single
friend.
4. A covariation detection task modeled on the work of
Wasserman et al. (1990). Subjects evaluated data derived
from a 2 × 2 contingency matrix.
5. A hypothesis testing task modeled on Tschirgi (1980)
in which the score on the task was the number of times sub-
jects attempted to test a hypothesis in a manner that con-
founded variables.
6. A measure of outcome bias modeled on the work of
Baron and Hershey (1988). This bias is demonstrated when
subjects rate a decision with a positive outcome as superior
to a decision with a negative outcome even when the infor-
mation available to the decision maker was the same in both
cases.
7. A measure of if/only thinking bias (Epstein et al. 1992;
Miller et al. 1990). If/only bias refers to the tendency for
people to have differential responses to outcomes based on
the differences in counterfactual alternative outcomes that
might have occurred. The bias is demonstrated when sub-
jects rate a decision leading to a negative outcome as worse
than a control condition when the former makes it easier to
imagine a positive outcome occurring.
8. An argument evaluation task (Stanovich & West 1997)
that tapped reasoning skills of the type studied in the in-
formal reasoning literature (Baron 1995; Klaczynski et al.
1997; Perkins et al. 1991). Importantly, it was designed
so that to do well on it one had to adhere to a stricture not
to implicate prior belief in the evaluation of the argument.
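The logic of the nondeontic selection task in item 1 above can be made explicit in code (a minimal sketch of our own; the card values and helper names are ours): a card is worth turning over only if some value on its hidden side could falsify the rule "if there is an A on one side then there is a 3 on the other," which singles out exactly the A (P) and 7 (not-Q) cards.

```python
# Visible faces and the possible values on each card's hidden side.
# Letter cards hide a number; number cards hide a letter.
hidden = {
    "A": ["3", "7"],   # hidden number could be 3 or not-3
    "D": ["3", "7"],
    "3": ["A", "D"],   # hidden letter could be A or not-A
    "7": ["A", "D"],
}

def violates(letter, number):
    """The rule 'if A on one side then 3 on the other' fails only
    when a card pairs an A with a number other than 3."""
    return letter == "A" and number != "3"

def informative(face):
    """A card is worth turning only if some possible hidden value
    would falsify the rule."""
    for back in hidden[face]:
        letter, number = (face, back) if face in "AD" else (back, face)
        if violates(letter, number):
            return True
    return False

print([face for face in hidden if informative(face)])  # ['A', '7']
```

The 3 card drops out because whatever letter is on its back, the rule survives; turning it can only confirm, never falsify.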
3. Computational limitations
Patterns of individual differences have implications that ex-
tend beyond testing the view that discrepancies between
descriptive models and normative models arise entirely
from performance errors. For example, patterns of individ-
ual differences also have implications for prescriptive mod-
els of rationality. Prescriptive models specify how reasoning
should proceed given the limitations of the human cogni-
tive apparatus and the situational constraints (e.g., time
pressure) under which the decision maker operates (Baron
1985). Thus, normative models might not always be pre-
scriptive for a given individual and situation. Judgments
about the rationality of actions and beliefs must take into
account the resource-limited nature of the human cognitive
apparatus (Cherniak 1986; Goldman 1978; Harman 1995;
Oaksford & Chater 1993; 1995; 1998; Stich 1990). More
colloquially, Stich (1990) has argued that “it seems simply
perverse to judge that subjects are doing a bad job of rea-
soning because they are not using a strategy that requires a
brain the size of a blimp” (p. 27).
Following Dennett (1987) and the taxonomy of Ander-
son (1990; see also Marr 1982; Newell 1982), we distinguish
the algorithmic/design level from the rational/intentional
level of analysis in cognitive science (the first term in each
pair is that preferred by Anderson, the second that pre-
ferred by Dennett). The latter provides a specification of
the goals of the system’s computations (what the system is
attempting to compute and why). At this level, we are con-
cerned with the goals of the system, beliefs relevant to those
goals, and the choice of action that is rational given the sys-
tem’s goals and beliefs (Anderson 1990; Bratman et al.
1991; Dennett 1987; Newell 1982; 1990; Pollock 1995).
However, even if all humans were optimally rational at the
intentional level of analysis, there may still be computa-
tional limitations at the algorithmic level (e.g., Cherniak
1986; Goldman 1978; Oaksford & Chater 1993; 1995). We
would, therefore, still expect individual differences in ac-
tual performance (despite equal rational-level compe-
tence) due to differences at the algorithmic level.
Using such a framework, we view the magnitude of the
correlation between performance on a reasoning task and
cognitive capacity as an empirical clue about the impor-
tance of algorithmic limitations in creating discrepancies
between descriptive and normative models. A strong cor-
relation suggests important algorithmic-level limitations
that might make the normative response not prescriptive
for those of lower cognitive capacity (Panglossian theorists
drawn to this alternative explanation of normative/descrip-
tive gaps were termed Apologists by Stanovich 1999.) In
contrast, the absence of a correlation between the norma-
tive response and cognitive capacity suggests no computa-
tional limitation and thus no reason why the normative re-
sponse should not be considered prescriptive (see Baron
1985).
In our studies, we have operationalized cognitive capacity in terms of well-known cognitive ability (intelligence) and academic aptitude tasks2 but have most often used the total score on the Scholastic Assessment Test.3,4 All are
known to load highly on psychometric g (Carpenter et al.
1990; Carroll 1993; Matarazzo 1972), and such measures
have been linked to neurophysiological and information-
processing indicators of efficient cognitive computation
(Caryl 1994; Deary 1995; Deary & Stough 1996; Detterman
1994; Fry & Hale 1996; Hunt 1987; Stankov & Dunn 1993;
Vernon 1991; 1993). Furthermore, measures of general in-
telligence have been shown to be linked to virtually all of
the candidate subprocesses of mentality that have been
posited as determinants of cognitive capacity (Carroll
1993). For example, working memory is the quintessential
component of cognitive capacity (in theories of com-
putability, computational power often depends on memory
for the results of intermediate computations). Consistent
with this interpretation, Bara et al. (1995) have found that
“as working memory improves – for whatever reason – it en-
ables deductive reasoning to improve too” (p. 185). But it
has been shown that, from a psychometric perspective, vari-
ation in working memory is almost entirely captured by
measures of general intelligence (Kyllonen 1996; Kyllonen
& Christal 1990).
Measures of general cognitive ability such as those used
in our research are direct marker variables for Spearman’s
(1904; 1927) positive manifold – that performance on all
reasoning tasks tends to be correlated. Below, we will illus-
trate how we use this positive manifold to illuminate rea-
sons for the normative/descriptive gap.
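Positive manifold is simply the claim that every off-diagonal entry of the tasks' correlation matrix is positive. A check of that property might look like the following sketch (the matrix values here are hypothetical, not results from our experiments):

```python
import numpy as np

def has_positive_manifold(R):
    """True if every bivariate association (off-diagonal entry) of the
    correlation matrix R is positive, i.e., Spearman's positive manifold."""
    R = np.asarray(R)
    off_diag = R[~np.eye(R.shape[0], dtype=bool)]
    return bool((off_diag > 0).all())

# Hypothetical correlation matrix for three reasoning tasks:
R = [[1.00, 0.35, 0.40],
     [0.35, 1.00, 0.30],
     [0.40, 0.30, 1.00]]
print(has_positive_manifold(R))  # True
```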
Table 1 indicates the magnitude of the correlation between one such measure – Scholastic Assessment Test total scores – and the eight different reasoning tasks studied by Stanovich and West (1998c, Experiments 1 and 2) and mentioned in the previous section. In Experiment 1, syllogistic reasoning in the face of interfering content displayed the highest correlation (.470) and the other three correlations were roughly equal in magnitude (.347 to .394). All were statistically significant (p < .001). The remaining correlations in the table are the results from a replication and extension experiment. Three of the four tasks from the previous experiment were carried over (all but the selection task) and displayed correlations similar in magnitude to those obtained in the first experiment.

Table 1. Correlations between the reasoning tasks and Scholastic Assessment Test total scores in the Stanovich and West (1998c) studies

Experiment 1
  Syllogisms                  .470**
  Selection task              .394**
  Statistical reasoning       .347**
  Argument evaluation task    .358**

Experiment 2
  Syllogisms                  .410**
  Statistical reasoning       .376**
  Argument evaluation task    .371**
  Covariation detection       .239**
  Hypothesis testing bias    −.223**
  Outcome bias               −.172**
  If/only thinking           −.208**
  Composite score             .547**

** = p < .001, all two-tailed
Ns = 178 to 184 in Experiment 1 and 527 to 529 in Experiment 2

The correlations involving
the four new tasks introduced in Experiment 2 were also all
statistically significant. The sign on the hypothesis testing,
outcome bias, and if/only thinking tasks was negative be-
cause high scores on these tasks reflect susceptibility to
nonnormative cognitive biases. The correlations on the four
new tasks were generally lower (.172 to .239) than the cor-
relations involving the other tasks (.371 to .410). The scores
on all of the tasks in Experiment 2 were standardized and
summed to yield a composite score. The composite’s corre-
lation with Scholastic Assessment Test scores was .547. It
thus appears that, to a moderate extent, discrepancies be-
tween actual performance and normative models can be ac-
counted for by variation in computational limitations at the
algorithmic level – at least with respect to the tasks investi-
gated in these particular experiments.
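The composite described above – standardize each task's scores, reverse-score the bias measures, and sum – and the correction for attenuation mentioned in section 2 are both mechanical procedures. The following sketch is our own illustration (the data and reliabilities passed in would be hypothetical):

```python
import numpy as np

def composite(scores, reverse=()):
    """Standardize each task's scores (z-scores) and sum them across tasks,
    reversing the sign of bias measures on which high scores reflect
    non-normative responding. `scores` maps task name -> raw score array."""
    total = 0.0
    for task, x in scores.items():
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std()
        total = total + (-z if task in reverse else z)
    return total

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: the estimated true-score
    correlation, given the observed correlation and the two reliabilities."""
    return r_xy / np.sqrt(rel_x * rel_y)

# An observed r of .30 with reliabilities of .70 and .80 rises to about .40:
print(round(disattenuate(0.30, 0.70, 0.80), 3))  # 0.401
```

The disattenuation step shows why the observed cross-task correlations of .25 to .40 reported in section 2 are "considerably higher when corrected for attenuation."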
However, there are some tasks in the heuristics and bi-
ases literature that lack any association at all with cognitive
ability. The so-called false consensus effect in the opinion
prediction paradigm (Krueger & Clement 1994; Krueger &
Zeiger 1993) displays a complete dissociation from cognitive
ability (Stanovich 1999; Stanovich & West 1998c). Like-
wise, the overconfidence effect in the knowledge calibra-
tion paradigm (e.g., Lichtenstein et al. 1982) displays a neg-
ligible correlation with cognitive ability (Stanovich 1999;
Stanovich & West 1998c).
Collectively, these results indicate that computational
limitations seem far from absolute. That is, although com-
putational limitations appear implicated to some extent in
many of the tasks, the normative responses for all of them
were computed by some university students who had mod-
est cognitive abilities (e.g., below the mean in a university
sample). Such results help to situate the relationship be-
tween prescriptive and normative models for the tasks in
question because the boundaries of prescriptive recom-
mendations for particular individuals might be explored by
examining the distribution of the cognitive capacities of in-
dividuals who gave the normative response on a particular
task. For most of these tasks, only a small number of the stu-
dents with the very lowest cognitive ability in this sample
would have prescriptive models that deviated substantially
from the normative model for computational reasons. Such
findings also suggest that other factors might account for variation – a prediction that will be confirmed when work on styles of epistemic regulation is examined in section 7. Of course, the deviation between
the normative and prescriptive model due to computational
limitations will certainly be larger in unselected or nonuni-
versity populations. This point also serves to reinforce the
caveat that the correlations observed in Table 1 were un-
doubtedly attenuated due to restriction of range in the sam-
ple. Nevertheless, if the normative/prescriptive gap is in-
deed modest, then there may well be true individual
differences at the intentional level – that is, true individual
differences in rational thought.
All of the camps in the dispute about human rationality
recognize that positing computational limitations as an ex-
planation for differences between normative and descriptive
models is a legitimate strategy. Meliorists agree on the im-
portance of assessing such limitations. Likewise, Panglos-
sians will, when it is absolutely necessary, turn themselves
into Apologists to rescue subjects from the charge of irra-
tionality. Thus, they too acknowledge the importance of as-
sessing computational limitations. In the next section, how-
ever, we examine an alternative explanation of the normative/
descriptive gap that is much more controversial – the notion
that inappropriate normative models have been applied to
certain tasks in the heuristics and biases literature.
4. Applying the wrong normative model
The possibility of incorrect norm application arises because
psychologists must appeal to the normative models of other
disciplines (statistics, logic, etc.) in order to interpret the re-
sponses on various tasks, and these models must be applied
to a particular problem or situation. Matching a problem to
a normative model is rarely an automatic or clear-cut pro-
cedure. The complexities involved in matching problems to
norms make possible the argument that the gap between
the descriptive and normative occurs because psychologists
are applying the wrong normative model to the situation. It
is a potent strategy for the Panglossian theorist to use
against the advocate of Meliorism, and such claims have be-
come quite common in critiques of the heuristics and biases
literature:
Many critics have insisted that in fact it is Kahneman & Tver-
sky, not their subjects, who have failed to grasp the logic of the
problem. (Margolis 1987, p. 158)
If a “fallacy” is involved, it is probably more attributable to
the researchers than to the subjects. (Messer & Griggs 1993, p.
195)
When ordinary people reject the answers given by normative
theories, they may do so out of ignorance and lack of expertise,
or they may be signaling the fact that the normative theory is
inadequate. (Lopes 1981, p. 344)
In the examples of alleged base rate fallacy considered by
Kahneman and Tversky, they, and not their experimental sub-
jects, commit the fallacies. (Levi 1983, p. 502)
What Wason and his successors judged to be the wrong re-
sponse is in fact correct. (Wetherick 1993, p. 107)
Perhaps the only people who suffer any illusion in relation to
cognitive illusions are cognitive psychologists. (Ayton & Hard-
man 1997, p. 45)
These quotations reflect the numerous ongoing critiques of
the heuristics and biases literature in which it is argued that
the wrong normative standards have been applied to per-
formance. For example, Lopes (1982) has argued that the
literature on the inability of human subjects to generate
random sequences (e.g., Wagenaar 1972) has adopted a
narrow concept of randomness that does not acknowledge
broader conceptions that are debated in the philosophy and
mathematics literature. Birnbaum (1983) has demon-
strated that conceptualizing the well-known taxicab base-
rate problem (see Bar-Hillel 1980; Tversky & Kahneman
1982) within a signal-detection framework can lead to dif-
ferent estimates from those assumed to be normatively cor-
rect under the less flexible Bayesian model that is usually
applied. Gigerenzer (1991a; 1991b; 1993; Gigerenzer et al.
1991) has argued that the overconfidence effect in knowl-
edge calibration experiments (Lichtenstein et al. 1982) and
the conjunction effect in probability judgment (Tversky &
Kahneman 1983) have been mistakenly classified as cogni-
tive biases because of the application of an inappropriate
normative model of probability assessment (i.e., requests
for single-event subjective judgments, when under some
conceptions of probability such judgments are not subject
to the rules of a probability calculus). Dawes (1989; 1990)
and Hoch (1987) have argued that social psychologists have
too hastily applied an overly simplified normative model in
Stanovich & West: Individual differences in reasoning
BEHAVIORAL AND BRAIN SCIENCES (2000) 23:5 649
labeling performance in opinion prediction experiments as
displaying a so-called false consensus (see also Krueger &
Clement 1994; Krueger & Zeiger 1993).
4.1. From the descriptive to the normative
in reasoning and decision making
The cases just mentioned provide examples of how the ex-
istence of deviations between normative models and actual
human reasoning has been called into question by casting
doubt on the appropriateness of the normative models used
to evaluate performance. Stein (1996, p. 239) terms this the
“reject-the-norm” strategy. It is noteworthy that this strat-
egy is frequently used by the Panglossian camp in the ra-
tionality debate, although this connection is not a necessary
one. Specifically, Panglossians have exclusively used the reject-
the-norm-application strategy to eliminate gaps between
descriptive models of performance and normative models.
When this type of critique is employed, the normative
model that is suggested as a substitute for the one tradi-
tionally used in the heuristics and biases literature is one
that coincides perfectly with the descriptive model of the
subjects’ performance – thus preserving a view of human
rationality as ideal. It is rarely noted that the strategy could
be used in just the opposite way – to create gaps between
the normative and descriptive. Situations where the modal
response coincides with the standard normative model
could be critiqued, and alternative models could be sug-
gested that would result in a new normative/descriptive
gap. But this is never done. The Panglossian camp, often
highly critical of empirical psychologists (“Kahneman and
Tversky . . . and not their experimental subjects, commit
the fallacies”; Levi, 1983, p. 502), is never critical of psy-
chologists who design reasoning tasks in instances where
the modal subject gives the response the experimenters
deem correct. Ironically, in these cases, according to the
Panglossians, the same psychologists seem never to err in
their task designs and interpretations.
The fact that the use of the reject-the-norm-application
strategy is entirely contingent on the existence or nonexis-
tence of a normative/descriptive gap suggests that, at least
for Panglossians, the strategy is empirically, not conceptu-
ally, triggered (normative applications are never rejected
for purely conceptual reasons when they coincide with the
modal human response). What this means is that in an im-
portant sense the applications of norms being endorsed by
the Panglossian camp are conditioned (if not indexed en-
tirely) by descriptive facts about human behavior. The ra-
tionality debate itself is, reflexively, evidence that the de-
scriptive models of actual behavior condition expert notions
of the normative. That is, there would have been no debate
(or at least much less of one) had people behaved in accord
with the then-accepted norms.
Gigerenzer (1991b) is clear about his adherence to an
empirically-driven reject-the-norm-application strategy:
Since its origins in the mid-seventeenth century.... when
there was a striking discrepancy between the judgment of rea-
sonable men and what probability theory dictated – as with the
famous St. Petersburg paradox – then the mathematicians
went back to the blackboard and changed the equations (Das-
ton 1980). Those good old days have gone. . . . If, in studies on
social cognition, researchers find a discrepancy between hu-
man judgment and what probability theory seems to dictate,
the blame is now put on the human mind, not the statistical
model. (p. 109)
One way of framing the current debate between the Pan-
glossians and Meliorists is to observe that the Panglossians
wish for a return of the “good old days” where the normative
was derived from the intuitions of the untutored layperson
(“an appeal to people’s intuitions is indispensable”; Cohen,
1981, p. 318); whereas the Meliorists (with their greater em-
phasis on the culturally constructed nature of norms) view
the mode of operation during the “good old days” as a con-
tingent fact of history – the product of a period when few as-
pects of epistemic and pragmatic rationality had been codi-
fied and preserved for general diffusion through education.
Thus, the Panglossian reject-the-norm-application view
can in essence be seen as a conscious application of the nat-
uralistic fallacy (deriving ought from is). For example, Co-
hen (1981), like Gigerenzer, feels that the normative is in-
dexed to the descriptive in the sense that a competence
model of actual behavior can simply be interpreted as the
normative model. Stein (1996) notes that proponents of this
position believe that the normative can simply be “read off
from a model of competence because “whatever human
reasoning competence turns out to be, the principles em-
bodied in it are the normative principles of reasoning”
(p. 231). Although both endorse this linking of the norma-
tive to the descriptive, Gigerenzer (1991b) and Cohen
(1981) do so for somewhat different reasons. For Cohen
(1981), it follows from his endorsement of narrow reflective
equilibrium as the sine qua non of normative justification.
Gigerenzer’s (1991b) endorsement is related to his position
in the “cognitive ecologist” camp (to use Piattelli-Pal-
marini’s 1994, p. 183 term) with its emphasis on the ability
of evolutionary mechanisms to achieve an optimal
Brunswikian tuning of the organism to the local environ-
ment (Brase et al. 1998; Cosmides & Tooby 1994; 1996;
Oaksford & Chater 1994; 1998; Pinker 1997).
That Gigerenzer and Cohen concur here – even though
they have somewhat different positions on normative justi-
fication – simply shows how widespread is the acceptance
of the principle that descriptive facts about human behav-
ior condition our notions about the appropriateness of the
normative models used to evaluate behavior. In fact, stated
in such broad form, this principle is not restricted to the
Panglossian position. For example, in decision science,
there is a long tradition of acknowledging descriptive influ-
ences when deciding which normative model to apply to a
particular situation. Slovic (1995) refers to this “deep inter-
play between descriptive phenomena and normative prin-
ciples” (p. 370). Larrick et al. (1993) have reminded us that
“there is also a tradition of justifying, and amending, nor-
mative models in response to empirical considerations”
(p. 332). March (1988) refers to this tradition when he dis-
cusses how actual human behavior has conditioned models
of efficient problem solving in artificial intelligence and in
the area of organizational decision making. The assump-
tions underlying the naturalistic project in epistemology
(e.g., Kornblith 1985; 1993) have the same implication –
that findings about how humans form and alter beliefs
should have a bearing on which normative theories are cor-
rectly applied when evaluating the adequacy of belief ac-
quisition. This position is in fact quite widespread:
If people’s (or animals’) judgments do not match those pre-
dicted by a normative model, this may say more about the need
for revising the theory to more closely describe subjects’ cog-
nitive processes than it says about the adequacy of those pro-
cesses. (Alloy & Tabachnik 1984, p. 140)
We must look to what people do in order to gather materials
for epistemic reconstruction and self-improvement. (Kyburg
1991, p. 139)
When ordinary people reject the answers given by normative
theories, they may do so out of ignorance and lack of expertise,
or they may be signaling the fact that the normative theory is
inadequate. (Lopes 1981, p. 344)
Of course, in this discussion we have conjoined disparate
views that are actually arrayed on a continuum. The reject-
the-norm advocates represent the extreme form of this
view – they simply want to read off the normative from the
descriptive: “the argument under consideration here re-
jects the standard picture of rationality and takes the rea-
soning experiments as giving insight not just into human
reasoning competence but also into the normative princi-
ples of reasoning” (Stein 1996, p. 233). In contrast, other
theorists (e.g., March 1988) simply want to subtly fine-tune
and adjust normative applications based on descriptive
facts about reasoning performance.
One thing that all of the various camps in the rationality
dispute have in common is that each conditions their beliefs
about the appropriate norm to apply based on the central
tendency of the responses to a problem. They all seem to
see that single aspect of performance as the only descrip-
tive fact that is relevant to conditioning their views about
the appropriate normative model to apply. For example, ad-
vocates of the reject-the-norm-application strategy for
dealing with normative/descriptive discrepancies view the
mean, or modal, response as a direct pointer to the appro-
priate normative model. One goal of the present research
program is to expand the scope of the descriptive informa-
tion used to condition our views about appropriate applica-
tions of norms.
4.2. Putting descriptive facts to work: The
understanding/acceptance assumption
How should we interpret situations where the majority of
individuals respond in ways that depart from the normative
model applied to the problem by reasoning experts? Tha-
gard (1982) calls the two different interpretations the pop-
ulist strategy and the elitist strategy: “The populist strategy,
favored by Cohen (1981), is to emphasize the reflective
equilibrium of the average person. . . . The elitist strategy,
favored by Stich and Nisbett (1980), is to emphasize the re-
flective equilibrium of experts” (p. 39). Thus, Thagard
(1982) identifies the populist strategy with the Panglossian
position and the elitist strategy with the Meliorist position.
But there are few controversial tasks in the heuristics and
biases literature where all untutored laypersons disagree
with the experts. There are always some who agree. Thus,
the issue is not the untutored average person versus experts
(as suggested by Thagard’s formulation), but experts plus
some laypersons versus other untutored individuals. Might
the cognitive characteristics of those departing from expert
opinion have implications for which normative model we
deem appropriate? Larrick et al. (1993) made just such an
argument in their analysis of what justified the cost-benefit
reasoning of microeconomics: “Intelligent people would be
more likely to use cost-benefit reasoning. Because intelli-
gence is generally regarded as being the set of psychologi-
cal properties that makes for effectiveness across environ-
ments . . . intelligent people should be more likely to use
the most effective reasoning strategies than should less in-
telligent people” (p. 333). Larrick et al. (1993) are alluding
to the fact that we may want to condition our inferences
about appropriate norms based not only on what response
the majority of people make but also on what response the
most cognitively competent subjects make.
Slovic and Tversky (1974) made essentially this argument
years ago, although it was couched in very different terms
in their paper and thus was hard to discern. Slovic and Tver-
sky (1974) argued that descriptive facts about argument en-
dorsement should condition the inductive inferences of ex-
perts regarding appropriate normative principles. In
response to the argument that there is “no valid way to dis-
tinguish between outright rejection of the axiom and failure
to understand it” (p. 372), Slovic and Tversky (1974) ob-
served that “the deeper the understanding of the axiom, the
greater the readiness to accept it” (pp. 372–73). Thus, a
correlation between understanding and acceptance would
suggest that the gap between the descriptive and normative
was due to an initial failure to fully process and/or under-
stand the task.
We might call Slovic and Tversky’s argument the under-
standing/acceptance assumption – that more reflective and
engaged reasoners are more likely to affirm the appropriate
normative model for a particular situation. From their un-
derstanding/acceptance principle, it follows that if greater
understanding resulted in more acceptance of the axiom,
then the initial gap between the normative and descriptive
would be attributed to factors that prevented problem un-
derstanding (e.g., lack of ability or reflectiveness on the part
of the subject). Such a finding would increase confidence in
the normative appropriateness of the axioms and/or in their
application to a particular problem. In contrast, if better un-
derstanding failed to result in greater acceptance of the ax-
iom, then its normative status for that particular problem
might be considered to be undermined.
Using their understanding/acceptance principle, Slovic
and Tversky (1974) examined the Allais (1953) problem and
found little support for the applicability of the indepen-
dence axiom of utility theory (the axiom stating that if the
outcome in some state of the world is the same across op-
tions, then that state of the world should be ignored; Baron
1993; Savage 1954). When presented with arguments to ex-
plicate both the Allais (1953) and Savage (1954) positions,
subjects found the Allais argument against independence at
least as compelling and did not tend to change their task be-
havior in the normative direction (see MacCrimmon 1968;
MacCrimmon & Larsson 1979 for more mixed results on
the independence axiom using related paradigms). Al-
though Slovic and Tversky (1974) failed to find support for
this particular normative application, they presented a prin-
ciple that may be of general usefulness in theoretical de-
bates about why human performance deviates from nor-
mative models. The central idea behind Slovic and
Tversky’s (1974) development of the understanding/accep-
tance assumption is that increased understanding should
drive performance in the direction of the truly normative
principle for the particular situation – so that the direction
that performance moves in response to increased under-
standing provides an empirical clue as to what is the proper
normative model to be applied.
One might conceive of two generic strategies for apply-
ing the understanding/acceptance principle based on the
fact that variation in understanding can be created or it can
be studied by examining naturally occurring individual dif-
ferences. Slovic and Tversky employed the former strategy
by providing subjects with explicated arguments supporting
the Allais or Savage normative interpretation (see also Do-
herty et al. 1981; Stanovich & West 1999). Other methods
of manipulating understanding have provided consistent
evidence in favor of the normative principle of descriptive
invariance (see Kahneman & Tversky 1984). For example,
it has been found that being forced to take more time or to
provide a rationale for selections increases adherence to de-
scriptive invariance (Larrick et al. 1992; Miller & Fagley
1991; Sieck & Yates 1997; Takemura 1992; 1993; 1994).
Moshman and Geil (1998) found that group discussion fa-
cilitated performance on Wason’s selection task.
As an alternative to manipulating understanding, the un-
derstanding/acceptance principle can be transformed into
an individual differences prediction. For example, the prin-
ciple might be interpreted as indicating that more reflec-
tive, engaged, and intelligent reasoners are more likely to
respond in accord with normative principles. Thus, it might
be expected that those individuals with cognitive/personal-
ity characteristics more conducive to deeper understanding
would be more accepting of the appropriate normative
principles for a particular problem. This was the emphasis
of Larrick et al. (1993) when they argued that more intelli-
gent people should be more likely to use cost-benefit prin-
ciples. Similarly, need for cognition – a dispositional vari-
able reflecting the tendency toward thoughtful analysis and
reflective thinking – has been associated with aspects of
epistemic and practical rationality (Cacioppo et al. 1996;
Kardash & Scholes 1996; Klaczynski et al. 1997; Smith &
Levin 1996; Verplanken 1993). This particular application
of the understanding/acceptance principle derives from the
assumption that a normative/descriptive gap that is dispro-
portionately created by subjects with a superficial under-
standing of the problem provides no warrant for amending
the application of standard normative models.
4.3. Tacit acceptance of the understanding/acceptance
principle as a mechanism for adjudicating disputes
about the appropriate normative models to apply
It is important to point out that many theorists on all sides
of the rationality debate have acknowledged the force of the
understanding/acceptance argument (without always label-
ing the argument as such or citing Slovic & Tversky 1974).
For example, Gigerenzer and Goldstein (1996) lament the
fact that Apologists who emphasize Simon’s (1956; 1957;
1983) concept of bounded rationality seemingly accept the
normative models applied by the heuristics and biases the-
orists by their assumption that, if computational limitations
were removed, individuals’ responses would indeed be
closer to the behavior those models prescribe.
Lopes and Oden (1991) also wish to deny this tacit as-
sumption in the literature on computational limitations:
“discrepancies between data and model are typically attrib-
uted to people’s limited capacity to process information....
There is, however, no support for the view that people would
choose in accord with normative prescriptions if they were
provided with increased capacity” (pp. 208–209). In stress-
ing the importance of the lack of evidence for the notion that
people would “choose in accord with normative prescrip-
tions if they were provided with increased capacity” (p. 209),
Lopes and Oden (1991) acknowledge the force of the indi-
vidual differences version of the understanding/acceptance
principle – because examining variation in cognitive ability
is just that: looking at what subjects who have “increased ca-
pacity” actually do with that increased capacity.
In fact, critics of the heuristics and biases literature have
repeatedly drawn on an individual differences version of
the understanding/acceptance principle to bolster their cri-
tiques. For example, Cohen (1982) critiques the older
“bookbag and poker chip” literature on Bayesian conser-
vatism (Phillips & Edwards 1966; Slovic et al. 1977) by not-
ing that “if so-called ‘conservatism’ resulted from some
inherent inadequacy in people’s information-processing
systems one might expect that, when individual differences
in information-processing are measured on independently
attested scales, some of them would correlate with degrees
of ‘conservatism.’ In fact, no such correlation was found by
Alker and Hermann (1971). And this is just what one would
expect if ‘conservatism’ is not a defect, but a rather deeply
rooted virtue of the system” (pp. 259–60). This is precisely
how Alker and Hermann (1971) themselves argued in their
paper:
Phillips et al. (1966) have proposed that conservatism is the re-
sult of intellectual deficiencies. If this is the case, variables such
as rationality, verbal intelligence, and integrative complexity
should have related to deviation from optimality – more ratio-
nal, intelligent, and complex individuals should have shown less
conservatism. (p. 40)
Wetherick (1971; 1995) has been a critic of the standard
interpretation of the four-card selection task (Wason 1966)
for over 25 years. As a Panglossian theorist, he has been at
pains to defend the modal response chosen by roughly 50%
of the subjects (the P and Q cards). As did Cohen (1982)
and Lopes and Oden (1991), Wetherick (1971) points to the
lack of associations with individual differences to bolster his
critique of the standard interpretation of the task: “in Wa-
son’s experimental situation subjects do not choose the not-
Q card nor do they stand and give three cheers for the
Queen, neither fact is interesting in the absence of a plau-
sible theory predicting that they should.... If it could be
shown that subjects who choose not-Q are more intelligent
or obtain better degrees than those who do not this would
make the problem worth investigation, but I have seen no
evidence that this is the case” (Wetherick 1971, p. 213).
Funder (1987), like Cohen (1982) and Wetherick (1971),
uses a finding about individual differences to argue that a
particular attribution bias is not necessarily produced by a
process operating suboptimally. Block and Funder (1986)
analyzed the role effect observed by Ross et al. (1977): that
people rated questioners more knowledgeable than con-
testants in a quiz game. Although the role effect is usually
viewed as an attributional error – people allegedly failed to
consider the individual’s role when estimating the knowl-
edge displayed – Block and Funder (1986) demonstrated
that subjects most susceptible to this attributional “error”
were more socially competent, more well adjusted, and
more intelligent. Funder (1987) argued that “manifestation
of this ‘error,’ far from being a symptom of social malad-
justment, actually seems associated with a degree of com-
petence” (p. 82) and that the so-called error is thus proba-
bly produced by a judgmental process that is generally
efficacious. In short, the argument is that the signs of the
correlations with the individual difference variables point
in the direction of the response that is produced by pro-
cesses that are ordinarily useful.
Thus, Funder (1987), Lopes and Oden (1991), Wether-
ick (1971), and Cohen (1982) all have recourse to patterns
of individual differences (or the lack of such patterns) to
pump our intuitions (Dennett 1980) in the direction of un-
dermining the standard interpretations of the tasks under
consideration. In other cases, however, examining individ-
ual differences may actually reinforce confidence in the ap-
propriateness of the normative models applied to problems
in the heuristics and biases literature.
4.4. The understanding/acceptance principle
and Spearman’s positive manifold
With these arguments in mind, it is thus interesting to note
that the direction of all of the correlations displayed in
Table 1 is consistent with the standard normative models
used by psychologists working in the heuristics and biases
tradition. The directionality of the systematic correlations
with intelligence is embarrassing for those reject-the-norm-
application theorists who argue that norms are being incor-
rectly applied if we interpret the correlations in terms of the
understanding/acceptance principle (a principle which, as
seen in sect. 4.3, is endorsed in various forms by a host of
Panglossian critics of the heuristics and biases literature).
Surely we would want to avoid the conclusion that individ-
uals with more computational power are systematically
computing the nonnormative response. Such an outcome
would be an absolute first in a psychometric field that is one
hundred years and thousands of studies old (Brody 1997;
Carroll 1993; 1997; Lubinski & Humphreys 1997; Neisser
et al. 1996; Sternberg & Kaufman 1998). It would mean
that Spearman’s (1904; 1927) positive manifold for cogni-
tive tasks – virtually unchallenged for one hundred
years – had finally broken down. Obviously, parsimony dic-
tates that positive manifold remains a fact of life for cogni-
tive tasks and that the response originally thought to be nor-
mative actually is.
In fact, it is probably helpful to articulate the under-
standing/acceptance principle somewhat more formally in
terms of positive manifold – the fact that different measures
of cognitive ability almost always correlate with each other
(see Carroll 1993; 1997). The individual differences version
of the understanding/acceptance principle puts positive
manifold to use in areas of cognitive psychology where the
nature of the appropriate normative model to apply is in
dispute. The point is that scoring a vocabulary item on a
cognitive ability test and scoring a probabilistic reasoning
response on a task from the heuristics and biases literature
are not the same. The correct response in the former task
has a canonical interpretation agreed upon by all investiga-
tors; whereas the normative appropriateness of responses
on tasks from the latter domain has been the subject of ex-
tremely contentious dispute (Cohen 1981; 1982; 1986; Cos-
mides & Tooby 1996; Einhorn & Hogarth 1981; Gigeren-
zer 1991a; 1993; 1996a; Kahneman & Tversky 1996;
Koehler 1996; Stein 1996). Positive manifold between the
two classes of task would only be expected if the normative
model being used for directional scoring of the tasks in the
latter domain is correct.⁵
Likewise, given that positive man-
ifold is the norm among cognitive tasks, the negative corre-
lation (or, to a lesser extent, the lack of a correlation) be-
tween a probabilistic reasoning task and more standard
cognitive ability measures might be taken as a signal that
the wrong normative model is being applied to the former
task or that there are alternative models that are equally ap-
propriate. The latter point is relevant because the pattern
of results in our studies has not always mirrored the posi-
tive manifold displayed in Table 1. We have previously
mentioned the false-consensus effect and overconfidence
effect as such examples, and further instances are discussed
in the next section.
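The directional-scoring point can be made concrete with a small sketch. All of the numbers below are invented purely for illustration (they are not data from the studies discussed): if the scoring of a task is reversed — that is, if the opposite response is treated as the normative one — its correlation with a cognitive ability measure simply changes sign. The sign of an observed correlation therefore carries information about which candidate norm is being applied.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical subjects: ability scores and a binary task scored so that
# 1 = the response the experimenter deems normative.
ability = [85, 90, 100, 110, 115, 120]
task = [0, 0, 1, 0, 1, 1]

# Re-score the same responses under the opposite candidate norm.
reversed_scoring = [1 - s for s in task]

r_standard = pearson_r(ability, task)
r_reversed = pearson_r(ability, reversed_scoring)

# Reversing which response counts as "correct" exactly flips the sign.
assert abs(r_standard + r_reversed) < 1e-9
print(round(r_standard, 3), round(r_reversed, 3))
```

This is why, under the understanding/acceptance principle, a systematically negative correlation between a heuristics-and-biases task and standard ability measures is informative: it is the pattern one would expect if the directional scoring of that task were inverted.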
4.5. Noncausal base rates
The statistical reasoning problems utilized in the experi-
ments discussed so far (those derived from Fong et al. 1986)
involved causal aggregate information, analogous to the
causal base rates discussed by Ajzen (1977) and Bar-Hillel
(1980; 1990) – that is, base rates that had a causal relation-
ship to the criterion behavior. Noncausal base-rate prob-
lems – those involving base rates with no obvious causal re-
lationship to the criterion behavior – have had a much more
controversial history in the research literature. They have
been the subject of over a decade’s worth of contentious dis-
pute (Bar-Hillel 1990; Birnbaum 1983; Cohen 1979; 1982;
1986; Cosmides & Tooby 1996; Gigerenzer 1991b; 1993;
1996a; Gigerenzer & Hoffrage 1995; Kahneman & Tversky
1996; Koehler 1996; Kyburg 1983; Levi 1983; Macchi
1995) – important components of which have been articu-
lated in this journal (e.g., Cohen 1981; 1983; Koehler 1996;
Krantz 1981; Kyburg 1983; Levi 1983).
In several experiments, we have examined some of the
noncausal base-rate problems that are notorious for pro-
voking philosophical dispute. One was an AIDS testing
problem modeled on Casscells et al. (1978):
Imagine that AIDS occurs in one in every 1,000 people. Imag-
ine also there is a test to diagnose the disease that always gives
a positive result when a person has AIDS. Finally, imagine that
the test has a false positive rate of 5 percent. This means that
the test wrongly indicates that AIDS is present in 5 percent
of the cases where the person does not have AIDS. Imagine
that we choose a person randomly, administer the test, and that
it yields a positive result (indicates that the person has AIDS).
What is the probability that the individual actually has AIDS,
assuming that we know nothing else about the individual’s per-
sonal or medical history?
The Bayesian posterior probability for this problem is
slightly less than .02. In several analyses and replications
(see Stanovich 1999; Stanovich & West 1998c) in which we
have classified responses of less than 10% as Bayesian, re-
sponses of over 90% as indicating strong reliance on indi-
cant information, and responses between 10% and 90% as
intermediate, we have found that subjects giving the indi-
cant response were higher in cognitive ability than those
giving the Bayesian response.⁶
Additionally, when tested on
causal base-rate problems (e.g., Fong et al. 1986), the great-
est base-rate usage was displayed by the group highly re-
liant on the indicant information in the AIDS problem. The
subjects giving the Bayesian answer on the AIDS problem
were least reliant on the aggregate information in the causal
statistical reasoning problems.
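The posterior cited for the AIDS problem follows directly from Bayes' rule, and the response classification is the one described above. A minimal check (the function and cutoff names are ours, not from the original studies):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# AIDS problem: base rate 1 in 1,000, a test that is always positive
# given the disease, and a 5 percent false-positive rate.
p = posterior(0.001, 1.0, 0.05)
print(round(p, 4))  # ~0.0196, i.e., slightly less than .02

def classify(response_pct):
    """Scoring scheme used in the analyses described in the text."""
    if response_pct < 10:
        return "Bayesian"
    if response_pct > 90:
        return "indicant"
    return "intermediate"
```

The modal indicant response of roughly 95 percent thus overshoots the Bayesian posterior by a factor of nearly fifty, because it ignores how rare the disease is relative to the false-positive rate.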
A similar violation of the expectation of positive manifold
was observed on the notorious cab problem (see Bar-Hillel
1980; Lyon & Slovic 1976; Tversky & Kahneman 1982) –
also the subject of almost two decades’ worth of dispute:
A cab was involved in a hit-and-run accident at night. Two cab
companies, the Green and the Blue, operate in the city in which
the accident occurred. You are given the following facts: 85 per-
cent of the cabs in the city are Green and 15 percent are Blue.
A witness identified the cab as Blue. The court tested the reli-
ability of the witness under the same circumstances that existed
on the night of the accident and concluded that the witness cor-
rectly identified each of the two colors 80 percent of the time.
What is the probability that the cab involved in the accident was
Blue?
Bayes’s rule yields .41 as the posterior probability of the cab
being blue. Thus, responses over 70% were classified as re-
liant on indicant information, responses between 30% and
70% as Bayesian, and responses less than 30% as reliant on
base-rate information. Again, it was found that subjects giv-
ing the indicant response were higher in cognitive ability
and need for cognition than those giving the Bayesian or
base-rate response (Stanovich & West 1998c; 1999). Fi-
nally, both the cabs problem and the AIDS problem were
subjected to the second of Slovic and Tversky’s (1974)
methods of operationalizing the understanding/acceptance
principle – presenting the subjects with arguments expli-
cating the traditional normative interpretation (Stanovich
& West 1999). On neither problem was there a strong ten-
dency for responses to move in the Bayesian direction sub-
sequent to explication.
The results from both problems indicate that the non-
causal base-rate problems display patterns of individual dif-
ferences quite unlike those shown on the causal aggregate
problems. On the latter, subjects giving the statistical re-
sponse (choosing the aggregate rather than the case or in-
dicant information) scored consistently higher on measures
of cognitive ability. This pattern did not hold for the AIDS
and cab problem where the significant differences were in
the opposite direction – subjects strongly reliant on the in-
dicant information scored higher on measures of cognitive
ability and were more likely to give the Bayesian response
on causal base-rate problems.
We examined the processing of noncausal base rates in
another task with very different task requirements (see
Stanovich 1999; Stanovich & West 1998d) – a selection task
in which individuals were not forced to compute a Bayesian
posterior, but instead simply had to indicate whether or not
they thought the base rate was relevant to their decision.
The task was taken from the work of Doherty and Mynatt
(1990). Subjects were given the following instructions:
Imagine you are a doctor. A patient comes to you with a red rash
on his fingers. What information would you want in order to di-
agnose whether the patient has the disease Digirosa? Below are
four pieces of information that may or may not be relevant to
the diagnosis. Please indicate all of the pieces of information
that are necessary to make the diagnosis, but only those pieces
of information that are necessary to do so.
Subjects then chose from the alternatives listed in the order:
percentage of people without Digirosa who have a red rash,
percentage of people with Digirosa, percentage of people
without Digirosa, and percentage of people with Digirosa
who have a red rash. These alternatives represented the
choices of P(D/~H), P(H), P(~H), and P(D/H), respectively.
The normatively correct choice of P(H), P(D/H), and
P(D/~H) was made by 13.4% of our sample. The most
popular choice (made by 35.5% of the sample) was the two
components of the likelihood ratio, P(D/H) and P(D/~H);
21.9% of the sample chose P(D/H) only; and 22.7% chose
the base rate, P(H), and the numerator of the likelihood ra-
tio, P(D/H) – ignoring the denominator of the likelihood
ratio, P(D/~H). Collapsed across these combinations, al-
most all subjects (96.0%) viewed P(D/H) as relevant and
very few (2.8%) viewed P(~H) as relevant. Overall, 54.3%
of the subjects deemed that P(D/~H) was necessary infor-
mation and 41.5% of the sample thought it was necessary
to know the base rate, P(H).
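The status of these choices can be motivated by writing out the posterior: P(H), P(D/H), and P(D/~H) suffice, and P(~H) is redundant because it is fixed as 1 − P(H). A sketch with hypothetical numbers (the task itself supplies none):

```python
# Posterior probability of Digirosa given the red rash (D), computed from
# the three pieces of information the normative analysis deems necessary.
# The numbers are hypothetical -- the task asks only which pieces are
# relevant, not for a computation.
p_h = 0.10              # P(H): base rate of Digirosa
p_d_given_h = 0.90      # P(D/H): rash rate among patients with Digirosa
p_d_given_not_h = 0.20  # P(D/~H): rash rate among patients without it

p_not_h = 1 - p_h       # P(~H) is fixed by P(H), hence not separately needed
posterior = (p_d_given_h * p_h) / (
    p_d_given_h * p_h + p_d_given_not_h * p_not_h)
print(round(posterior, 3))  # 0.333
```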
We examined the cognitive characteristics of the subjects
who thought the base rate was relevant and found that they
did not display higher Scholastic Assessment Test scores
than those who did not choose the base rate. The pattern of
individual differences was quite different for the denomi-
nator of the likelihood ratio, P(D/~H) – a component that
is normatively uncontroversial. Subjects seeing this infor-
mation as relevant had significantly higher Scholastic As-
sessment Test scores.
It is interesting that, in light of these patterns of individual
differences showing a lack of positive manifold when the
tasks are scored in terms of the standard Bayesian approach,
noncausal base-rate problems like the AIDS and cab problems
have been the focus of intense debate in the literature (Co-
hen 1979; 1981; 1982; 1986; Koehler 1996; Kyburg 1983;
Levi 1983). Several authors have argued that a rote appli-
cation of the Bayesian formula to these problems is unwar-
ranted because noncausal base rates of the AIDS-problem
type lack relevance and reference-class specificity. Finally,
our results might also suggest that the Bayesian subjects on
the AIDS problem might not actually be arriving at their re-
sponse through anything resembling Bayesian processing
(whether or not they were operating in a frequentist mode;
Gigerenzer & Hoffrage 1995), because on causal aggregate
statistical reasoning problems these subjects were less likely
to rely on the aggregate information.
5. Alternative task construals
Theorists who resist interpreting the gap between norma-
tive and descriptive models as indicating human irrational-
ity have one more strategy available in addition to those pre-
viously described. In the context of empirical cognitive
psychology, it is a commonplace argument, but it is one that
continues to create enormous controversy and to bedevil
efforts to compare human performance to normative stan-
dards. It is the argument that although the experimenter
may well be applying the correct normative model to the
problem as set, the subject might be construing the problem
differently and be providing the normatively appropriate an-
swer to a different problem – in short, that subjects have a
different interpretation of the task (see, e.g., Adler 1984;
1991; Broome 1990; Henle 1962; Hilton 1995; Levinson
1995; Margolis 1987; Schick 1987; 1997; Schwarz 1996).
Such an argument is somewhat different from any of the
critiques examined thus far. It is not the equivalent of posit-
ing that a performance error has been made, because per-
formance errors (attention lapses, etc.) – being transitory
and random – would not be expected to recur in exactly the
same way in a readministration of the same task. In contrast,
if the subject has truly misunderstood the task, the same
misconstrual would be expected to recur on an identical
readministration of the task.
Correspondingly, this criticism is different from the ar-
gument that the task exceeds the computational capacity of
the subject. The latter explanation locates the cause of the
suboptimal performance within the subject. In contrast, the
alternative task construal argument places the blame at
least somewhat on the shoulders of the experimenter for
failing to realize that there were task features that might
lead subjects to frame the problem in a manner different
from that intended.⁷
As with incorrect norm application, the alternative con-
strual argument locates the problem with the experimenter.
However, it is different in that in the wrong norm explana-
tion it is assumed that the subject is interpreting the task as
the experimenter intended – but the experimenter is not
using the right criteria to evaluate performance. In contrast,
the alternative task construal argument allows that the ex-
perimenter may be applying the correct normative model
to the problem the experimenter intends the subject to
solve – but posits that the subject has construed the prob-
lem in some other way and is providing a normatively ap-
propriate answer to a different problem.
It seems that in order to comprehensively evaluate the
rationality of human cognition it will be necessary to evalu-
ate the appropriateness of various task construals. This is
because – contrary to thin theories of means/ends rational-
ity that avoid evaluating the subject’s task construal (Elster
1983; Nathanson 1994) – it will be argued here that if we
are going to have any normative standards at all, then we
must also have standards for what are appropriate and in-
appropriate task construals. In the remainder of this sec-
tion, we will sketch the arguments of philosophers and de-
cision scientists who have made just this point. Then it will
be argued that:
1. In order to tackle the difficult problem of evaluating
task construals, criteria of wide reflective equilibrium come
into play;
2. It will be necessary to use all descriptive information
about human performance that could potentially affect ex-
pert wide reflective equilibrium;
3. Included in the relevant descriptive facts are individ-
ual differences in task construal and their patterns of co-
variance. This argument will again make use of the under-
standing/acceptance principle of Slovic and Tversky (1974)
discussed in section 4.2.
5.1. The necessity of principles of rational construal
It is now widely recognized that the evaluation of the nor-
mative appropriateness of a response to a particular task is
always relative to a particular interpretation of the task. For
example, Schick (1987) argues that “how rationality directs
us to choose depends on which understandings are ours . . .
[and that] the understandings people have bear on the
question of what would be rational for them” (pp. 53 and
58). Likewise, Tversky (1975) argued that “the question of
whether utility theory is compatible with the data or not,
therefore, depends critically on the interpretation of the
consequences” (p. 171).
However, others have pointed to the danger inherent in
too permissively explaining away nonnormative responses
by positing different construals of the problem. Normative
theories will be drained of all of their evaluative force if we
adopt an attitude that is too charitable toward alternative
construals. Broome (1990) illustrates the problem by dis-
cussing the preference reversal phenomenon (Lichtenstein
& Slovic 1971; Slovic 1995). In a choice between two gam-
bles, A and B, a person chooses A over B. However, when
pricing the gambles, the person puts a higher price on B.
This violation of procedural invariance leads to what ap-
pears to be intransitivity. Presumably there is an amount of
money, M, that would be preferred to A but given a choice
of M and B the person would choose B. Thus, we appear to
have B > M, M > A, A > B. Broome (1990) points out that
when choosing A over B the subject is choosing A and is si-
multaneously rejecting B. Evaluating A in the M versus A
comparison is not the same. Here, when choosing A, the
subject is not rejecting B. The A alternative here might be
considered to be a different prospect (call it A′), and if it is
so considered there is no intransitivity (B > M, M > A′, A >
B). Broome (1990) argues that whenever the basic axioms
such as transitivity, independence, or descriptive or
procedural invariance are breached, the same inoculating
strategy could be invoked – that of individuating outcomes
so finely that the violation disappears.
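The normative bite of such an intransitive cycle is that it is exploitable: the standard money pump argument can be made concrete in a few lines. Everything below (the amounts, the fee, and the trading protocol) is our own illustration:

```python
# An agent with the intransitive preferences B > M, M > A, and A > B can
# be money-pumped: at each step it pays a small fee to swap its current
# holding for a prospect it strictly prefers, and after one lap around
# the cycle it holds what it started with, minus three fees.
prefers = {"B": "A", "A": "M", "M": "B"}  # holding -> strictly preferred swap
fee = 1.0

holding, wealth = "B", 10.0
for _ in range(3):              # one full lap around the preference cycle
    holding = prefers[holding]  # trade up to the preferred prospect...
    wealth -= fee               # ...paying a fee for each "improvement"

print(holding, wealth)  # B 7.0 -- back where it started, three fees poorer
```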
Broome’s (1990) point is that the thinner the categories
we use to individuate outcomes, the harder it will be to at-
tribute irrationality to a set of preferences if we evaluate ra-
tionality only in instrumental terms. He argues that we
need, in addition to the formal principles of rationality,
principles that deal with content so as to enable us to eval-
uate the reasonableness of a particular individuation of out-
comes. Broome (1990) acknowledges that “this procedure
puts principles of rationality to work at a very early stage of
decision theory. They are needed in fixing the set of alter-
native prospects that preferences can then be defined
upon. The principles in question might be called ‘rational
principles of indifference’” (p. 140). Broome (1990) admits
that
many people think there can be no principles of rationality
apart from the formal ones. This goes along with the common
view that rationality can only be instrumental . . . [however] if
you acknowledge only formal principles of rationality, and deny
that there are any principles of indifference, you will find
yourself without any principles of rationality at all. (pp. 140–41)
Broome cites Tversky (1975) as concurring in this view:
I believe that an adequate analysis of rational choice cannot ac-
cept the evaluation of the consequences as given, and examine
only the consistency of preferences. There is probably as much
irrationality in our feelings, as expressed in the way we evalu-
ate consequences, as there is in our choice of actions. An ade-
quate normative analysis must deal with problems such as the
legitimacy of regret in Allais’ problem.... I do not see how the
normative appeal of the axioms could be discussed without a
reference to a specific interpretation. (Tversky 1975, p. 172)
Others agree with the Broome/Tversky analysis (see Baron
1993; 1994; Frisch 1994; Schick 1997). Although there is
some support for Broome’s generic argument, the con-
tentious disputes about rational principles of indifference
and rational construals of the tasks in the heuristics and bi-
ases literature (Adler 1984; 1991; Berkeley & Humphreys
1982; Cohen 1981; 1986; Gigerenzer 1993; 1996a; Hilton
1995; Jepson et al. 1983; Kahneman & Tversky 1983; 1996;
Lopes 1991; Nisbett 1981; Schwarz 1996) highlight the dif-
ficulties to be faced when attempting to evaluate specific
problem construals. For example, Margolis (1987) agrees
with Henle (1962) that the subjects’ nonnormative re-
sponses will almost always be logical responses to some
other problem representation. But unlike Henle (1962),
Margolis (1987) argues that many of these alternative task
construals are so bizarre – so far from what the very words
in the instructions said – that they represent serious cogni-
tive errors that deserve attention:
But in contrast to Henle and Cohen, the detailed conclusions I
draw strengthen rather than invalidate the basic claim of the ex-
perimenters. For although subjects can be – in fact, I try to
show, ordinarily are – giving reasonable responses to a different
question, the different question can be wildly irrelevant to any-
thing that plausibly could be construed as the meaning of the
question asked. The locus of the illusion is shifted, but the force
of the illusion is confirmed not invalidated or explained away.
(p. 141)
5.2. Evaluating principles of rational construal:The
understanding/acceptance assumption revisited
Given current arguments that principles of rational con-
strual are necessary for a full normative theory of human ra-
tionality (Broome 1990; Einhorn & Hogarth 1981; Junger-
mann 1986; Schick 1987; 1997; Shweder 1987; Tversky
1975), how are such principles to be derived? When search-
ing for principles of rational task construal the same mech-
anisms of justification used to assess principles of instru-
mental rationality will be available. Perhaps in some cases –
instances where the problem structure maps the world in an
unusually close and canonical way – problem construals
could be directly evaluated by how well they serve the deci-
sion makers in achieving their goals (Baron 1993; 1994). In
such cases, it might be possible to prove the superiority or
inferiority of certain construals by appeals to Dutch Book or
money pump arguments (de Finetti 1970/1990; Maher
1993; Osherson 1995; Resnik 1987; Skyrms 1986).
Also available will be the expert wide reflective equi-
librium view discussed by Stich and Nisbett (1980) (see
Stanovich 1999; Stein 1996). In contrast, Baron (1993;
1994) and Thagard (1982) argue that rather than any sort of
reflective equilibrium, what is needed here are “arguments
that an inferential system is optimal with respect to the cri-
teria discussed” (Thagard 1982, p. 40). But in the area of
task construal, finding optimization of criteria may be un-
likely – there will be few money pumps or Dutch Books to
point the way. If in the area of task construal there will be
few money pumps or Dutch Books to prove that a particu-
lar task interpretation has disastrous consequences, then
the field will be again thrust back upon the debate that Tha-
gard (1982) calls “the argument between the populists and
the elitists.” But as argued before, this is really a misnomer.
There are few controversial tasks in the heuristics and bi-
ases literature where all untutored laypersons interpret
tasks differently from the experts who designed them. The
issue is not the untutored average person versus experts,
but experts plus some laypersons versus other untutored in-
dividuals. The cognitive characteristics of those departing
from the expert construal might – for reasons parallel to
those argued in section 4 – have implications for how we
evaluate particular task interpretations. It is argued here
that Slovic and Tversky’s (1974) assumption (“the deeper
the understanding of the axiom, the greater the readiness
to accept it,” pp. 372–73) can again be used as a tool to
condition the expert reflective equilibrium regarding
principles of rational task construal.
Framing effects are ideal vehicles for demonstrating how
the understanding/acceptance principle might be utilized.
First, it has already been shown that there are consistent in-
dividual differences across a variety of framing problems
(Frisch 1993). Second, framing problems have engendered
much dispute regarding issues of appropriate task con-
strual. The Disease Problem of Tversky and Kahneman
(1981) has been the subject of much contention:
Problem 1. Imagine that the United States is preparing
for the outbreak of an unusual disease, which is expected to
kill 600 people. Two alternative programs to combat the dis-
ease have been proposed. Assume that the exact scientific
estimates of the consequences of the programs are as fol-
lows: If Program A is adopted, 200 people will be saved. If
Program B is adopted, there is a one-third probability that
600 people will be saved and a two-thirds probability that
no people will be saved. Which of the two programs would
you favor, Program A or Program B?
Problem 2. Imagine that the United States is preparing
for the outbreak of an unusual disease, which is expected to
kill 600 people. Two alternative programs to combat the dis-
ease have been proposed. Assume that the exact scientific
estimates of the consequences of the programs are as fol-
lows: If Program C is adopted, 400 people will die. If Pro-
gram D is adopted, there is a one-third probability that no-
body will die and a two-thirds probability that 600 people
will die. Which of the two programs would you favor, Pro-
gram C or Program D?
Many subjects select alternatives A and D in these two
problems despite the fact that the two problems are re-
descriptions of each other and that Program A maps to
Program C rather than D. This response pattern violates
the assumption of descriptive invariance of utility theory.
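That the two frames are redescriptions of each other is a matter of simple arithmetic; a minimal sketch:

```python
# The two frames of the Disease Problem are redescriptions of one outcome
# space: with 600 lives at stake, "200 saved" and "400 die" coincide, and
# Programs B and D are the same gamble stated in saved vs. died terms.
total = 600

saved_a = 200            # Program A: "200 people will be saved"
saved_c = total - 400    # Program C: "400 people will die"
assert saved_a == saved_c == 200

exp_saved_b = total / 3                 # Program B: 1/3 chance all 600 saved
exp_saved_d = total - (2 * total) / 3   # Program D: 2/3 chance all 600 die
assert exp_saved_b == exp_saved_d == 200
print(saved_a, exp_saved_b)  # 200 200.0
```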
However, Berkeley and Humphreys (1982) argue that Pro-
grams A and C might not be descriptively invariant in sub-
jects’ interpretations. They argue that the wording of the
outcome of Program A (“will be saved”) combined with the
fact that its outcome is seemingly not described in the ex-
haustive way as the consequences for Program B suggests
the possibility of human agency in the future which might
enable the saving of more lives (see Kuhberger 1995). The
wording of the outcome of Program C (“will die”) does not
suggest the possibility of future human agency working to
possibly save more lives (indeed, the possibility of losing a
few more might be inferred by some people). Under such
a construal of the problem, it is no longer nonnormative to
choose Programs A and D. Likewise, Macdonald (1986) ar-
gues that, regarding the “200 people will be saved” phrasing,
“it is unnatural to predict an exact number of cases” (p. 24)
and that “ordinary language reads ‘or more’ into the inter-
pretation of the statement” (p. 24; see also Jou et al. 1996).
However, consistent with the finding that being forced to
provide a rationale or take more time reduces framing ef-
fects (e.g., Larrick et al. 1992; Sieck & Yates 1997; Take-
mura 1994) and that people higher in need for cognition
displayed reduced framing effects (Smith & Levin 1996), in
our within-subjects study of framing effects on the Disease
Problem (Stanovich & West 1998b), we found that subjects
giving a consistent response to both descriptions of the
problem – who were actually the majority in our within-
subjects experiment – were significantly higher in cognitive
ability than those subjects displaying a framing effect. Thus,
the results of studies investigating the effects of giving a ra-
tionale, taking more time, associations with cognitive en-
gagement, and associations with cognitive ability are all
consistent in suggesting that the response dictated by the
construal of the problem originally favored by Tversky and
Kahneman (1981) should be considered the correct re-
sponse because it is endorsed even by untutored subjects as
long as they are cognitively engaged with the problem, had
enough time to process the information, and had the cognitive
ability to fully process the information.⁸
Perhaps no finding in the heuristics and biases literature
has been the subject of as much criticism as Tversky and
Kahneman’s (1983) claim to have demonstrated a conjunc-
tion fallacy in probabilistic reasoning. Most of the criticisms
have focused on the issue of differential task construal, and
several critics have argued that there are alternative con-
struals of the tasks that are, if anything, more rational than
that which Tversky and Kahneman (1983) regard as nor-
mative for examples such as the well-known Linda Problem:
Linda is 31 years old, single, outspoken, and very bright.
She majored in philosophy. As a student, she was deeply
concerned with issues of discrimination and social justice,
and also participated in antinuclear demonstrations. Please
rank the following statements by their probability, using 1
for the most probable and 8 for the least probable.
a. Linda is a teacher in an elementary school
b. Linda works in a bookstore and takes Yoga classes
c. Linda is active in the feminist movement
d. Linda is a psychiatric social worker
e. Linda is a member of the League of Women Voters
f. Linda is a bank teller
g. Linda is an insurance salesperson
h. Linda is a bank teller and is active in the feminist
movement
Because alternative h is the conjunction of alternatives c
and f, the probability of h cannot be higher than that of ei-
ther c or f, yet 85% of the subjects in Tversky and Kahne-
man’s (1983) study rated alternative h as more probable
than f. What concerns us here is the argument that there
are subtle linguistic and pragmatic features of the problem
which lead subjects to evaluate alternatives different from
those listed. For example, Hilton (1995) argues that under
the assumption that the detailed information given about
the target means that the experimenter knows a consider-
able amount about Linda, then it is reasonable to think that
the phrase “Linda is a bank teller” does not contain the
phrase “and is not active in the feminist movement” be-
cause the experimenter already knows this to be the case.
If “Linda is a bank teller” is interpreted in this way, then rat-
ing h as more probable than f no longer represents a con-
junction fallacy.
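The conjunction rule driving the original normative analysis can be put in frequency form: in any population whatsoever, the feminist bank tellers are a subset of the bank tellers. A minimal sketch (the population and its parameters are invented for illustration):

```python
# The conjunction rule in frequency form: people who are bank tellers AND
# active feminists form a subset of the bank tellers, so the conjunction
# can never be the more probable description, whatever the base rates.
import random

random.seed(0)
population = [
    {"bank_teller": random.random() < 0.05,
     "feminist": random.random() < 0.30}
    for _ in range(100_000)
]

n_teller = sum(p["bank_teller"] for p in population)
n_both = sum(p["bank_teller"] and p["feminist"] for p in population)

assert n_both <= n_teller  # holds for any population whatsoever
print(n_both, "<=", n_teller)
```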
Similarly, Morier and Borgida (1984) point out that the
presence of the unusual conjunction “Linda is a bank teller
and is active in the feminist movement” itself might prompt
an interpretation of “Linda is a bank teller” as “Linda is a
bank teller and is not active in the feminist movement.” Ac-
tually, Tversky and Kahneman (1983) themselves had con-
cerns about such an interpretation of the “Linda is a bank
teller” alternative and ran a condition in which this alterna-
tive was rephrased as “Linda is a bank teller whether or not
she is active in the feminist movement.” They found that
the conjunction fallacy was reduced from 85% of their sam-
ple to 57% when this alternative was used. Several other in-
vestigators have suggested that pragmatic inferences lead
to seeming violations of the logic of probability theory in the
Linda Problem⁹
(see Adler 1991; Dulany & Hilton 1991;
Levinson 1995; Macdonald & Gilhooly 1990; Politzer &
Noveck 1991; Slugoski & Wilson 1998). These criticisms all
share the implication that actually committing the conjunc-
tion fallacy is a rational response to an alternative construal
of the different statements about Linda.
Assuming that those committing the so-called conjunc-
tion fallacy are making the pragmatic interpretation and
that those avoiding the fallacy are making the interpretation
that the investigators intended, we examined whether the
subjects making the pragmatic interpretation were subjects
who were disproportionately the subjects of higher cogni-
tive ability. Because this group is in fact the majority in most
studies – and because the use of such pragmatic cues and
background knowledge is often interpreted as reflecting
adaptive information processing (e.g., Hilton 1995) – it
might be expected that these individuals would be the sub-
jects of higher cognitive ability.
In our study (Stanovich & West 1998b), we examined the
performance of 150 subjects on the Linda Problem pre-
sented above. Consistent with the results of previous ex-
periments on this problem (Tversky & Kahneman 1983),
80.7% of our sample displayed the conjunction effect – they
rated the feminist bank teller alternative as more probable
than the bank teller alternative. The mean SAT score of the
121 subjects who committed the conjunction fallacy was 82
points lower than the mean score of the 29 who avoided the
fallacy. This difference was highly significant and it trans-
lated into an effect size of .746, which Rosenthal and Ros-
now (1991, p. 446) classify as “large.”
Tversky and Kahneman (1983) and Reeves and Lockhart
(1993) have demonstrated that the incidence of the con-
junction fallacy can be decreased if the problem describes
the event categories in some finite population or if the
problem is presented in a frequentist manner (see also
Fiedler 1988; Gigerenzer 1991b; 1993). We have replicated
this well-known finding, but we have also found that fre-
quentist representations of these problems markedly re-
duce – if not eliminate – cognitive ability differences (Stan-
ovich & West 1998b).
Another problem that has spawned many arguments
about alternative construals is Wason’s (1966) selection
task. Performance on abstract versions of the selection task
is extremely low (see Evans et al. 1993). Typically, less than
10% of subjects make the correct selections of the A card
(P) and 7 card (not-Q). The most common incorrect choices
made by subjects are the A card and the 3 card (P and Q)
or the selection of the A card only (P). The preponderance
of P and Q responses has most often been attributed to a
so-called matching bias that is automatically triggered by
surface-level relevance cues (Evans 1996; Evans & Lynch
1973), but some investigators have championed an expla-
nation based on an alternative task construal. For example,
Oaksford and Chater (1994; 1996; see also Nickerson 1996)
argue that rather than interpreting the task as one of de-
ductive reasoning (as the experimenter intends), many sub-
jects interpret it as an inductive problem of probabilistic hy-
pothesis testing. They show that the P and Q response is
expected under a formal Bayesian analysis which assumes
such an interpretation in addition to optimal data selection.
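Under the deductive construal that the experimenter intends, the correct P and not-Q choices fall out of asking which cards could falsify the conditional. A minimal sketch, assuming a rule of the form "if a card has an A on one side, it has a 3 on the other" (exact wordings vary across studies):

```python
# Which visible cards can falsify "if a card has an A on one side, it has
# a 3 on the other"? A card must be turned iff some possible hidden side
# would violate the rule. Letters and numbers sit on opposite faces.
visible = ["A", "K", "3", "7"]
letters, numbers = {"A", "K"}, {"3", "7"}

def violates(letter, number):
    # The rule is broken only by an A paired with a non-3.
    return letter == "A" and number != "3"

def must_turn(face):
    hidden = numbers if face in letters else letters
    return any(violates(face, h) if face in letters else violates(h, face)
               for h in hidden)

print([card for card in visible if must_turn(card)])  # ['A', '7']
```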
We have examined individual differences in responding
on a variety of abstract and deontic selection task problems
(Stanovich & West 1998a; 1998c). Typical results are dis-
played in Table 2. The table presents the mean Scholastic
Assessment Test scores of subjects responding correctly (as
traditionally interpreted – with the responses P and not-Q)
on various versions of selection task problems. One was a
commonly used nondeontic problem with content, the so-
called Destination Problem (e.g., Manktelow & Evans
1979). Replicating previous research, few subjects re-
sponded correctly on this problem. However, those that did
had significantly higher Scholastic Assessment Test scores
than those that did not and the difference was quite large
in magnitude (effect size of .815). Also presented in the
table are two well-known problems (Dominowski 1995;
Griggs 1983; Griggs & Cox 1982; 1983; Newstead & Evans
1995) with deontic rules (reasoning about rules used to
guide human behavior – about what “ought to” or “must”
be done, see Manktelow & Over 1991) – the Drinking-Age
Problem (If a person is drinking beer then the person must
be over 21 years of age.) and the Sears Problem (Any sale
over $30 must be approved by the section manager, Mr.
Jones.). Both are known to facilitate performance and this
effect is clearly replicated in the data presented in Table 2.
However, it is also clear that the differences in cognitive
ability are much less in these two problems. The effect size
is reduced from .815 to .347 in the case of the Drinking-Age
Problem and it fails to even reach statistical significance in
the case of the Sears Problem (effect size of .088). The bot-
tom half of the table indicates that exactly the same pattern
was apparent when the P and not-Q responders were com-
pared only with the P and Q responders on the Destination
Problem – the latter being the response that is most con-
sistent with an inductive construal of the problem (see
Nickerson 1996; Oaksford & Chater 1994; 1996).
Thus, on the selection task, it appears that cognitive abil-
ity differences are strong in cases where there is a dispute
about the proper construal of the task (in nondeontic tasks).
In cases where there is little controversy about alternative
construals – the deontic rules of the Drinking-Age and
Sears problems – cognitive ability differences are markedly
attenuated. This pattern – cognitive ability differences large
on problems where there is contentious dispute regarding
the appropriate construal and cognitive ability differences
small when there is no dispute about task construal – is mir-
rored in our results on the conjunction effect and framing
effect (Stanovich & West 1998b).
6. Dual process theories and alternative
task construals
The sampling of results just presented (for other examples,
see Stanovich 1999) has demonstrated that the responses
associated with alternative construals of a well-known fram-
ing problem (the Disease Problem), of the Linda Problem,
and of the nondeontic selection task were consistently as-
sociated with lower cognitive ability. How might we inter-
pret this consistent pattern displayed on three tasks from
the heuristics and biases literature where alternative task
construals have been championed?
One possible interpretation of this pattern is in terms of
two-process theories of reasoning (Epstein 1994; Evans
1984; 1996; Evans & Over 1996; Sloman 1996). A summary
of the generic properties distinguished by several two-
process views is presented in Table 3. Although the details
and technical properties of these dual-process theories do
not always match exactly, nevertheless there are clear fam-
ily resemblances (for discussions, see Evans & Over 1996;
Gigerenzer & Regier 1996; Sloman 1996). In order to em-
phasize the prototypical view that is adopted here, the two
systems have simply been generically labeled System 1 and
System 2.
The key differences in the properties of the two systems
are listed next. System 1 is characterized as automatic,
largely unconscious, and relatively undemanding of com-
putational capacity. Thus, it conjoins properties of auto-
maticity and heuristic processing as these constructs have
been variously discussed in the literature. These properties
characterize what Levinson (1995) has termed interactional
intelligence – a system composed of the mechanisms that
support a Gricean theory of communication that relies on
intention-attribution. This system has as its goal the ability
to model other minds in order to read intention and to make
rapid interactional moves based on those modeled inten-
tions. System 2 conjoins the various characteristics that
have been viewed as typifying controlled processing. Sys-
tem 2 encompasses the processes of analytic intelligence
that have traditionally been studied by information pro-
cessing theorists trying to uncover the computational com-
ponents underlying intelligence.
Stanovich & West: Individual differences in reasoning
658 BEHAVIORAL AND BRAIN SCIENCES (2000) 23:5

Table 2. Mean Scholastic Assessment Test (SAT) total scores of subjects who gave the correct and incorrect responses to three different selection task problems (numbers in parentheses are the number of subjects)

                             Incorrect       Correct (P & not-Q)    t value    Effect size^a
Nondeontic problem:
  Destination Problem        1,187 (197)     1,270 (17)             3.21***    .815
Deontic problems:
  Drinking-Age Problem       1,170 (72)      1,206 (143)            2.39**     .347
  Sears Problem              1,189 (87)      1,198 (127)            0.63       .088

                             P & Q           P & not-Q              t value    Effect size^a
Nondeontic problem:
  Destination Problem        1,195 (97)      1,270 (17)             3.06***    .812

Note: df = 212 for the Destination and Sears Problems and 213 for the Drinking-Age Problem; df = 112 for the P & Q comparison on the Destination Problem.
* = p < .05, ** = p < .025, *** = p < .01, all two-tailed.
^a Cohen’s d.

For the purposes of the present discussion, the most important difference between the two systems is that they tend to lead to different types of task construals. Construals triggered by System 1 are highly contextualized, personalized, and socialized. They are driven by considerations
of relevance and are aimed at inferring intentionality by the
use of conversational implicature even in situations that are
devoid of conversational features (see Margolis 1987). The
primacy of these mechanisms leads to what has been
termed the fundamental computational bias in human cog-
nition (Stanovich 1999) – the tendency toward automatic
contextualization of problems. In contrast, System 2’s more
controlled processes serve to decontextualize and deper-
sonalize problems. This system is more adept at represent-
ing in terms of rules and underlying principles. It can deal
with problems without social content and is not dominated
by the goal of attributing intentionality nor by the search for
conversational relevance.
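As a side note on Table 2, the reported effect sizes (Cohen’s d) are recoverable from the t statistics and group sizes via the conventional pooled-variance relation d = t·√(1/n1 + 1/n2). The sketch below is purely illustrative and assumes that conventional formula was used; the tiny discrepancies (e.g., 0.811 vs. the published .815) reflect rounding of the printed t values.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Recover Cohen's d from an independent-samples t statistic
    computed with a pooled standard deviation."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Destination Problem (Table 2): incorrect n = 197, correct n = 17, t = 3.21
print(round(cohens_d_from_t(3.21, 197, 17), 3))  # 0.811 (table: .815)
# Drinking-Age Problem: n = 72 vs. 143, t = 2.39
print(round(cohens_d_from_t(2.39, 72, 143), 3))  # 0.345 (table: .347)
```

Note how the very unequal group sizes on the Destination Problem (197 vs. 17) make the effect size much larger than t alone would suggest.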
Using the distinction between System 1 and System 2
processing, it is conjectured here that in order to observe
large cognitive ability differences in a problem situation,
the two systems must strongly cue different responses.^10 It is not enough simply that both systems are engaged. If both
cue the same response (as in deontic selection task prob-
lems), then this could have the effect of severely diluting
any differences in cognitive ability. One reason that this out-
come is predicted is that it is assumed that individual dif-
ferences in System 1 processes (interactional intelligence)
bear little relation to individual differences in System 2 pro-
cesses (analytic intelligence). This is a conjecture for which
there is a modest amount of evidence. Reber (1993) has
shown preconscious processes to have low variability and to
show little relation to analytic intelligence (see Jones & Day
1997; McGeorge et al. 1997; Reber et al. 1991).
In contrast, if the two systems cue opposite responses,
rule-based System 2 will tend to differentially cue those of
high analytic intelligence and this tendency will not be di-
luted by System 1 (the associative system) nondifferentially
drawing subjects to the same response. For example, the
Linda Problem maximizes the tendency for the two systems
to prime different responses and this problem produced a
large difference in cognitive ability. Similarly, in nondeon-
tic selection tasks there is ample opportunity for the two
systems to cue different responses. A deductive interpreta-
tion conjoined with an exhaustive search for falsifying in-
stances yields the response P and not-Q. This interpretation
and processing style is likely associated with the rule-based
System 2 – individual differences which underlie the psy-
chometric concept of analytic intelligence. In contrast,
within the heuristic-analytic framework of Evans (1984;
1989; 1996), the matching response of P and Q reflects the
heuristic processing of System 1 (in Evans’ theory, a lin-
guistically cued relevance response).
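The falsification logic just described can be made concrete. In this hypothetical sketch (an illustration, not from the target article), a card is worth turning only if its hidden side could yield the one combination that falsifies "if P then Q," namely P together with not-Q:

```python
LETTERS = ("P", "not-P")
NUMBERS = ("Q", "not-Q")

def must_turn(visible):
    """A card is worth turning iff some hidden side could produce
    P & not-Q, the only combination falsifying 'if P then Q'."""
    if visible in LETTERS:
        return any(visible == "P" and n == "not-Q" for n in NUMBERS)
    return any(l == "P" and visible == "not-Q" for l in LETTERS)

print([card for card in LETTERS + NUMBERS if must_turn(card)])
# → ['P', 'not-Q']
```

The exhaustive, rule-based search returns exactly the P and not-Q cards, whereas the matching response (P and Q) corresponds to selecting the cards explicitly named in the rule.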
In deontic problems, both deontic and rule-based logics
are cuing construals of the problem that dictate the same
response (P and not-Q). Whatever is one’s theory of re-
sponding in deontic tasks – preconscious relevance judg-
ments, pragmatic schemas, or Darwinian algorithms (e.g.,
Cheng & Holyoak 1989; Cosmides 1989; Cummins 1996;
Evans 1996) – the mechanisms triggering the correct response resemble heuristic or modular structures that fall within the domain of System 1. These structures are unlikely to be strongly associated with analytic intelligence (Cummins 1996; Levinson 1995; McGeorge et al. 1997; Reber 1993; Reber et al. 1991), and hence they operate to draw subjects of both high and low analytic intelligence to the same response dictated by the rule-based system – thus serving to dilute cognitive ability differences between correct and incorrect responders (see Stanovich & West 1998a for a data simulation).

Table 3. The terms for the two systems used by a variety of theorists and the properties of dual-process theories of reasoning

                                System 1                         System 2
Dual-Process Theories:
  Sloman (1996)                 associative system               rule-based system
  Evans (1984; 1989)            heuristic processing             analytic processing
  Evans & Over (1996)           tacit thought processes          explicit thought processes
  Reber (1993)                  implicit cognition               explicit learning
  Levinson (1995)               interactional intelligence       analytic intelligence
  Epstein (1994)                experiential system              rational system
  Pollock (1991)                quick and inflexible modules     intellection
  Hammond (1996)                intuitive cognition              analytical cognition
  Klein (1998)                  recognition-primed decisions     rational choice strategy
  Johnson-Laird (1983)          implicit inferences              explicit inferences
  Shiffrin & Schneider (1977)   automatic processing             controlled processing
  Posner & Snyder (1975)        automatic activation             conscious processing system
Properties:                     associative                      rule-based
                                holistic                         analytic
                                automatic                        controlled
                                relatively undemanding of        demanding of cognitive
                                  cognitive capacity               capacity
                                relatively fast                  relatively slow
                                acquisition by biology,          acquisition by cultural
                                  exposure, and personal           and formal tuition
                                  experience
Task Construal:                 highly contextualized            decontextualized
                                personalized                     depersonalized
                                conversational and socialized    asocial
Type of Intelligence Indexed:   interactional                    analytic (psychometric IQ)
                                  (conversational implicature)
6.1. Alternative construals: Evolutionary optimization
versus normative rationality
The sampling of experimental results reviewed here (see
Stanovich 1999 for further examples) has demonstrated
that the response dictated by the construal of the inventors
of the Linda Problem (Tversky & Kahneman 1983), Dis-
ease Problem (Tversky & Kahneman 1981), and selection
task (Wason 1966) is the response favored by subjects of
high analytic intelligence. The alternative responses dic-
tated by the construals favored by the critics of the heuris-
tics and biases literature were the choices of the subjects of
lower analytic intelligence. In this section we will explore
the possibility that these alternative construals may have
been triggered by heuristics that make evolutionary sense,
but that subjects higher in a more flexible type of analytic
intelligence (and those more cognitively engaged, see
Smith & Levin 1996) are more prone to follow normative
rules that maximize personal utility. In a very restricted
sense, such a pattern might be said to have relevance for the
concept of rational task construal.
The argument depends on the distinction between evo-
lutionary adaptation and instrumental rationality (utility
maximization given goals and beliefs). The key point is that
for the latter (variously termed practical, pragmatic, or
means/ends rationality), maximization is at the level of the
individual person. Adaptive optimization in the former case
is at the level of the genes. In Dawkins’s (1976; 1982) terms,
evolutionary adaptation concerns optimization processes
relevant to the so-called replicators (the genes), whereas in-
strumental rationality concerns utility maximization for the
so-called vehicle (or interactor, to use Hull’s 1982 term),
which houses the genes. Anderson (1990; 1991) emphasizes
this distinction in his treatment of adaptationist models in
psychology. In his advocacy of such models, Anderson
(1990; 1991) eschews Dennett’s (1987) assumption of per-
fect rationality in the instrumental sense (hereafter termed
normative rationality) for the somewhat different assump-
tion of evolutionary optimization (i.e., evolution as a local
fitness maximizer). Anderson (1990) accepts Stich’s (see
also Cooper 1989; Skyrms 1996) argument that evolutionary adaptation (hereafter termed evolutionary rationality)^11 does not guarantee perfect human rationality in the normative sense:
Rationality in the adaptive sense, which is used here, is not ra-
tionality in the normative sense that is used in studies of deci-
sion making and social judgment. . . . It is possible that humans
are rational in the adaptive sense in the domains of cognition
studied here but not in decision making and social judgment.
(1990, p. 31)
Thus, Anderson (1991) acknowledges that there may be ar-
guments for “optimizing money, the happiness of oneself
and others, or any other goal. It is just that these goals do
not produce optimization of the species” (pp. 510–11). As
a result, a descriptive model of processing that is adaptively
optimal could well deviate substantially from a normative
model. This is because Anderson’s (1990; 1991) adaptation
assumption is that cognition is optimally adapted in an evo-
lutionary sense – and this is not the same as positing that
human cognitive activity will result in normatively appro-
priate responses.
Such a view can encompass both the impressive record
of descriptive accuracy enjoyed by a variety of adaptation-
ist models (Anderson 1990; 1991; Oaksford & Chater 1994;
1996; 1998) as well as the fact that cognitive ability some-
times dissociates from the response deemed optimal on an
adaptationist analysis (Stanovich & West 1998a). As dis-
cussed above, Oaksford and Chater (1994) have had con-
siderable success in modeling the nondeontic selection task
as an inductive problem in which optimal data selection is
assumed (see also Oaksford et al. 1997). Their model pre-
dicts the modal response of P and Q and the corresponding
dearth of P and not-Q choosers. Similarly, Anderson (1990, pp. 157–60) models the 2 × 2 contingency assessment experiment using a model of optimally adapted information
processing and shows how it can predict the much-repli-
cated finding that the D cell (cause absent and effect ab-
sent) is vastly underweighted (see also Friedrich 1993;
Klayman & Ha 1987). Finally, a host of investigators (Adler
1984; 1991; Dulany & Hilton 1991; Hilton 1995; Levinson
1995) have stressed how a model of rational conversational
implicature predicts that violating the conjunction rule in
the Linda Problem reflects the adaptive properties of in-
teractional intelligence.
Yet in all three of these cases – despite the fact that the
adaptationist models predict the modal response quite
well – individual differences analyses demonstrate associa-
tions that also must be accounted for. Correct responders
on the nondeontic selection task (P and not-Q choosers –
not those choosing P and Q) are higher in cognitive ability.
In the 2 × 2 covariation detection experiment, it is those
subjects weighting cell D more equally (not those under-
weighting the cell in the way that the adaptationist model
dictates) who are higher in cognitive ability and who tend
to respond normatively on other tasks (Stanovich & West
1998d). Finally, despite conversational implicatures indi-
cating the opposite, individuals of higher cognitive ability
disproportionately tend to adhere to the conjunction rule.
These patterns make sense if it is assumed that the two sys-
tems of processing are optimized for different situations
and different goals and that these data patterns reflect the
greater probability that the analytic intelligence of System
2 will override the interactional intelligence of System 1 in
individuals of higher cognitive ability.
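The role of cell D in the 2 × 2 covariation task can be illustrated with the standard ΔP contingency index, ΔP = A/(A+B) − C/(C+D), where D is the cause-absent/effect-absent count. The sketch below is illustrative only (the `d_weight` parameter is a hypothetical device for simulating underweighting, not the adaptationist model itself); discounting cell D makes a genuine positive contingency look weaker:

```python
def delta_p(a, b, c, d, d_weight=1.0):
    """Contingency index Delta-P for a 2x2 table.
    a: cause present/effect present   b: cause present/effect absent
    c: cause absent/effect present    d: cause absent/effect absent
    d_weight < 1 simulates underweighting of cell D."""
    return a / (a + b) - c / (c + d * d_weight)

print(delta_p(16, 4, 8, 12))                  # 0.4 with all cells weighted equally
print(round(delta_p(16, 4, 8, 12, 0.25), 3))  # ≈ 0.073 when cell D is discounted
```

Weighting cell D equally, as the normatively responding subjects tend to do, is thus not a cosmetic difference: it changes the judged strength of the contingency.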
In summary, the biases introduced by System 1 heuristic
processing may well be universal – because the computa-
tional biases inherent in this system are ubiquitous and
shared by all humans. However, it does not necessarily fol-
low that errors on tasks from the heuristics and biases liter-
ature will be universal (we have known for some time that
they are not). This is because, for some individuals, System
2 processes operating in parallel (see Evans & Over 1996)
will have the requisite computational power (or a low
enough threshold) to override the response primed by Sys-
tem 1.
It is hypothesized that the features of System 1 are de-
signed to very closely track increases in the reproduction
probability of genes. System 2, while also clearly an evolu-
tionary product, is also primarily a control system focused
on the interests of the whole person. It is the primary maximizer of an individual’s personal utility.^12 Maximizing the
latter will occasionally result in sacrificing genetic fitness
(Barkow 1989; Cooper 1989; Skyrms 1996). Because Sys-
tem 2 is more attuned to normative rationality than is Sys-
tem 1, System 2 will seek to fulfill the individual’s goals in
the minority of cases where those goals conflict with the re-
sponses triggered by System 1.
It is proposed that just such conflicts are occurring in
three of the tasks discussed previously (the Disease Problem,
the Linda Problem, and the selection task). This conjecture
is supported by the fact that evolutionary rationality has
been conjoined with Gricean principles of conversational
implicature by several theorists (Gigerenzer 1996b; Hilton
1995; Levinson 1995) who emphasize the principle of “con-
versationally rational interpretation” (Hilton 1995, p. 265).
According to this view, the pragmatic heuristics are not sim-
ply inferior substitutes for computationally costly logical
mechanisms that would work better. Instead, the heuristics
are optimally designed to solve an evolutionary problem in
another domain – attributing intentions to conspecifics and
coordinating mutual intersubjectivity so as to optimally ne-
gotiate cooperative behavior (Cummins 1996; Levinson
1995; Skyrms 1996).
It must be stressed though that in the vast majority of
mundane situations, the evolutionary rationality embodied
in System 1 processes will also serve the goals of normative
rationality. Our automatic, System 1 processes for accu-
rately navigating around objects in the natural world were
adaptive in an evolutionary sense, and they likewise serve
our personal goals as we carry out our lives in the modern
world (i.e., navigational abilities are an evolutionary adap-
tation that serve the instrumental goals of the vehicle as
well).
One way to view the difference between what we have
termed here evolutionary and normative rationality is to
note that they are not really two different types of rational-
ity (see Oaksford & Chater 1998, pp. 291–97) but are instead terms for characterizing optimization procedures operating at the subpersonal and personal levels, respectively.
That there are two optimization procedures in operation
here that could come into conflict is a consequence of the
insight that the genes – as subpersonal replicators – can in-
crease their fecundity and longevity in ways that do not nec-
essarily serve the instrumental goals of the vehicles built by
the genome (Cooper 1989; Skyrms 1996).
Skyrms (1996) devotes an entire book on evolutionary
game theory to showing that the idea that “natural selection
will weed out irrationality” (p. x) is false because optimiza-
tion at the subpersonal replicator level is not coextensive
with the optimization of the instrumental goals of the
vehicle (i.e., normative rationality). Gigerenzer (1996b)
provides an example by pointing out that neither rats nor
humans maximize utility in probabilistic contingency ex-
periments. Instead of responding by choosing the most
probable alternative on every trial, subjects alternate in a
manner that matches the probabilities of the stimulus al-
ternatives. This behavior violates normative strictures on
utility maximization, but Gigerenzer (1996b) demonstrates
how probability matching could actually be an evolutionar-
ily stable strategy (see Cooper 1989 and Skyrms 1996 for
many such examples).
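Gigerenzer’s point can be checked with a little expected-value arithmetic. In a binary prediction task where one alternative occurs with probability p ≥ .5, always choosing that alternative yields expected accuracy p, whereas matching yields p² + (1 − p)², which is strictly lower whenever p ≠ .5. A minimal sketch with illustrative numbers:

```python
def accuracy_maximizing(p):
    """Always predict the more probable alternative (p >= 0.5)."""
    return p

def accuracy_matching(p):
    """Predict each alternative with its own probability of occurring."""
    return p * p + (1 - p) * (1 - p)

p = 0.7
print(accuracy_maximizing(p))          # 0.7
print(round(accuracy_matching(p), 2))  # 0.58 -- matching sacrifices expected accuracy
```

This is the sense in which probability matching violates normative strictures on utility maximization even if, as Gigerenzer argues, it can be evolutionarily stable at the population level.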
Such examples led Skyrms (1996) to note that “when I
contrast the results of the evolutionary account with those
of rational decision theory, I am not criticizing the norma-
tive force of the latter. I am just emphasizing the fact that
the different questions asked by the two traditions may have
different answers” (p. xi). Skyrms’s (1996) book articulates
the environmental and population parameters under which
“rational choice theory completely parts ways with evolu-
tionary theory” (p. 106; see also Cooper 1989). Cognitive
mechanisms that were fitness enhancing might well thwart
our goals as personal agents in an industrial society (see
Baron 1998) because the assumption that our cognitive
mechanisms are adapted in the evolutionary sense (Pinker
1997) does not entail normative rationality. Thus, situations
where evolutionary and normative rationality dissociate
might well put the two processing Systems in partial con-
flict with each other. These conflicts may be rare, but the
few occasions on which they occur might be important
ones. This is because knowledge-based, technological soci-
eties often put a premium on abstraction and decontext-
ualization, and they sometimes require that the fundamen-
tal computational bias of human cognition be overridden by
System 2 processes.
6.2. The fundamental computational bias
and task interpretation
The fundamental computational bias, that “specific fea-
tures of problem content, and their semantic associations,
constitute the dominant influence on thought” (Evans et al.
1983, p. 295; Stanovich 1999), is no doubt rational in the
evolutionary sense. Selection pressure was probably in the
direction of radical contextualization. An organism that
could bring more relevant information to bear (not forget-
ting the frame problem) on the puzzles of life probably
dealt with the world better than competitors and thus re-
produced with greater frequency and contributed more of
its genes to future generations.
Evans and Over (1996) argue that an overemphasis on
normative rationality has led us to overlook the adaptive-
ness of contextualization and the nonoptimality of always
decoupling prior beliefs from problem situations (“beliefs
that have served us well are not lightly to be abandoned,”
p. 114). Their argument here parallels the reasons that phi-
losophy of science has moved beyond naive falsificationism
(see Howson & Urbach 1993). Scientists do not abandon a
richly confirmed and well integrated theory at the first lit-
tle bit of falsifying evidence, because abandoning the the-
ory might actually decrease explanatory coherence (Tha-
gard 1992). Similarly, Evans and Over (1996) argue that
beliefs that have served us well in the past should be hard
to dislodge, and projecting them on to new informa-
tion – because of their past efficacy – might actually help in
assimilating the new information.
Evans and Over (1996) note the mundane but telling fact
that when scanning a room for a particular shape, our visual
systems register color as well. They argue that we do not im-
pute irrationality to our visual systems because they fail to
screen out the information that is not focal. Our systems of
recruiting prior knowledge and contextual information to
solve problems with formal solutions are probably likewise
adaptive in the evolutionary sense. However, Evans and
Over (1996) do note that there is an important disanalogy
here as well, because studies of belief bias in syllogistic rea-
soning have shown that “subjects can to some extent ignore
belief and reason from a limited number of assumptions
when instructed to do so” (p. 117). That is, in the case of
reasoning – as opposed to the visual domain – some people
do have the cognitive flexibility to decouple unneeded sys-
tems of knowledge and some do not.
The studies reviewed here indicate that those who do
have the requisite flexibility are somewhat higher in cogni-
tive ability and in actively open-minded thinking (see
Stanovich & West 1997). These styles and skills are largely
System 2, not System 1, processes. Thus, the heuristics trig-
gering alternative task construals in the various problems
considered here may well be the adaptive evolutionary
products embodied in System 1 as Levinson (1995) and oth-
ers argue. Nevertheless, many of our personal goals may
have become detached from their evolutionary context (see
Barkow 1989). As Morton (1997) aptly puts it: “We can and
do find ways to benefit from the pleasures that our genes
have arranged for us without doing anything to help the
genes themselves. Contraception is probably the most ob-
vious example, but there are many others. Our genes want
us to be able to reason, but they have no interest in our en-
joying chess” (p. 106).
Thus, we seek “not evolution’s end of reproductive suc-
cess but evolution’s means, love-making. The point of this
example is that some human psychological traits may, at
least in our current environment, be fitness-reducing” (see
Barkow 1989, p. 296). And if the latter are pleasurable, an-
alytic intelligence achieves normative rationality by pursu-
ing them – not the adaptive goals of our genes. This is what
Larrick et al. (1993) argue when they speak of analytic in-
telligence as “the set of psychological properties that en-
ables a person to achieve his or her goals effectively. On this
view, intelligent people will be more likely to use rules of
choice that are effective in reaching their goals than will less
intelligent people” (p. 345).
Thus, high analytic intelligence may lead to task constru-
als that track normative rationality; whereas the alternative
construals of subjects low in analytic intelligence (and
hence more dominated by System 1 processing) might be
more likely to track evolutionary rationality in situations
that put the two types of rationality in conflict – as is con-
jectured to be the case with the problems discussed previ-
ously. If construals consistent with normative rationality are
more likely to satisfy our current individual goals (Baron
1993; 1994) than are construals determined by evolution-
ary rationality (which are construals determined by our
genes’ metaphorical goal – reproductive success), then it is
in this very restricted sense that individual difference rela-
tionships such as those illustrated here tell us which con-
struals are “best.”
6.3. The fundamental computational bias
and the ecology of the modern world
A conflict between the decontextualizing requirements of
normative rationality and the fundamental computational
bias may perhaps be one of the main reasons that norma-
tive and evolutionary rationality dissociate. The fundamen-
tal computational bias is meant to be a global term that cap-
tures the pervasive bias toward the contextualization of all
informational encounters. It conjoins the following pro-
cessing tendencies: (1) the tendency to adhere to Gricean
conversational principles even in situations that lack many
conversational features (Adler 1984; Hilton 1995); (2) the
tendency to contextualize a problem with as much prior
knowledge as is easily accessible, even when the problem is
formal and the only solution is a content-free rule (Evans
1982; 1989; Evans et al. 1983); (3) the tendency to see de-
sign and pattern in situations that are either undesigned,
unpatterned, or random (Levinson 1995); (4) the tendency
to reason enthymematically – to make assumptions not
stated in a problem and then reason from those assump-
tions (Henle 1962; Rescher 1988); (5) the tendency toward
a narrative mode of thought (Bruner 1986; 1990). All of
these properties conjoined represent a cognitive tendency
toward radical contextualization. The bias is termed funda-
mental because it is thought to stem largely from System 1
and that system is assumed to be primary in that it perme-
ates virtually all of our thinking (e.g., Evans & Over 1996).
If the properties of this system are not to be the dominant
factors in our thinking, then they must be overridden by
System 2 processes so that the latter can carry out one of
their important functions of abstracting complex situations
into canonical representations that are stripped of context.
Thus, it is likely that one computational task of System 2 is
to decouple (see Navon 1989a; 1989b) contextual features
automatically supplied by System 1 when they are poten-
tially interfering.
In short, one of the functions of System 2 is to serve as
an override system (see Pollock 1991) for some of the au-
tomatic and obligatory computational results provided by
System 1. This override function might only be needed in a
tiny minority of information processing situations (in most
cases, the two Systems will interact in concert), but they
may be unusually important ones. For example, numerous
theorists have warned about a possible mismatch between
the fundamental computational bias and the processing re-
quirements of many tasks in a technological society con-
taining many symbolic artifacts and often requiring skills of
abstraction (Adler 1984; 1991; Donaldson 1978; 1993).
Hilton (1995) warns that the default assumption that
Gricean conversational principles are operative may be
wrong for many technical settings because
many reasoning heuristics may have evolved because they are
adaptive in contexts of social interaction. For example, the ex-
pectation that errors of interpretation will be quickly repaired
may be correct when we are interacting with a human being but
incorrect when managing a complex system such as an aircraft,
a nuclear power plant, or an economy. The evolutionary adap-
tiveness of such an expectation to a conversational setting may
explain why people are so bad at dealing with lagged feedback
in other settings. (p. 267)
Concerns about the real-world implications of the failure to
engage in necessary cognitive abstraction (see Adler 1984)
were what led Luria (1976) to warn against minimizing the
importance of decontextualizing thinking styles. In dis-
cussing the syllogism, he notes that “a considerable pro-
portion of our intellectual operations involve such verbal
and logical systems; they comprise the basic network of
codes along which the connections in discursive human
thought are channeled” (p. 101). Likewise, regarding the
subtle distinctions on many decontextualized language
tasks, Olson (1986) has argued that “the distinctions on
which such questions are based are extremely important to
many forms of intellectual activity in a literate society. It is
easy to show that sensitivity to the subtleties of language are
crucial to some undertakings. A person who does not clearly
see the difference between an expression of intention and
a promise or between a mistake and an accident, or be-
tween a falsehood and a lie, should avoid a legal career or,
for that matter, a theological one” (p. 341). Objective mea-
sures of the requirements for cognitive abstraction have
been increasing across most job categories in technological
societies throughout the past several decades (Gottfredson
1997). This is why measures of the ability to deal with ab-
straction remain the best employment predictor and the
best earnings predictor in postindustrial societies (Brody
1997; Gottfredson 1997; Hunt 1995).
Einhorn and Hogarth (1981) highlighted the importance
of decontextualized environments in their discussion of the
optimistic (Panglossian/Apologist) and pessimistic (Melior-
ist) views of the cognitive biases revealed in laboratory ex-
perimentation. They noted that “the most optimistic asserts
that biases are limited to laboratory situations which are un-
representative of the natural ecology” (p. 82), but they go
on to caution that “in a rapidly changing world it is unclear
what the relevant natural ecology will be. Thus, although
the laboratory may be an unfamiliar environment, lack of
ability to perform well in unfamiliar situations takes on
added importance” (p. 82). There is a caution in this com-
ment for critics of the abstract content of most laboratory
tasks and standardized tests. The issue is that, ironically, the
argument that the laboratory tasks and tests are not like
“real life” is becoming less and less true. “Life,” in fact, is
becoming more like the tests!
The cognitive ecologists have, nevertheless, contributed
greatly in the area of remediation methods for our cogni-
tive deficiencies (Brase et al. 1998; Cosmides & Tooby
1996; Fiedler 1988; Gigerenzer & Hoffrage 1995; Sedl-
meier 1997). Their approach is, however, somewhat differ-
ent from that of the Meliorists. The ecologists concentrate
on shaping the environment (changing the stimuli pre-
sented to subjects) so that the same evolutionarily adapted
mechanisms that fail the standard of normative rationality
under one framing of the problem give the normative re-
sponse under an alternative (e.g., frequentistic) version.
Their emphasis on environmental alteration provides a
much-needed counterpoint to the Meliorist emphasis on
cognitive change. The latter, with their emphasis on re-
forming human thinking, no doubt miss opportunities to
shape the environment so that it fits the representations
that our brains are best evolved to deal with. Investigators
framing cognition within a Meliorist perspective are often
blind to the fact that there may be remarkably efficient
mechanisms available in the brain – if only it was provided
with the right type of representation.
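The ecologists’ reframing strategy can be illustrated with the much-discussed medical diagnosis problem (the numbers below are the standard illustrative ones, not data from the target article). Stated as single-event probabilities the problem defeats most subjects, but recast as natural frequencies the Bayesian posterior can be read off almost directly:

```python
# Probability format: base rate 1%, test sensitivity 80%, false-positive rate ~9.6%.
# Natural-frequency format: imagine 1,000 people.
sick = 10             # 1% of 1,000 have the disease
sick_positive = 8     # 80% of the sick test positive
well_positive = 95    # ~9.6% of the 990 healthy also test positive
posterior = sick_positive / (sick_positive + well_positive)
print(round(posterior, 3))  # ≈ 0.078: fewer than 1 in 12 positives are actually sick
```

The arithmetic is unchanged across the two formats; only the representation differs, which is precisely the ecologists’ point about fitting problems to evolved mechanisms.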
On the other hand, it is not always the case that the world
will let us deal with representations that are optimally
suited to our evolutionarily designed cognitive mecha-
nisms. For example, in a series of elegant experiments,
Gigerenzer et al. (1991) have shown how at least part of the
overconfidence effect in knowledge calibration studies is
due to the unrepresentative stimuli used in such experi-
ments – stimuli that do not match the subjects’ stored cue
validities, which are optimally tuned to the environment.
But there are many instances in real-life when we are sud-
denly placed in environments where the cue validities have
changed. Metacognitive awareness of such situations (a
System 2 activity) and strategies for suppressing incorrect
confidence judgments generated by the responses to cues
automatically generated by System 1 will be crucial here.
High school musicians who aspire to a career in music have
to recalibrate when they arrive at university and encounter
large numbers of talented musicians for the first time. If
they persist in their old confidence judgments they may not
change majors when they should. Many real-life situations
where accomplishment yields a new environment with even
more stringent performance requirements share this logic.
Each time we “ratchet up” in the competitive environment
of a capitalist economy we are in a situation just like the
overconfidence knowledge calibration experiments with
their unrepresentative materials (Frank & Cook 1995). It is
important to have learned System 2 strategies that will tem-
per one’s overconfidence in such situations (Koriat et al.
1980).
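In calibration studies of the sort just discussed, overconfidence is standardly quantified as mean stated confidence minus proportion correct. A minimal sketch of that score (the data format and function name are our assumptions, not the procedure of any particular study):

```python
def overconfidence(judgments):
    """Mean stated confidence minus actual proportion correct.

    judgments: list of (confidence, correct) pairs, where confidence
    is a probability in [0.5, 1.0] for a two-alternative item and
    correct is True/False. Positive scores indicate overconfidence.
    """
    mean_confidence = sum(c for c, _ in judgments) / len(judgments)
    accuracy = sum(1 for _, ok in judgments if ok) / len(judgments)
    return mean_confidence - accuracy

# A subject who says "90% sure" on every item but answers 70%
# correctly shows an overconfidence score of 0.2.
score = overconfidence([(0.9, True)] * 7 + [(0.9, False)] * 3)
```

On this measure, the recalibration required after a change of environment amounts to lowering stated confidence until the score returns toward zero.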
7. Individual differences and the normative/descriptive gap
In our research program, we have attempted to demon-
strate that a consideration of individual differences in the
heuristics and biases literature may have implications for
debates about the cause of the gap between normative
models and descriptive models of actual performance. Pat-
terns of individual differences have implications for argu-
ments that all such gaps reflect merely performance errors.
Individual differences are also directly relevant to theories
that algorithmic-level limitations prevent the computation
of the normative response in a system that would otherwise
compute it. The wrong norm and alternative construal ex-
planations of the gap involve many additional complications
but, at the very least, patterns of individual differences
might serve as “intuition pumps” (Dennett 1980) and alter
our reflective equilibrium regarding the plausibility of such
explanations (Stanovich 1999).
Different outcomes occurred across the wide range of
tasks we have examined in our research program. Of
course, all the tasks had some unreliable variance and thus
some responses that deviated from the response considered
normative could easily be considered as performance er-
rors. But not all deviations could be so explained. Several
tasks (e.g., syllogistic reasoning with interfering content,
four-card selection task) were characterized by heavy com-
putational loads that made the normative response not pre-
scriptive for some subjects – but these were usually few in
number.13
Finally, a few tasks yielded patterns of covari-
ance that served to raise doubts about the appropriateness
of normative models applied to them and/or the task con-
struals assumed by the problem inventors (e.g., several non-
causal baserate items, false consensus effect).
Although many normative/descriptive gaps could be re-
duced by these mechanisms, not all of the discrepancies
could be explained by factors that do not bring human ra-
tionality into question. Algorithmic-level limitations were
far from absolute. The magnitude of the associations with
cognitive ability left much room for the possibility that the
remaining reliable variance might indicate that there are
systematic irrationalities in intentional-level psychology. A
component of our research program mentioned only briefly
previously has produced data consistent with this possibil-
ity. Specifically, it was not the case that once capacity limi-
tations had been controlled the remaining variations from
normative responding were unpredictable (which would
have indicated that the residual variance consisted largely
Stanovich & West: Individual differences in reasoning
BEHAVIORAL AND BRAIN SCIENCES (2000) 23:5 663
of performance errors). In several studies, we have shown
that there was significant covariance among the scores from
a variety of tasks in the heuristics and biases literature after
they had been residualized on measures of cognitive ability
(Stanovich 1999). The residual variance (after partialling
cognitive ability) was also systematically associated with
questionnaire responses that were conceptualized as inten-
tional-level styles relating to epistemic regulation (Sá et al.
1999; Stanovich & West 1997; 1998c). Both of these find-
ings are indications that the residual variance is systematic.
They falsify models that attempt to explain the normative/
descriptive gap entirely in terms of computational limita-
tions and random performance errors. Instead, the findings
support the notion that the normative/descriptive discrep-
ancies that remain after computational limitations have
been accounted for reflect a systematically suboptimal in-
tentional-level psychology.
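The statistical logic of this residualization analysis can be sketched in code: regress each task score on cognitive ability, retain the residuals, and correlate the residuals across tasks. The data below are simulated and the variable names are ours; this illustrates the procedure, not the authors' actual datasets.

```python
import numpy as np

def residualize(scores, ability):
    """Return the part of `scores` not linearly predictable from `ability`."""
    X = np.column_stack([np.ones_like(ability), ability])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return scores - X @ beta

rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)
style = rng.normal(size=n)   # stand-in for an intentional-level thinking disposition
task_a = ability + style + rng.normal(size=n)
task_b = ability + style + rng.normal(size=n)

res_a = residualize(task_a, ability)
res_b = residualize(task_b, ability)

# If the residual variance were mere performance error, this correlation
# would hover near zero; a shared systematic factor keeps it positive.
r = np.corrcoef(res_a, res_b)[0, 1]
print(round(r, 2))
```

A reliably positive correlation between the residualized scores is what falsifies the "computational limitations plus random error" account in the simulation, just as it does in the reported studies.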
One of the purposes of the present research program is
to reverse the figure and ground in the rationality debate,
which has tended to be dominated by the particular way
that philosophers frame the competence/performance dis-
tinction. For example, Cohen (1982) argues that there re-
ally are only two factors affecting performance on rational
thinking tasks: “normatively correct mechanisms on the one
side, and adventitious causes of error on the other” (p. 252).
Not surprisingly given such a conceptualization, the pro-
cesses contributing to error (“adventitious causes”) are of
little interest to Cohen (1981; 1982). But from a psycho-
logical standpoint, there may be important implications in
precisely the aspects of performance that have been back-
grounded in this controversy (“adventitious causes”). For
example, Johnson-Laird and Byrne (1993) articulate a view
of rational thought that parses the competence/perfor-
mance distinction much differently from that of Cohen
(1981; 1982; 1986) and that simultaneously leaves room for
systematically varying cognitive styles to play a more im-
portant role in theories of rational thought. At the heart of
the rational competence that Johnson-Laird and Byrne
(1993) attribute to humans is not perfect rationality but in-
stead just one meta-principle: People are programmed to
accept inferences as valid provided that they have constructed no mental model of the premises that contradicts
the inference. Inferences are categorized as false when a
mental model is discovered that is contradictory. However,
the search for contradictory models is “not governed by any
systematic or comprehensive principles” (p. 178).
The key point in Johnson-Laird and Byrne’s (1991; 1993;
see Johnson-Laird 1999) account14 is that once an individual constructs a mental model from the premises, once the
individual draws a new conclusion from the model, and
once the individual begins the search for an alternative
model of the premises which contradicts the conclusion,
the individual “lacks any systematic method to make this
search for counter-examples” (p. 205; see Bucciarelli &
Johnson-Laird 1999). Here is where Johnson-Laird and
Byrne’s (1993) model could be modified to allow for the in-
fluence of thinking styles in ways that the impeccable com-
petence view of Cohen (1981; 1982) does not. In this pas-
sage, Johnson-Laird and Byrne seem to be arguing that
there are no systematic control features of the search
process. But styles of epistemic regulation (Sá et al. 1999;
Stanovich & West 1997) may in fact be reflecting just such
control features. Individual differences in the extensiveness
of the search for contradictory models could arise from a
variety of cognitive factors that, although they may not be
completely systematic, may be far from “adventitious” (see
Johnson-Laird & Oatley 1992; Oatley 1992; Overton 1985;
1990) – factors such as dispositions toward premature clo-
sure, cognitive confidence, reflectivity, dispositions toward
confirmation bias, ideational generativity, and so on.
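The search for models of the premises that contradict a conclusion can be made concrete in a toy propositional setting: enumerate candidate truth assignments and look for one satisfying every premise while falsifying the conclusion. The sketch below is ours, not an implementation of mental models theory; indeed, the theory's point is precisely that human search, unlike this loop, is not exhaustive or systematic.

```python
from itertools import product

def counterexample(premises, conclusion, variables):
    """Search for a truth assignment that satisfies every premise while
    falsifying the conclusion. Premises and the conclusion are functions
    of a dict mapping variable names to booleans.
    Returns a falsifying model, or None if the inference is valid."""
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return model
    return None

# "If p then q; q; therefore p" -- affirming the consequent.
invalid = counterexample([lambda m: (not m["p"]) or m["q"], lambda m: m["q"]],
                         lambda m: m["p"], ["p", "q"])
# invalid is {"p": False, "q": True}: a model of the premises that
# contradicts the conclusion, so the inference should be rejected.

# "If p then q; p; therefore q" -- modus ponens has no counterexample.
valid = counterexample([lambda m: (not m["p"]) or m["q"], lambda m: m["p"]],
                       lambda m: m["q"], ["p", "q"])
# valid is None
```

The individual-differences question is then where, between this exhaustive search and no search at all, a given reasoner's control process falls.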
Dennett (1988) argues that we use the intentional stance
for humans and dogs but not for lecterns because for the
latter “there is no predictive leverage gained by adopting
the intentional stance” (p. 496). In the experiments just
mentioned (Sá et al. 1999; Stanovich & West 1997; 1998c),
it has been shown that there is additional predictive lever-
age to be gained by relaxing the idealized rationality as-
sumption of Dennett’s (1987; 1988) intentional stance and
by positing measurable and systematic variation in inten-
tional-level psychologies. Knowledge about such individual
differences in people’s intentional-level psychologies can
be used to predict variance in the normative/descriptive
gap displayed on many reasoning tasks. Consistent with the
Meliorist conclusion that there can be individual differ-
ences in human rationality, our results show that there is
variability in reasoning that cannot be accommodated
within a model of perfect rational competence operating in
the presence of performance errors and computational lim-
itations.
NOTES
1. Individual differences on tasks in the heuristics and biases
literature have been examined previously by investigators such as
Hoch and Tschirgi (1985), Jepson et al. (1983), Rips and Conrad
(1983), Slugoski and Wilson (1998), and Yates et al. (1996). Our
focus here is the examination of individual differences through a
particular metatheoretical lens – as providing principled con-
straints on alternative explanations for the normative/descriptive
gap.
2. All of the work cited here was conducted within Western
cultures which matched the context of the tests. Of course, we rec-
ognize the inapplicability of such measures as indicators of cogni-
tive ability in cultures other than those within which the tests were
derived (Ceci 1996; Greenfield 1997; Scribner & Cole 1981).
Nevertheless, it is conceded by even those supporting more con-
textualist views of intelligence (e.g., Sternberg 1985; Sternberg &
Gardner 1982) that measures of general intelligence do identify
individuals with superior reasoning ability – reasoning ability that
is then applied to problems that may have a good degree of cul-
tural specificity (see Sternberg 1997; Sternberg & Kaufman 1998).
3. The Scholastic Assessment Test is a three-hour paper-and-
pencil exam used for university admissions testing. The verbal sec-
tion of the SAT test includes three types of items: verbal analogies,
sentence completions, and critical reading problems. The mathe-
matical section contains arithmetic, algebra, and geometry prob-
lems that require quantitative reasoning.
4. We note that the practice of analyzing a single score from
such ability measures does not imply the denial of the existence of
second-order factors in a hierarchical model of intelligence. How-
ever, theorists from a variety of persuasions (Carroll 1993; 1997;
Hunt 1997; Snyderman & Rothman 1990; Sternberg & Gardner
1982; Sternberg & Kaufman 1998) acknowledge that the second-
order factors are correlated. Thus, such second-order factors are
not properly interpreted as separate faculties (despite the popu-
larity of such colloquial interpretations of so-called “multiple in-
telligences”). In the most comprehensive survey of intelligence re-
searchers, Snyderman and Rothman (1990) found that by a
margin of 58% to 13%, the surveyed experts endorsed a model of
“a general intelligence factor with subsidiary group factors” over
a “separate faculties” model. Throughout this target article we uti-
lize a single score that loads highly on the general factor, but analy-
ses which separated out group factors (Stratum II in Carroll’s
widely accepted model based on his analysis of 460 data sets; see
Carroll 1993) would reveal convergent trends.
5. Positive correlations with developmental maturity (e.g.,
Byrnes & Overton 1986; Jacobs & Potenza 1991; Klahr et al. 1993;
Markovits & Vachon 1989; Moshman & Franks 1986) would seem
to have the same implication.
6. However, we have found (Stanovich & West 1999) that the
patterns of individual differences reversed somewhat when the
potentially confusing term “false positive rate” was removed from
the problem (see Cosmides & Tooby 1996 for work on the effect
of this factor). It is thus possible that this term was contributing to
an incorrect construal of the problem (see sect. 5).
7. However, sometimes alternative construals might be com-
putational escape hatches (Stanovich 1999). That is, an alternative
construal might be hiding an inability to compute the normative
model. Thus, for example, in the selection task, perhaps some
people represent the task as an inductive problem of optimal data
sampling in the manner that Oaksford and Chater (1994; 1996)
have outlined because of the difficulty of solving the problem if
interpreted deductively. As O’Brien (1995) demonstrates, the ab-
stract selection task is a very hard problem for a mental logic with-
out direct access to the truth table for the material conditional.
Likewise, Johnson-Laird and Byrne (1991) have shown that tasks
requiring the generation of counterexamples are difficult unless
the subject is primed to do so.
8. The results with respect to the framing problems studied by
Frisch (1993) do not always go in this direction. See Stanovich and
West (1998b) for examples of framing problems where the more
cognitively able subjects are not less likely to display framing ef-
fects.
9. Kahneman and Tversky (1982) themselves (pp. 132–35)
were among the first to discuss the issue of conversational impli-
catures in the tasks employed in the heuristics and biases research
program.
10. Of course, another way that cognitive ability differences
might be observed is if the task engages only System 2. For the
present discussion, this is an uninteresting case.
11. It should be noted that the distinction between normative
and evolutionary rationality used here is different from the distinction between rationality₁ and rationality₂ used by Evans and Over (1996). They define rationality₁ as reasoning and acting “in a way that is generally reliable and efficient for achieving one’s goals” (p. 8). Rationality₂ concerns reasoning and acting “when
one has a reason for what one does sanctioned by a normative the-
ory” (p. 8). Because normative theories concern goals at the per-
sonal level, not the genetic level, both of the rationalities defined
by Evans and Over (1996) fall within what has been termed here
normative rationality. Both concern goals at the personal level.
Evans and Over (1996) wish to distinguish the explicit (i.e., conscious) following of a normative rule (rationality₂) from the largely
unconscious processes “that do much to help them achieve their
ordinary goals” (p. 9). Their distinction is between two sets of al-
gorithmic mechanisms that can both serve normative rationality.
The distinction we draw is in terms of levels of optimization (at the
level of the replicator itself – the gene – or the level of the vehi-
cle); whereas theirs is in terms of the mechanism used to pursue
personal goals (mechanisms of conscious, reason-based rule fol-
lowing versus tacit heuristics).
It should also be noted that, for the purposes of our discussion
here, the term evolutionary rationality has less confusing conno-
tations than the term “adaptive rationality” discussed by Oaksford
and Chater (1998). The latter could potentially blur precisely the
distinction stressed here – that between behavior resulting from
adaptations in service of the genes and behavior serving the or-
ganism’s current goals.
12. Evidence for this assumption comes from voluminous data
indicating that analytic intelligence is related to the very type of
outcomes that normative rationality would be expected to maxi-
mize. For example, the System 2 processes that collectively com-
prise the construct of cognitive ability are moderately and reliably
correlated with job success and with the avoidance of harmful be-
haviors (Brody 1997; Lubinski & Humphreys 1997; Gottfredson
1997).
13. Even on tasks with clear computational limitations, some
subjects from the lowest strata of cognitive ability solved the prob-
lem. Conversely, on virtually all the problems, some university
subjects of the highest cognitive ability failed to give the norma-
tive response. Fully 55.6% of the university subjects who were at
the 75th percentile or above in our sample in cognitive ability
committed the conjunction fallacy on the Linda Problem. Fully
82.4% of the same group failed to solve a nondeontic selection task
problem.
14. A reviewer has pointed out that the discussion here is not
necessarily tied to the mental models approach. The notion of
searching for counterexamples under the guidance of some sort of
control process is at the core of any implementation of logic.
Open Peer Commentary
Commentary submitted by the qualified professional readership of this
journal will be considered for publication in a later issue as Continuing
Commentary on this article. Integrative overviews and syntheses are es-
pecially encouraged.
Three fallacies
Jonathan E. Adler
Department of Philosophy, Brooklyn College and the Graduate Center of the
City University of New York, Brooklyn, NY 11210
jadler@brooklyn.cuny.edu
Abstract: Three fallacies in the rationality debate obscure the possibility
for reconciling the opposed camps. I focus on how these fallacies arise in
the view that subjects interpret their task differently from the experi-
menters (owing to the influence of conversational expectations). The
themes are: first, critical assessment must start from subjects’ under-
standing; second, a modal fallacy; and third, fallacies of distribution.
Three fallacies in the rationality debate obscure the possibility for
reconciling the opposed camps, a reconciliation toward which
Stanovich & West (S&W) display sympathy in their discussion of
dual models and the understanding/accepting principle.
The fallacies are prominent in S&W’s treatment of the response
they take as most challenging: “subjects have a different inter-
pretation of the task” (sect. 5, para. 1). The response requires the
subjects’ understanding assumption: “criticism of subjects’ per-
formance must start from the subjects’ understanding.” The argu-
ment then is that if subjects’ responses are correct according to
their own (reasonable) interpretation of their task, then they are
correct (Adler 1994).
But consider conversational reinterpretations of Piaget’s experiments on conservation. Children younger than 6 years of age
deny that the length of sticks remains the same after rearrange-
ment, and that is explained as satisfying the expectation of rele-
vance of the experimenter’s action (in moving the objects). How-
ever, if a child over 7 offered the nonconserving response, we
would regard that as a defect in reasoning, even though it was per-
fectly in accord with his (reasonable) interpretation. This child has
not achieved the common knowledge that not all activities by
speakers (or actors) along with their focal contribution are maxi-
mally relevant to it.
Contrary to the subjects’ understanding assumption, such con-
textualized interpretations may reflect a defective or weak grasp
of the crucial logical terms and formal principles that the experi-
menters are studying. The weak grasp helps explain why subjects
so readily read their task differently from the experimenters. Con-
sider the base-rate studies. It is plausible that understanding
“probability” as something close to “explanatorily coherent” or
“reasonable, based on the evidence” fits ordinary usage and rea-
soning better. Although that interpretation may be suggested and
even encouraged, it is not mandated. (Gricean implicatures are required, not merely allowed or suggested; Grice 1989.) The interpretations of their task that subjects must make to accord with explanations like the conversational one depend upon their failure
to have accessible or online grasp of important distinctions.
Compare this to teaching: I understand why students read
“Jones does not believe that Marcia is in Italy” as “Jones believes
that Marcia is not in Italy.” Roughly, negation is understood (con-
versationally) to function as a denial, and a denial is typically in-
formative only if treated as having small-scope reading. But it is
my job to show students that there is an important distinction
there that their understanding obscures and that encourages
faulty inference. I expose the defects through eliciting other, re-
lated judgments of theirs. The failing is a failing according to their
own beliefs.
The fallacy in the subjects’-understanding assumption is an in-
stance of the broader one in the saying “to understand is to for-
give.” To understand subjects’ responses conversationally need
not be to justify them.
To infer from the correctness of subjects’ judgments given their
own construal of their task the correctness of subjects’ judgments
is to commit a second (modal) fallacy:
If S understands problem P as t and if answer A follows from (is
the right answer to) t, then S reasons well when S answers A to
problem P.
S understands problem P as t and answer A does follow from t.
So, S reasons well when S answers A to t.
The conclusion does not follow, as it does not in the following:
If John is a bachelor, then he must be unmarried.
John is a bachelor.
So, John must be unmarried.
The fallacy arises from the natural misplacement of the “must” in
the consequent clause. The necessity really governs the whole
statement. Similarly, if we place the “reasons well,” from the pre-
vious argument sketch, out front, it is evident that the conclusion
does not follow, unless we also know that:
S reasons well in understanding P as t.
But children do not reason well in the Piaget task by understand-
ing the question actually asked “Are the sticks the same size or is
one longer?” as having the meaning “Which answer – that the
sticks are the same size or that one is longer – renders the exper-
imenter’s actions (more) relevant to his actual question?” (Com-
pare to Margolis 1987: Ch. 8.)
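The misplaced “must” is the classic scope ambiguity between necessity of the consequence and necessity of the consequent; in modal notation (ours, added for illustration):

```latex
\Box(B \rightarrow U),\; B \;\vdash\; U
  \quad \text{(valid: the necessity governs the whole conditional)}
\\[4pt]
\Box(B \rightarrow U),\; B \;\nvdash\; \Box U
  \quad \text{(fallacious: the necessity is misattached to the consequent)}
```

The parallel fallacy in the reasoning case attaches “reasons well” to the answer alone rather than to the whole interpretation-plus-answer package.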
The third and last fallacy is a distributional one. Consider,
specifically, the base rate studies applied to judgments of one’s
marital prospects, where the divorce rates are to serve as base
rates. Gilovich (1991) comments: “To be sure, we should not dis-
count our current feelings and self-knowledge altogether; we just
need to temper them a bit more with our knowledge of what hap-
pens to people in general. This is the consensus opinion of all
scholars in the field.”
But if belief is connected to action in the standard ways, the rec-
ommendation does not accord with the marriage practice, since if
a prospective spouse lessens her commitment, she cannot expect
her partner not to do so in turn, threatening an unstable down-
ward spiral. But it is equally wrong to conclude that one should
never temper one’s judgments, nor that for purposes of prediction
or betting, one should not integrate the divorce rates with one’s
self-knowledge.
But let us revert to our main illustration by considering Gilo-
vich’s “consensus” recommendation as applied to conversation.
Imagine that we acceded to the recommendation to temper our
acceptance of testimony by an estimate of the degree of reliable
truth telling in the relevant community. The result would be that
much more of what we standardly accept as true we can now only
accept as bearing some high degree of probability. Our tempered
beliefs issue in tempered acceptance. The layers of complexity are
unmanageable. We will undermine the trust and normal flow of
information between speakers and hearers. No conversational
practice corresponds to it.
The (distributional) fallacy in this (Meliorist) direction is to in-
fer that because in any particular case one can so temper one’s ac-
ceptance of the word of another, one can do so regularly. But the
opposed (Panglossian) fallacy is to infer that because the practice
forbids tempering generally, in each case one is justified in not
tempering. This opposite distributional fallacy is a form of rule-worship: it is to infer that we ought never to make an exception
in accord with the details of the case. As above, it is right for us to
expect relevance, but it is wrong not to allow that expectation to
be overruled.
Do the birds and bees need cognitive reform?
Peter Ayton
Department of Psychology, City University, London EC1V 0HB, United
Kingdom
p.ayton@city.ac.uk
Abstract: Stanovich & West argue that their observed positive correla-
tions between performance of reasoning tasks and intelligence strengthen
the standing of normative rules for determining rationality. I question this
argument. Violations of normative rules by cognitively humble creatures
in their natural environments are more of a problem for normative rules
than for the creatures.
What is “normative”? The assumption that decisions can be eval-
uated in relation to uncontentious abstract rules has proved allur-
ing. More than 300 years ago Leibniz envisaged a universal calcu-
lus or characteristic by means of which all reasonable people
would reach the same conclusions. No more disputes:
“Its authority will not be open any longer to doubt when it becomes
possible to reveal the reason in all things with the clarity and certainty
which was hitherto possible only in arithmetic. It will put an end to that
sort of tedious objecting with which people plague each other, and
which takes away the pleasure of reasoning and arguing in general.” (Leibniz 1677/1951, p. 23)
His plan was simple: characteristic numbers would be established
for all ideas and all disputes would be reduced to computations in-
volving those numbers. Leibniz felt that establishing the charac-
teristic numbers for all ideas would not take all that long to im-
plement:
“I believe that a few selected persons might be able to do the whole thing
in five years, and that they will in any case after only two years arrive at
a mastery of the doctrines most needed in practical life, namely, the
propositions of morals and metaphysics, according to an infallible
method of calculation.” (pp. 22–23)
What became of this plan? Though Leibniz was not alone, be-
fore long mathematicians and philosophers gave up the task of re-
ducing rationality to a calculus (see Daston 1988). Curiously, psy-
chologists have not (see Gigerenzer 1996b). The Stanovich &
West (S&W) target article claims that the systematic association
of normative responses with intelligence is embarrassing to those
who feel that norms are being incorrectly applied. But, as the au-
thors admit, there are unembarrassing accounts for this phenom-
enon: where there is ambiguity about how to construe the task,
people who are educated or motivated to reason with mathemat-
ics or logic might do so, while others reason according to some
other scheme.
In their experiments on the Linda problem, Hertwig and
Gigerenzer (1999) showed that people impute a nonmathemati-
cal meaning to the term “probability”; their reasonable pragmatic
inferences then result in violations of the conjunction rule.
Nonetheless, in different contexts, subjects correctly interpreted
the mathematical meaning of the term “probability” and were less
likely to violate the rule. Sensitivity to the cues for different inter-
pretations is likely to be conditioned by subjects’ education, mo-
tivation, and cognitive ability. This hypothesis could be tested.
However, if more intelligent people used the mathematical inter-
pretation more often, why should this condemn the responses of
those who reasoned differently?
S&W argue that we should try to avoid the conclusion that those
with more computational power are systematically computing the
nonnormative response and that, in any case, such an outcome
would be “an absolute first.” However, by the narrow standards of
normative theory, there is evidence of exactly this phenomenon.
Ayton and Arkes (1998) and Arkes and Ayton (1999) review stud-
ies that show that adults commit an error contrary to the nor-
mative cost-benefit rules of choice, whereas children and even
cognitively humble nonhuman organisms do not. Yet one need
not – unless one is committed to judging by the normative rule
alone – conclude that adults are in any general sense less rational.
According to the notion of bounded rationality (Simon 1956;
1992), both the computational limits of cognition and the struc-
ture of the environment may foster the use of “satisficing” strate-
gies that are effective despite violating normative rules.
The inadequacy of judging by normative rules is brought into
focus when we contemplate how we would deal with evidence of
normative violations in cognitively humble lower animals’ deci-
sions: would it make sense to claim that they are irrational? For
example, transitivity of choice is considered one of the corner-
stones of classical rationality (Fishburn 1991; McClennan 1990).
Nonetheless, Shafir (1994b) has shown that honey bees violate
transitivity in choosing between flowers. Should we conclude that
they are irrational and in need of cognitive reform? Bees have
been successfully going about their business for millions of years.
As they have managed to survive for as long as they have, whilst
violating one of the basic axioms of rationality, one feels that it is
the axiom that is limited in capturing what it means to be rational
– not the bees.
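Detecting such a violation in pairwise choice data is mechanical: transitivity fails whenever a preference for a over b and for b over c coexists with a preference for c over a. A minimal sketch (the flower labels are ours, not Shafir’s actual stimuli):

```python
def is_transitive(prefers):
    """prefers: set of (a, b) pairs meaning `a is chosen over b`.
    Returns False if some chain a > b, b > c coexists with c > a."""
    for a, b in prefers:
        for b2, c in prefers:
            if b == b2 and (c, a) in prefers:
                return False
    return True

# A bee choosing pairwise: flower A over B, B over C, but C over A.
cyclic = is_transitive({("A", "B"), ("B", "C"), ("C", "A")})      # False
ordered = is_transitive({("A", "B"), ("B", "C"), ("A", "C")})     # True
```

The point of the bee example is that failing this check says nothing, by itself, about whether comparative evaluation is a poor strategy given the animal’s environment and processing costs.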
Shafir explained that the intransitivities indicate that bees make
comparative rather than absolute evaluations of flowers. Tversky
(1969) suggested that comparative decision-making was more ef-
ficient than absolute evaluation – it requires less processing. So,
once one takes their environment and the resource constraints
into account it may well be that bees’ behaviour is optimal – de-
spite not being predictable from any normative model.
Other researchers claim that wasps, birds, and fish commit the
sunk cost fallacy (Ayton & Arkes 1998; Arkes & Ayton 1999). In
wild animals, natural selection will ruthlessly expunge any strategy
that can be bettered at no extra cost but, of course, in nature there
are resource constraints. The extra computational resources
needed to behave as normative theory requires might be prohib-
itive – given that “fast and frugal” strategies operating in natural
environments can be highly effective whilst clearly violating ax-
ioms of rationality (Gigerenzer & Goldstein 1996). Birds do it,
bees do it, even educated Ph.D.s do it; why not violate normative
rules of rationality?
Slovic and Tversky’s (1974) paper beautifully illustrates the vul-
nerability of normative theory in its dependence on the accep-
tance of unverifiable axioms; they did not demonstrate that the
deeper the understanding of the independence axiom, the greater
the readiness to accept it – but even if they had this would hardly
“prove” it. Given his rejection of the axiom, should we assume that
Allais (1953) doesn’t understand it? Feeling uncertain, Edwards
(1982) polled decision theorists at a conference. They unani-
mously endorsed traditional Subjective Expected Utility theory as
the appropriate normative model and unanimously agreed that
people do not behave as that model requires. S&W’s interpreta-
tion of the positive manifold might be seen as an election about
normative rules – where votes are weighted according to IQ. But
science does not progress through elections.
Finally, two terminological concerns. “Meliorist” is an odd term
for those who assume that irrationality results from the inherent
nature of cognition. The term “Panglossian” is somewhat ironic
given that the inspiration for Voltaire’s comic character Dr. Pan-
gloss was that dreamer of the power of logic, Leibniz.
Alternative task construals, computational
escape hatches, and dual-system theories
of reasoning
Linden J. Ball and Jeremy D. Quayle
Institute of Behavioural Sciences, University of Derby, Mickleover, Derby, DE3
5GX, United Kingdom
{l.j.ball; j.d.quayle}@derby.ac.uk
IBS.derby.ac.uk/staff/Linden_Ball.html
Abstract: Stanovich & West’s dual-system account represents a major development in an understanding of reasoning and rationality. Their notion of System 1 functioning as a computational escape hatch during the processing
of complex tasks may deserve a more central role in explanations of rea-
soning performance. We describe examples of apparent escape-hatch pro-
cessing from the reasoning and judgement literature.
Stanovich & West (S&W) present impressive support for their
proposal that patterns of individual differences in performance
can advance an understanding of reasoning, rationality, and the
normative/descriptive gap. We find their evidence and arguments
compelling, and likewise believe that dual-system accounts are
central to clarifying the nature and limits of human rationality.
Many of S&W’s proposals surrounding the goals, constraints, and
operations of System 1 (contextualised, interactional intelligence)
and System 2 (decontextualised, analytic intelligence) strike us as
significant conceptual advances over previous dual-process ac-
counts (Epstein 1994; Evans & Over 1996; Sloman 1996), which,
because of elements of under-specification, have often raised as
many questions as they have answered. Other strengths of S&W’s
account derive from their recognition of the importance of inten-
tional-level constructs (e.g., metacognition and thinking styles) in
controlling cognition.
Still, certain claims about how a dual-system distinction can ex-
plain performance dichotomies between individuals of differing
analytic intelligence seem worthy of critical analysis. One claim
that forms our focus here is that “sometimes alternative constru-
als [arising from System 1] might be computational escape
hatches.... That is, an alternative construal might be hiding an
inability to compute the normative model” (S&W, n. 7). As an ex-
ample S&W note that some people may process abstract selection
tasks as inductive problems of optimal data sampling (Oaksford &
Chater 1994) because of difficulties in computing deductive re-
sponses via System 2. This computational-escape-hatch concept is
appealing; we have alluded to a similar notion (Ball et al. 1997, p.
60) when considering the processing demands (e.g., relating to the
requirement for meta-inference) of abstract selection tasks. We
wonder, however, whether the computational-escape-hatch idea
should feature more centrally in S&W’s dual-system account, so
that it can generalise to findings across a range of difficult tasks.
To explore this possibility it is necessary to examine S&W’s pro-
posals regarding the application of analytic abilities to override
System 1 task construals. They state that “for some individuals,
System 2 processes operating in parallel . . . will have the requisite
computational power . . . to override the response primed by Sys-
tem 1” (sect. 6.1, para. 5), and further note that this override func-
tion “might only be needed in a tiny minority of information pro-
cessing situations (in most cases, the two Systems will interact in
concert)” (sect. 6.3, para. 2). What we find revealing here is the
suggestion that all individuals will at least attempt to apply System
Commentary/Stanovich & West: Individual differences in reasoning
BEHAVIORAL AND BRAIN SCIENCES (2000) 23:5 667
2 processes to achieve some form of decontextualised task con-
strual – albeit perhaps a fragmentary one. Having put the effort
into System 2 computation, it is hard to imagine why any individ-
ual (even ones low in analytic intelligence) should then ignore the
System 2 output, unless, perhaps, they lack confidence about the
efficacy of their System 2 computations (i.e., they have metacog-
nitive awareness of having experienced computational difficul-
ties). Indeed, considerable evidence exists that people do produce
normatively-optimal responses to computationally-tractable de-
ductive-reasoning problems (e.g., certain “one-model” syllogisms)
and that common non-normative responses to harder problems
reflect possible (but not necessary) inferences from attempts at
applying a deductive procedure (e.g., Johnson-Laird & Byrne
1991). Perhaps, then, tasks at the easy-to-intermediate end of the
complexity continuum do invoke System 2 responses for most in-
dividuals, whereas tasks at the intermediate-to-hard end differ-
entially favour a System 2 response from those of higher analytic
intelligence, and a last-resort System 1 response from those of
lower analytic intelligence (because they induce high levels of
metacognitive uncertainty about System 2 efficacy).
One upshot of this argument is that recourse to System 1 com-
putational escape hatches may, for any individual, vary from prob-
lem to problem depending on processing demands and levels of
metacognitive uncertainty about System 2 functioning. Thus,
whilst performance on nondeontic selection tasks may, for the ma-
jority, reflect either a fall-back System 1 response or a failed at-
tempt at System 2 processing, performance on deontic versions
may be within nearly everyone’s System 2 capabilities. Indeed,
Johnson-Laird (1995) presents an essentially System 2 account of
why normative responses to deontic selection tasks (and nondeontic
ones where counterexamples can be invoked) may be easy to
compute. If deontic selection tasks reflect manageable System
2 processing, then this obviates any need to posit System 1 task
construals (e.g., based around pragmatic schemas or Darwinian al-
gorithms).
Another upshot of our argument about task difficulty, metacog-
nitive uncertainty and fall-back mechanisms is the possibility that
an escape-hatch response may actually be the default strategy for
any individual whose motivated attempt at a System 2 construal is
overloaded. As such, computational escape hatches may underlie
more responding than S&W seem willing to concede. One exam-
ple from the judgement literature is Pelham et al.’s (1994) pro-
posal that people fall back on a “numerosity = quantity” heuristic
when judging amount (e.g., of food) under conditions of task com-
plexity. Another example comes from our own account of belief-
bias effects in the evaluation of syllogistic conclusions (Quayle &
Ball, in press) which assumes that participants: (1) fulfil instruc-
tions to suspend disbelief and accept the truth of implausible
premises (i.e., by overriding initial System 1 processing); (2) at-
tempt the (System 2) application of a mental-models based rea-
soning strategy; and (3) produce belief-biased responses (i.e., use
System 1 as an escape hatch) when working-memory constraints
lead to threshold levels of metacognitive uncertainty being sur-
passed in relation to the perceived efficacy of (System 2) reason-
ing. The latter, we argue, is more likely to happen on invalid than
valid syllogisms since invalid syllogisms are difficult (see Ball &
Quayle 1999; Hardman & Payne 1995, for supporting evidence),
so explaining the standard interaction between belief and logic on
conclusion acceptances. This escape-hatch account predicts that
participants will be more confident with responses to valid than
invalid problems, and more belief-biased with invalid problems
when they have lower working-memory capacities than fellow rea-
soners. Our data support both predictions (Quayle & Ball, in
press) and are difficult to reconcile with theories positing selective
scrutiny of unbelievable conclusions (e.g., Evans et al. 1993).
In conclusion, although we believe that S&W’s proposals are a
milestone in the development of an understanding of reasoning
and rationality, we feel they may have downplayed the role of Sys-
tem 1 functioning as a computational escape hatch (whether trig-
gered by algorithmic-level limitations or intentional-level factors).
To test predictions of escape-hatch accounts of reasoning would
seem a fruitful avenue for investigation using process-tracing
methods, including protocol analysis, eye tracking and on-line
confidence assessment. Such techniques should help clarify as-
pects of metacognitive processing and the flow of control between
dual systems during reasoning.
Normative and prescriptive implications
of individual differences
Jonathan Baron
Department of Psychology, University of Pennsylvania, Philadelphia, PA
19104-6196
baron@psych.upenn.edu
www.sas.upenn.edu/~jbaron
Abstract: Stanovich & West (S&W) have two goals, one concerned with
the evaluation of normative models, the other with development of pre-
scriptive models. Individual differences have no bearing on normative
models, which are justified by analysis, not consensus. Individual differ-
ences do, however, suggest where it is possible to try to improve human
judgments and decisions through education rather than computers.
The extensive research program described in the target article is
apparently directed at two goals, one concerned with the evalua-
tion of normative models, the other concerned with exploration of
the possibilities for improvement of human judgment and deci-
sions. The latter seems more promising to me.
Stanovich & West (S&W) occasionally imply that normative
models gain support when smarter people endorse their conclu-
sions or when arguments on both sides lead people to endorse
them (the understanding-acceptance principle). In my view
(Baron, in press), normative models should not be justified by con-
sensus, even by consensus of experts or smart people.
Normative models, I argue, come from the analysis of situa-
tions. A good analogy is arithmetic. If you put two drops of water
into a flask, and then another two, you may get fewer than four
units of water in the flask, because the drops join together (Pop-
per 1962). Is this arithmetic falsified? We usually think not, be-
cause we carefully define the application of counting so as to ex-
clude the coalescing of drops.
Similarly, expected-utility theory (for example) comes from
analysis of certain situations into uncertain states of the world (be-
yond our control), acts (under our control), and outcomes (what
we care about), which depend jointly on the acts and states. We
further assume that our caring about outcomes defines a dimen-
sion along which differences are meaningful. Like arithmetic, this
is an idealization, but a useful one.
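The analysis into states, acts, and outcomes can be made concrete with a small decision table. The umbrella scenario, the probabilities, and all utility numbers below are our own illustrative assumptions, not Baron's; the sketch only shows the mechanics of weighting each outcome's utility by the probability of the state that produces it.

```python
# Minimal sketch of expected-utility computation over a decision
# analysed into states (beyond our control), acts (under our control),
# and outcomes (what we care about). All numbers are hypothetical.
def expected_utility(act, p_state, utility):
    """p_state: {state: probability}; utility: {(act, state): value}."""
    return sum(p * utility[(act, state)] for state, p in p_state.items())

# Hypothetical example: carry an umbrella or not, under rain/no rain.
p_state = {"rain": 0.3, "dry": 0.7}
utility = {
    ("umbrella", "rain"): 0.0,      ("umbrella", "dry"): -1.0,
    ("no_umbrella", "rain"): -10.0, ("no_umbrella", "dry"): 0.0,
}
best = max(("umbrella", "no_umbrella"),
           key=lambda act: expected_utility(act, p_state, utility))
print(best)  # umbrella
```

The normative standard here is just the ranking of acts by this weighted sum; nothing in the computation depends on anyone's intuitions about the scenario.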
Normative models like expected-utility theory provide stan-
dards for the evaluation of judgments and decisions. If we define
the standards according to the intuitions of a majority of experts,
then we can never argue against the majority. And what is norma-
tive can change over time.
A good example, which S&W have not studied (yet), is the am-
biguity effect. People prefer to bet on a red (or white) ball being
drawn from an urn containing 50 red balls and 50 white ones than
to bet on a red (or white) ball being drawn from an (ambiguous)
urn containing an unknown proportion of red and white balls
(even if they pick the color, and even if the prize for the first urn
is slightly less). This kind of preference yields responses inconsis-
tent with a normative principle of independence (Baron, in press).
This principle can be derived from logic, once decisions have been
analyzed into acts, states, and outcomes. It is about as clearly nor-
mative as you can get, yet it is so strongly against our intuition that
several scholars have simply rejected it on this basis (e.g., Ellsberg
1961; Rawls 1971).
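The two-urn comparison can be checked with a little arithmetic. The uniform prior below is our own illustrative assumption: under any beliefs that are symmetric between the two colours, a bet on the ambiguous urn wins with mean probability exactly 0.5, the same as the known urn, which is why a strict preference for the known urn conflicts with the independence principle.

```python
# Sketch (our illustration) of the win probabilities for the two urns.
def win_prob_known():
    # 50 red and 50 white balls: betting on either colour wins half the time.
    return 50 / 100

def mean_win_prob_ambiguous(beliefs):
    """beliefs: {possible number of red balls (0..100): probability}.
    Mean probability that a bet on red wins, averaged over beliefs."""
    return sum(p * (k / 100) for k, p in beliefs.items())

# One colour-symmetric belief state: every composition equally likely.
uniform = {k: 1 / 101 for k in range(101)}
print(win_prob_known())                          # 0.5
print(round(mean_win_prob_ambiguous(uniform), 12))  # 0.5
```

Since the averaged chances are identical, the felt aversion to the ambiguous urn must come from somewhere other than the expected value of the bet.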
But where does intuition – even expert intuition – get such au-
thority? Must standards of reasoning depend on a leap of faith in
its power? I think not. We can, instead, understand our intuitions
as misunderstandings, overgeneralizations of principles that usu-
ally work. The ambiguity effect results from a principle against
choosing options when we lack information about their outcomes.
This is irrelevant to the urn problem, because the information is
unavailable. We know this, but our intuitive judgment is based on
a simpler description, which does not take into account our lack
of access to the composition of the ambiguous urn.
In sum, we need not accept the argument that intuitive judg-
ments – whether made by students in experiments or by philoso-
phers – have normative authority. Even if all subjects gave the
non-normative answer and all philosophers’ intuitions agreed with
the subjects’ responses, the proposed normative model could still
be correct. Likewise, the fact that “experts disagree” is not an an-
swer to an argument for some particular normative model. Truth
is defined by arguments and evidence, not consensus.
Although the S&W research program does not bear on the va-
lidity of normative models, it is highly relevant to prescriptive
questions, that is, questions about who should do what when judg-
ments and decisions fall short of normative standards. It is these
prescriptive questions that, in the end, justify our research. The
heuristics-and-biases tradition is most valuable as an answer to
questions of whether, and how, we can improve decisions in the
real world. (If the answer is that we cannot, because people are al-
ready doing as well as possible, then this is a pessimistic conclu-
sion, as S&W note.) Laboratory research is limited in what it can
tell us about thinking outside the laboratory, but, when we com-
bine laboratory results with observation of real-world problems
(e.g., the chapter on risk in Baron, in press), its implications for
practice can be as helpful in public policy as biology research can
be in medicine. As in medicine, we must test the implications in
the “clinic” too.
The results on individual differences suggest that, in some
cases, we can hope to teach people to improve their judgment on
their own. Some people are already doing well according to nor-
mative standards. In many of these cases, it is difficult to see how
limitations on mental capacity in the narrow sense (Baron 1985)
can prevent other people from improving. In other cases, such as
some of the observed failures to use prior probabilities, it seems
that few can learn to improve. When we find this, the thing to do
is not to throw out the normative model, but, rather, to rely more
on external aids, such as computers.
Reasoning strategies in syllogisms:
Evidence for performance errors along
with computational limitations
Monica Bucciarelli
Centro di Scienze Cognitiva, Università di Torino, 10123 Turin, Italy
monica@psych.unito.it
Abstract: Stanovich & West interpret errors in syllogistic reasoning in
terms of computational limitations. I argue that the variety of strategies
used by reasoners in solving syllogisms requires us to consider also per-
formance errors. Although reasoners’ performance from one trial to an-
other is quite consistent, it can be different, in line with the definition of
performance errors. My argument has methodological implications for
reasoning theories.
Stanovich & West (S&W) define performance errors as algorithm-
level problems that are transitory in nature, and computational
limitations as nontransitory problems at the algorithmic level that
would be expected to recur on a readministration of the task. The
authors find covariance between ability in the Scholastic Assess-
ment Test and – among others – the ability to draw syllogistic in-
ferences. They conclude that the gap between normative and de-
scriptive syllogistic reasoning can to a moderate extent be
accounted for by variation in computational limitations. They
claim that positing errors at this level is legitimate.
Alas, the proposal by S&W does not encompass the experi-
mental data on the different performances of reasoners dealing
with the same problem on two different trials. As a consequence,
S&W tend to underestimate the role of transitory algorithm-level
problems in syllogistic reasoning. Indeed, the experimental results
of Johnson-Laird and Steedman (1978) show that adults’ perfor-
mance in syllogistic reasoning can vary significantly when the
same problem is encountered twice. More recently, in line with
these results, Bucciarelli and Johnson-Laird (1999) observed a
great variability in performance – both between and within rea-
soners – in the strategies that they used to draw conclusions from
syllogistic premises. Within the framework of the theory of men-
tal models (Johnson-Laird 1983; Johnson-Laird & Byrne 1991),
they carried out a series of experiments to observe the sequence
of external models that the participants built in drawing their own
conclusions from syllogistic premises. These external models took
the form of shapes representing the various terms in the syllo-
gisms, which the participants could use to help them reason. In one
experiment (Experiment 4), each participant carried out the in-
ferential task in two different experimental conditions for pur-
poses of comparison: once using the external models, and once
without using them. Their performance is best interpreted in
terms of computational limitations and performance errors.
Errors can be interpreted as the result of computational limita-
tions. Indeed, Bucciarelli and Johnson-Laird found that, although
their participants drew a slightly more diverse set of conclusions
when they constructed external models than when they did not,
they were moderately consistent in the conclusions that they drew
in the two conditions (60% of their conclusions were logically
identical). This result suggests that the performances of the par-
ticipants, when dealing with the same problems encountered
twice, were constrained by nontransitory problems at the algo-
rithmic level. What makes reasoners’ performance moderately
consistent in the two conditions? And, more generally, what makes
reasoners differ so enormously in their syllogistic ability? In the
perspective offered by mental model theory, reasoners can
perform quite consistently with problems encountered twice be-
cause they always rely on their ability to construct and manipulate
models of the premises. As the construction of, and search for,
integrated models of the premises draws on working memory,
and individuals differ in their working memory capacity, that
capacity can be treated as one factor explaining why reasoning
diverges from normative principles. In par-
ticular, reasoners with poor working memory capacity are poorer
than those with high working memory capacity in solving syllo-
gisms (Bara et al. 1995; 2000). Although variation in working
memory is almost entirely captured by measures of general intel-
ligence – as S&W point out – it is predictive of the ability to solve
syllogisms. As a consequence, working memory capacity con-
tributes to determining the individual styles of reasoners.
However, in line with an interpretation of syllogistic errors in
terms of performance errors, the experimental results of the ex-
ternal models condition showed that the participants differed in
which premise they interpreted first, in how they interpreted the
premises, and in how they went about searching for counterex-
amples to the models of the premises constructed initially. In
other words, they used different sequences of operations to reach
the same result or different results. The most relevant finding, for
the present argument, is that these differences occurred not only
between individuals, but also within individuals from one problem
to another. Certain aspects of the participants’ performance were
predictable in a probabilistic way, for example the most common
interpretation of a premise in a particular mood. But the results
show that it is impossible to predict precisely what an individual
will do on a particular trial. This result is consistent with studies in
the developmental literature, where it has been found that chil-
dren use a variety of strategies when reasoning with spatial prob-
lems (Ohlsson 1984) and causal problems (Shultz et al. 1986). In
particular, children as well as adults tend to use different strate-
gies with the same problem encountered twice.
The methodological implication of my argument is that a the-
ory of reasoning faces the problem of predicting reasoners’ per-
formance in spite of the great variability of the strategies they use.
Indeed, individuals seldom carry out a fixed deterministic strategy
in any sort of thinking (Johnson-Laird 1991). As it seems almost
impossible to characterize how a certain individual performs in a
specific reasoning problem, Johnson-Laird (in Bucciarelli & John-
son-Laird 1999) suggests that an element of nondeterminism
must be built into any theory of reasoning. One way to express
such a theory would be a grammar with alternative rules that al-
low for alternative ways in which to represent premises, formulate
conclusions, and search for counterexamples. In this way, the the-
ory could be used to “parse” each sequence of models constructed
in reaching conclusions. In my view, the introduction of nonde-
terminism into a theory of reasoning would be the solution to a
current debate in the psychology of reasoning, whose focus is the
appropriateness of the normative models used to evaluate rea-
soners’ performance. In particular, the expression of a theory of
reasoning in the form of a grammar would be the alternative so-
lution to the two extreme positions addressed by S&W, that is,
either reading off the normative from the descriptive, or subtly
fine-tuning and adjusting normative applications on the basis of
descriptive facts about reasoning performance.
ACKNOWLEDGMENTS
I thank Bruno Bara, Philip Johnson-Laird, and Vincenzo Lombardo who
criticized earlier versions of this comment. This work was supported by the
Ministero dell’Università e della Ricerca Scientifica e Tecnologica of Italy
(co-financed project 1999, Reasoning processes and mental models:
Analyses of spatial inferences).
Reversing figure and ground in the rationality
debate: An evolutionary perspective
W. Todd DeKay,a Martie G. Haselton,b and Lee A. Kirkpatrickc
aDepartment of Psychology, Franklin and Marshall College, Lancaster, PA
17604-3003; bDepartment of Psychology, University of Texas at Austin,
Austin, TX 78712; cDepartment of Psychology, College of William and Mary,
Williamsburg, VA 23188.
t_dekay@acad.fandm.edu
haselton@psy.utexas.edu lakirk@wm.edu
Abstract: A broad evolutionary perspective is essential to fully reverse fig-
ure and ground in the rationality debate. Humans’ evolved psychological
architecture was designed to produce inferences that were adaptive, not
normatively logical. This perspective points to several predictable sources
of errors in modern laboratory reasoning tasks, including inherent, sys-
tematic biases in information-processing systems explained by Error Man-
agement Theory.
Stanovich & West (S&W) suggest that one of their purposes is to
reverse the figure and ground in the rationality debate. We con-
tend that they have not gone far enough. In this commentary, we
extend their main point by offering a broader evolutionary per-
spective.
Life has existed on this planet for a few billion years, modern
Homo sapiens for perhaps 500 centuries, Wason tasks and bank-
teller problems for only a few decades. All species that have ever
existed have done so by virtue of evolved architectures designed
by natural selection to solve survival and reproductive problems
in ways that enhanced inclusive fitness relative to available alter-
native designs. The small subset of species possessing brains, in-
cluding humans, is characterized by a diverse array of specialized
evolved psychological mechanisms. Today we are capable of con-
structing, and sometimes even correctly solving, novel and clever
logical and mathematical problems.
The fact that people often make errors in solving Wason-task
and bank-teller problems is not surprising. The astonishing fact is
that we can conceptualize such problems at all – much less solve
them. Natural selection favors organic designs that outreproduce
alternative designs, not necessarily those that are normatively ra-
tional. In the currency of reproductive success, a rational infer-
ence that ultimately interferes with survival or reproduction al-
ways loses to a normatively “flawed” inference that does not. This
means that good decision-making will not necessarily conform to
the rules of formal logic.
“Panglossian” attempts to find explanations for reasoning errors
that “prevent the ascription of irrationality to subjects” (S&W,
sect. 1, para. 2) have clearly missed this point. The “Meliorist”
claim that “human cognition [is] characterized by systematic irra-
tionalities” – as if humans are fundamentally irrational and only
sometimes are accidentally rational – is equally misdirected. We
contend that it is most useful first to try to identify our evolved
psychological mechanisms and their proper functions, and then
attempt to determine how humans manage to recruit these mech-
anisms for purposes other than those for which they were de-
signed. An evolutionary understanding of the mechanisms used in
a given task will provide principled criteria with which we can de-
termine what constitutes a functional error in reasoning (mal-
adaptive decisions), as opposed to adaptive deviations from nor-
mative rationality.
From this perspective, there are several reasons why we might
expect evolved minds to make errors in laboratory reasoning prob-
lems. First, people bring into the lab a toolbox packed with spe-
cialized implements, none (usually) designed expressly for the task
at hand. The surprise occurs when a task does happen to fit a tool
with which we come equipped, as when a Wason task problem is
couched in social-contract terms (Cosmides 1989).
Second, logic problems typically require one to ignore vast
amounts of real-world knowledge to focus entirely on abstractions.
The gambler’s fallacy, for example, requires the subject to put
aside knowledge that many events in the world naturally occur in
cyclical causal patterns (e.g., weather patterns, resource distribu-
tions), so that a long period of not-X really is predictive that X is
overdue (Pinker 1997).
Third, as noted by S&W, our evolved psychological mechanisms
succeeded or failed historically not as a function of their adher-
ence to symbolic logic, but rather as a function of their effects on
survival and reproduction. Rabbits freeze or flee immediately in
response to any sudden sound, apparently mistaking benign
events for predators. These systematic errors result from an adap-
tive system. Ancestral rabbits that were not paranoid did not be-
come rabbit ancestors.
According to Error Management Theory (EMT; Haselton et al.
1998), when the costs and benefits of false negative and false pos-
itive errors differed recurrently over evolutionary history, selec-
tion favored psychological mechanisms biased toward making the
less costly (or more beneficial) error. Optimal designs are not nec-
essarily the most accurate. Some “errors” exist because they
helped humans reason effectively within ancestral environmental
constraints.
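The EMT logic can be illustrated with hypothetical numbers (ours, not Haselton et al.'s): when a missed predator costs far more than a needless flight, the policy that makes many more errors is nevertheless the cheaper one in expectation.

```python
# Hypothetical costs and probabilities, for illustration only.
P_PREDATOR = 0.05   # probability a sudden sound is actually a predator
COST_MISS = 100.0   # fitness cost of ignoring a real predator (false negative)
COST_FLEE = 1.0     # fitness cost of fleeing a benign sound (false positive)

def expected_cost(always_flee: bool) -> float:
    if always_flee:
        # wrong on every benign sound, but never misses a predator
        return (1 - P_PREDATOR) * COST_FLEE
    # never wastes a flight, but misses every predator
    return P_PREDATOR * COST_MISS

print(expected_cost(True))    # "paranoid" rabbit: errs 95% of the time
print(expected_cost(False))   # errs far less often, yet pays far more
```

With these numbers the paranoid policy costs 0.95 per sound on average against 5.0 for the calm one, so selection favours the biased, error-prone design.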
EMT explains some biases and errors in human inference that
might otherwise be wrongly attributed to computational limita-
tions or design flaws. Men, for example, tend to overestimate
women’s sexual intent when confronted with ambiguous cues such
as a smile; women, on the other hand, underestimate men’s com-
mitment intent (Haselton & Buss, in press). According to EMT,
these systematic errors evolved because men who erred on the
side of false positive inferences missed fewer sexual opportunities
than men who did not. Women who erred on the side of false neg-
ative inferences about men’s commitment intent were better able
to avoid deceptive men oriented towards uncommitted sex.
In the currency of reproductive success, these inferential biases
are smart because they resulted in lower fitness costs, not because
they are accurate, rational, or formally logical. Like paranoid rab-
bits, humans may possess adaptively biased inferential mecha-
nisms. These adaptive inferences, however, count as irrational
within traditional perspectives on reasoning. To truly reverse fig-
ure and ground in the rationality debate, we must first clarify what
constitutes a “good” inference. An evolutionary perspective is es-
sential for this task.
The rationality debate from the perspective
of cognitive-experiential self-theory
Seymour Epstein
Psychology Department, University of Massachusetts, Amherst, MA 01003
sepstein@psych.umass.edu
Abstract: A problem with Stanovich & West’s inference that there is a
nonintellectual processing system independent of an intellectual one, based
on data in which they partialled out global intelligence, is that they may
have controlled for the wrong kind of intellectual intelligence. Research on cog-
nitive-experiential self-theory over the past two decades provides much
stronger support for two independent processing systems.
More than two decades ago, I proposed a dual-process theory of
personality (Epstein 1973) subsequently labeled “cognitive-expe-
riential self-theory” (CEST). Since then, I have further developed
the theory, and my associates and I have investigated it with an ex-
tensive research program. The most fundamental assumption in
the theory is that there are two modes of information-processing,
experiential and rational. The operation of the experiential system
is preconscious, automatic, effortless, rapid, minimally demand-
ing of cognitive capacity, intimately associated with affect, holis-
tic, associative, and imagistic, and its outcome is experienced pas-
sively (we are seized by our emotions) and as self-evidently valid
(experiencing is believing). The operation of the rational system is
conscious, verbal, effortful, demanding of cognitive resources, af-
fect free, and relatively slow. It is experienced as volitional and as
requiring logic and evidence to support beliefs. As the experien-
tial system in humans is essentially the same system used by
higher-order nonhuman animals to adapt to their environments by
learning from experience, it has a very long evolutionary history.
In contrast, the rational system is a verbal inferential system with
a very brief evolutionary history.
The two systems operate in parallel and are interactive. All be-
havior is assumed to be influenced by a combination of both sys-
tems, with their relative contribution varying from minimal to
maximal along a dimension. The systems usually interact in such
a harmonious, seamless manner that people believe they are op-
erating as a single system. The combined operation of the two sys-
tems usually results in compromises, but sometimes it produces
what are commonly identified as conflicts between the heart and
the head.
A paradox with implications for the continued existence of the
human species is that human thinking is often highly irrational, to
the point of being seriously destructive to self and others, despite
the human capacity for very high levels of rational thinking. Why
is it that people can solve the most complex technological prob-
lems, yet often fail to solve much simpler problems in living that
are more important to their existence and personal happiness?
The answer, according to CEST, is that the operation of the ratio-
nal mind is biased by the operation of the experiential mind. It fol-
lows that the only way people can be truly rational is to be in touch
with their experiential mind and take its influence into account.
This is not meant to suggest that the rational mind is always supe-
rior. The inferential, rational mind and the learning, experiential
mind each has its advantages and disadvantages. Sometimes the
promptings of the experiential mind, based on generalizations
from past experience, are more adaptive than the logical reason-
ing of the rational mind.
Our research program has provided support for almost all of the
above assumptions. We devoted particular attention to the as-
sumption that there are two independent information-processing
systems. Accordingly, I read the target article by Stanovich & West
(S&W) with considerable anticipation, hoping to find new, im-
portant information in support of dual-process theories. Unfortu-
nately, I found the evidence they cited disappointing.
S&W cite two kinds of evidence from their own research in sup-
port of dual-process theories. The first is that, with variance of in-
tellectual intelligence controlled, there are significant positive in-
tercorrelations among responses to a variety of problems in the
heuristics and biases literature. S&W argue that this common vari-
ance indicates the existence of a broad, nonintellectual cognitive
ability. The difficulty with their argument is that it does not take
into account that intellectual intelligence is not all of a piece. Thus,
it is possible that, had they partialled out the appropriate group
factor of intellectual ability, there would not have been any signif-
icant systematic variance left. I am not saying I believe that this is
what would actually happen, only that it is a possibility that needs
to be ruled out before their conclusion can be accepted.
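The group-factor worry raised here is at bottom a statistical one, and a toy simulation can make it concrete (a hypothetical sketch with made-up factor loadings, not a model of either research program): if two reasoning tasks both draw on a narrow group ability that an omnibus intelligence score captures only weakly, partialling out the omnibus score leaves a substantial residual intercorrelation, whereas partialling out the group factor itself does not.

```python
import random

random.seed(0)
n = 5000

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def regress_out(ys, zs):
    """Residuals of ys after simple least-squares regression on zs."""
    m = len(ys)
    mz, my = sum(zs) / m, sum(ys) / m
    b = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
         / sum((z - mz) ** 2 for z in zs))
    return [y - (my + b * (z - mz)) for z, y in zip(zs, ys)]

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with zs linearly partialled out of both."""
    return corr(regress_out(xs, zs), regress_out(ys, zs))

# Hypothetical latent abilities (all loadings below are illustrative).
g = [random.gauss(0, 1) for _ in range(n)]      # general intelligence
group = [random.gauss(0, 1) for _ in range(n)]  # a narrower group factor

# An omnibus intelligence score loading mainly on g, weakly on the group factor.
iq = [gi + 0.2 * ki + 0.5 * random.gauss(0, 1) for gi, ki in zip(g, group)]

# Two reasoning tasks driven chiefly by the group factor.
task1 = [ki + random.gauss(0, 1) for ki in group]
task2 = [ki + random.gauss(0, 1) for ki in group]

r_given_iq = partial_corr(task1, task2, iq)        # control the omnibus score
r_given_group = partial_corr(task1, task2, group)  # control the group factor

print(f"residual r controlling omnibus score: {r_given_iq:.2f}")
print(f"residual r controlling group factor:  {r_given_group:.2f}")
```

Under these assumed loadings the first residual correlation stays large while the second collapses toward zero, which is precisely the possibility that would need to be ruled out.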
The other evidence S&W cite consists of relations between
questionnaire responses and performance on problems from the
heuristics literature, controlling for intellectual intelligence. This
relation is here stated without elaboration. The same problem
concerning group intelligence factors holds as for the other re-
search. Moreover, without information on the nature of the ques-
tionnaire responses or a discussion of the meaningfulness of the
relations from a coherent theoretical perspective, it is impossible
to evaluate their broad statement. They may be able to supply such
information, but they have not done so in the present target arti-
cle, perhaps because of space limitations.
The research my associates and I have conducted provides
much stronger support for two processing systems. In the devel-
opment of a self-report questionnaire on individual differences in
thinking style, we found that experiential and rational thinking are
not opposite ends of a single dimension. Rather, they are uncor-
related with each other, and they establish different coherent pat-
terns of relations with other variables. We obtained similar find-
ings between a measure of the intelligence of the experiential
system and a measure of intellectual intelligence. The measures
were unrelated to each other and produced different and coher-
ent patterns of relations with a variety of variables. In studies with
problems from the heuristics literature, we found that people who
gave heuristic responses were often able to give rational responses
when instructed to do so. With the use of a unique experimental
paradigm that presents a conflict between the two processing
modes in the context of a game of chance in which significant
amounts of money are at stake, we found that most participants
responded nonoptimally and according to the principles of the ex-
periential system, while acknowledging that they knew it was fool-
ish to bet against the probabilities. Yet they only behaved irra-
tionally to a modest extent, producing responses indicative of
compromises between the two processing modes. Space limita-
tions do not allow me to present more extensive and detailed in-
formation on the research program. Further information can be
found in a number of published articles (for recent reviews, see
Epstein 1994; Epstein & Pacini 1999).
Fleshing out a dual-system solution
James Friedrich
Department of Psychology, Willamette University, Salem, OR 97301
jfriedri@willamette.edu
Abstract: A prospective integration of evolutionary and other approaches
to understanding rationality, as well as incorporation of individual differ-
ence concerns into the research agenda, are major contributions of
Stanovich & West’s analysis. This commentary focuses on issues of concern
in detailing a dual-system or dual-process model of the sort they propose
and using it as a basis for improving judgment.
Commentary/Stanovich & West: Individual differences in reasoning
BEHAVIORAL AND BRAIN SCIENCES (2000) 23:5 671

Stanovich & West’s (S&W’s) proposed resolution of the rationality
issue in terms of a dual-process approach – System 1 and System
2 – holds great promise, particularly as a bridge between evolu-
tionary and traditional “bias” perspectives. In many ways, the
bridge between evolutionary rationality and individual or instru-
mental rationality, to use their terms, is similar to the bridge be-
tween basic cognitive/sensory research and human factors engi-
neering. In designing an airplane cockpit, one can marvel at the
sophistication and phenomenal performance capabilities of the
human cognitive and perceptual system, but one must also con-
front the fact that its “normal” performance can be catastrophic in
certain circumstances. A vigorous defense of the general utility of
humans’ basic construal and performance strategies must avoid
the excess of implying that any resulting catastrophe is of no con-
sequence – that all “crashes” are somehow unimportant or deni-
able.
I suspect, however, that the interaction of the two systems is
likely to be far more subtle and complicated than is typically im-
plied by most dichotomies. In particular, the notion that System 2
might essentially function as an “override” for System 1 poses a
number of research challenges. First, it seems likely that the role
of System 2 activity is systematically overestimated in the research
literature. For example, as S&W note, cognitive ability predicts
performance primarily when System 1 and System 2 processes cue
different responses. But it is unclear how common such differ-
ential cuing is outside of controlled laboratory conditions. And
even for carefully devised experimental tasks, one cannot readily
infer System 2 processes on the basis of responses that conform to
normative standards. In many cases, successful performance may
simply be a byproduct of System 1 processes (cf. Friedrich 1993).
The pervasiveness of System 2 processing may also be overesti-
mated by virtue of the populations studied. To the extent that a
disproportionate amount of the published literature emanates
from elite institutions studying their own student populations,
“g’s” positive manifold with performance on many bias tasks sug-
gests that performance base rates, and the manner in which Sys-
tem 2 is typically deployed, could be easily mischaracterized.
Combined with situational demands for “high g” research partic-
ipants to be on their best behavior and to provide the kind of an-
alytical responses they might think a scientist/researcher values, it
is easy to obtain distorted pictures of typical performance and Sys-
tem 2 engagement.
A second critical element of a dual process account concerns
the manner in which System 2 interacts with System 1. The no-
tion of an “override” system would suggest a fairly clear indepen-
dence of the two. But the automatic, non-conscious nature of Sys-
tem 1 likely contaminates System 2 (cf. Wilson & Brekke 1994).
Conscious, deliberative processes may act primarily on the out-
puts of more primitive System 1 processes. Thus, certain cogni-
tive structures, values, and preferences on which controlled pro-
cessing might be performed are themselves often comprehended
vaguely, if at all, at any conscious level (Bargh & Chartrand 1999).
One consequence of this is that System 2 overrides may lead to
sub-optimal decisions in certain circumstances. For example, Wil-
son and colleagues (e.g., Wilson et al. 1989) have shown that when
people are asked to give reasons for their attitudes, the subse-
quently generated attitudes seem to be less reflective of people’s
evaluative responses and dispositions to act. Conscious, delibera-
tive processes may seek to explain or justify preferences that are
dimly understood. In this process of “sense making,” they will not
necessarily do this accurately or in ways that serve the individual’s
goals.
Perhaps one of the most far-reaching contributions of the pa-
per is the reintroduction of individual differences into the picture.
Although such differences have typically been treated as error
variance in cognitive research, S&W’s use of the positive manifold
establishes their significance in answering the question of norma-
tiveness and, at least by implication, in addressing concerns of me-
lioration. Of perhaps equal or greater importance than cognitive
ability for understanding the operation of a dual system model,
however, are motivational differences explored more thoroughly
elsewhere by these and other authors (e.g., Cacioppo et al. 1996;
Stanovich 1999). In essence, motivational differences are central
to the question of when System 2 capabilities will be deployed.
Although motivationally relevant “thinking dispositions” have
been linked to the engagement of System 2 processes (e.g., Sá et
al. 1999; Stanovich & West 1997), a critical element only indirectly
addressed in this literature is the perceived instrumentality of an-
alytical processing. Motivation to engage System 2 capabilities is
more than just a preference for thoughtful reflection, as reflection
and contemplation can occur in either highly contextualized or de-
contextualized ways. Specifically, it may be the case that even
among individuals high in System 2 capabilities, some will not per-
ceive analytical, decontextualized processing as instrumental in
bringing about better outcomes. Perhaps the prototypical case ex-
ample is the psychology major receiving methodological training.
A student may dutifully learn and master principles of good de-
sign and scientific inference but remain unconvinced that such ap-
proaches are truly instrumental in achieving “better” understand-
ings of behavior. Thus, apparent failures to generalize such
training beyond an “exam” may reflect an epistemological stance
of sorts; if System 2 capabilities exist but are not viewed as help-
ful, such capabilities are unlikely to be invoked (cf. Friedrich
1987).
In terms of melioration, then, reducing cognitive bias and en-
hancing critical thinking may require more than adapting the en-
vironment to human capabilities, augmenting those capabilities,
and enhancing people’s pleasure in contemplative, open-minded
thought. Persuading people of the instrumental value of decon-
textualized System 2 processing would also seem critical, though
perhaps among the hardest of things to accomplish. For example,
the mere presence of simple and easily processed quantitative in-
formation in arguments has been shown to shift people from ana-
lytical to more heuristic-based processing (Yalch & Elmore-Yalch
1984). Highly contextualized System 1 processes are generally
quite effective for dealing with the vicissitudes of daily life. Add to
this certain self-serving attributions regarding one’s naive judg-
ment strategies and an imperfect feedback system that cancels
and reinforces many reasoning errors, and instrumentalities for
System 2 processing might be very difficult to strengthen. Never-
theless, a thorough treatment of such motivational elements is es-
sential for deriving the full conceptual and corrective benefits of
a dual-system understanding of rationality.
The tao of thinking
Deborah Frisch
Department of Psychology, University of Oregon, Eugene, OR 97403-1227
dfrisch@oregon.uoregon.edu
Abstract: I discuss several problems with Stanovich & West’s research and
suggest an alternative way to frame the rationality debate. The debate is
about whether analytic (System 2) thinking is superior to intuitive (System
1) thinking or whether the two modes are complementary. I suggest that
the System 1/System 2 distinction is equivalent to the Chinese concepts
of yin and yang and that the two modes are complementary.
A common finding in research on judgment and decision making
(JDM) is that people’s responses to simple reasoning tasks often
conflict with the experimenters’ definitions of the correct answers.
Researchers disagree about whether this discrepancy is due to a
flaw in the subjects or to a flaw in the experimenters’ standards.
Stanovich & West (S&W) have demonstrated that subjects who
agree with the experimenters on the most controversial questions
tend to have higher SAT scores than those who do not. They ar-
gue that this supports the claim that the flaw is in the subjects. I
shall discuss why I disagree with this conclusion.
First, if S&W’s theory is correct, they should expect to find the
same correlation in a replication of their studies with JDM re-
searchers as subjects. That is, their analysis predicts that JDM re-
searchers who challenge the conventional wisdom are less intelli-
gent than researchers who accept the standard answers to reason-
ing tasks. However, it is extremely unlikely that the most vocal crit-
ics of the conventional wisdom – Allais (1953), Ellsberg (1961),
Gigerenzer (1991b; 1996a), and Lopes (1981b; 1996) – are less in-
telligent than other researchers in the field.
S&W’s claim is also weakened by the fact that SAT score only
predicts performance on “controversial” problems. For example,
on the controversial “destination” version of the four card prob-
lem, the effect of cognitive ability is “diluted” (sect. 6). There are
two problems with this explanation. First, if both modes cue the
same response on the uncontroversial problems, you would expect
more consensus on these problems than on the controversial ones.
However, there is less consensus on the uncontroversial problems
than on the controversial ones. Second, S&W claim that the
source of variability on controversial problems is intelligence and
the source of variability on uncontroversial problems is some other
unspecified factor. However, they do not give a reason why there
are two sources of individual differences on these problems. Thus,
their account is incomplete and not parsimonious.
S&W describe the two camps in the rationality debate as “Pan-
glossian” versus “Meliorist.” This frame distorts the “non-Meliorist”
perspective. Specifically, they assume that the main reason re-
searchers “reject the norm” is that many subjects do. How-
ever, several researchers have rejected the conventional norms for
intellectual reasons. For example, Allais (1953) and Lopes (1981b;
1996) have argued that it is perfectly rational to violate the inde-
pendence axiom of utility theory. Similarly, Kahneman and Tver-
sky (1984) and Frisch and Jones (1993) have argued that violations
of description invariance (framing effects) can be sensible.
The target article suggests an alternative way to frame the ra-
tionality debate. In Table 3, S&W describe a distinction between
an intuitive mode of reasoning and an analytic one. The debate
among JDM researchers boils down to the question of how these
two modes are related. The “Meliorist” view assumes that the an-
alytic mode is superior to the intuitive one. The “non-Meliorist”
view does not assume that either mode is superior to the other.
A better term for the “non-Meliorist” view is the “complemen-
tary view.” I suggest this term because of a striking similarity be-
tween S&W’s System 1/System 2 distinction and the Chinese dis-
tinction between yin and yang. Capra (1982) describes yin
thinking as intuitive, synthetic, and feminine and yang thinking as
rational, analytic, and masculine. This is essentially the same as the
System 1/System 2 distinction. In S&W’s view, yang is superior to
yin. But in Chinese thought, the two modes are complementary.
As Capra (1982) says, “What is good is not yin or yang but the dy-
namic balance between the two; what is bad or harmful is imbal-
ance” (p. 36). A growing number of JDM researchers including
Epstein (1994), Hammond (1996), and Klein (1998) have en-
dorsed views similar to the complementary view.
In the “Meliorist” versus “Panglossian” frame, only the Melior-
ists offer advice for improving the quality of thinking. The Pan-
glossians think things are fine the way they are. In the Meliorist
(or classical) versus complementary frame, both sides acknowl-
edge it is possible to improve the quality of thinking but they of-
fer different advice about how to achieve this goal. Advocates of
the classical view believe that to improve the quality of thinking,
a person should increase the extent to which he relies on the ana-
lytic mode. Advocates of the complementary view believe that im-
proving the quality of thinking involves achieving an integration
between intuitive and analytic processing. In fact, on the comple-
mentary view, becoming more analytic can be detrimental if a per-
son was already out of balance in that direction to begin with.
S&W endorse the classical view, while I favor the complemen-
tary view (although I realize it is not really in the spirit of comple-
mentarity to pit the two views against each other). In closing, I
would like to comment on a novel justification provided by S&W
for the superiority of System 2 (yang) over System 1 (yin). In sec-
tion 6.3, they say that “‘Life,’ in fact, is becoming more like the
tests!” (p. 35). The idea is that in our increasingly technological so-
ciety, the analytic mode is more adaptive than the intuitive mode.
However, while it is true that the human made world is increas-
ingly domina