
On Second Thought: Reflections on the Reflection Defense

Abstract

This chapter sheds light on a response to experimental philosophy that has not yet received enough attention: the reflection defense. According to proponents of this defense, judgments about philosophical cases are relevant only when they are the product of careful, nuanced, and conceptually rigorous reflection. The chapter argues that the reflection defense is misguided: Five studies (N>1800) are presented, showing that people make the same judgments when they are primed to engage in careful reflection as they do in the conditions standardly used by experimental philosophers.
9
On Second Thought
Reflections on the Reflection Defense
Markus Kneer, University of Zurich
David Colaço, Tulane University
Joshua Alexander, Siena College
Edouard Machery, University of Pittsburgh
1. The Restrictionist Challenge
This much should be uncontroversial: The method of cases plays an import-
ant role in contemporary philosophy. While there is disagreement about
how best to interpret this method,¹ there is little doubt that philosophers
often proceed by considering actual or hypothetical situations, and use
intuitions about such situations to assess philosophical theories. Despite its
central role in philosophical practice, the method of cases has recently come
under pressure: A series of experimental studies suggests that judgments
regarding classic philosophical thought experiments (aka cases) are sen-
sitive to factors such as culture, gender, affect, framing, and presentation
order, factors, that is, that are not standardly thought to be of philosophical
relevance (for review and discussion, see Alexander, 2012; Machery, 2017;
Stich and Machery, forthcoming).
Critics of experimental philosophy have responded to this challenge in
various ways.² Our goal in this chapter is to shed light on one response that
has not yet received enough attention: the reflection defense (for previous
discussion, see Weinberg et al., 2012). The reflection defense targets features
of the deliberative process invoked in experimental studies of ordinary
judgments about philosophical cases: According to proponents of this
defense, judgments about philosophical cases are relevant only when they
are the product of careful, nuanced, and conceptually rigorous reflection,
while, they hold, the judgments elicited in experimental studies are swift
shots from the hip that lack the necessary deliberative care; as such, they are
easily distorted by irrelevant factors. Proponents of the reflection defense
conclude that, since these kinds of judgments are unfit to serve as input for
responsible philosophical inquiry, experimental studies that reveal their
vagaries can be safely ignored.
¹ See Williamson (2007); Malmgren (2011); Alexander (2012); Cappelen (2012); Deutsch
(2015); Nado (2016); Colaço and Machery (2017); Machery (2017); Strevens (2019).
² For discussion, see, e.g., Alexander and Weinberg (2007); Weinberg et al. (2010); Machery
(2011, 2012, 2017); Alexander (2012, 2016); Cappelen (2012); Schwitzgebel and Cushman
(2012, 2015); Deutsch (2015); Mizrahi (2015).
We suspect that the reflection defense is misguided, and this chapter is an
attempt to defend this suspicion. The reflection defense assumes that reflection
(i) influences how people think about philosophical cases and (ii) brings
their judgments more into alignment with philosophical orthodoxy (where it
exists). We call this the Influence and Alignment Assumption. To illustrate
the point, take Gettier cases, invoked, for instance, in Kauppinen's
exposition of the reflection defense: The assumption is that increased reflection
not only occasions a different rate of knowledge ascriptions in Gettier
cases than the standardly high rates of folk ascriptions, but, in line with
textbook epistemology, a lower rate of knowledge ascriptions. The idea is
thus that increased reflection influences and, from the point of view of
philosophical orthodoxy, improves the responses to the cases at hand.
In order to examine the Influence and Alignment Assumption, we present
studies that explore how the folk think about four philosophical cases, or
pairs of cases, that have generated a great deal of attention from both
traditional and experimental philosophers across various areas of philosophy:
Cases used to challenge the idea that knowledge is justified true belief,
the idea that reference is fixed by description, the idea that knowledge
depends only on epistemic considerations, and the idea that knowledge
entails belief. For each of these cases, we attempted to manipulate reflective
care using four common tools from social psychology and behavioral economics:
A standard delay manipulation, a standard incentivization procedure,
a standard manipulation for increased accountability, and a standard prime
for analytic thinking. We also examined whether people who are primed to
give more reflective responses actually respond differently to philosophical
cases than people who are not so primed. Finally, we explored correlations
between how people responded to the cases at hand and individual differences
in preference for slow, careful deliberation using the Rational-Experiential
Inventory (Epstein et al., 1996). Nothing mattered. People seem to make the
same judgments when they are primed to engage in careful reflection as they
do in the conditions standardly used by experimental philosophers. The
reflection defense thus seems unwarranted in its presumption that reflection
relevantly changes how people think about philosophical cases.
We proceed as follows: In Section 2 we discuss the reflection defense in
more detail, and describe our strategy for addressing it in Section 3. After
setting the stage for our empirical research, we present five experimental
studies and their results in Sections 4-8, and conclude in Section 9 by
explaining what we take these results to mean for the reflection defense.
2. The Reflection Defense
Let's begin with a few particularly clear examples of the reflection defense,
starting with Ludwig's influential formulation:
We should not expect that in every case in which we are called on to make a
judgment we are at the outset equipped to make correct judgments without
much reflection. Our concepts generally have places in a family of related
concepts, and these families of concepts will have places in larger families
of concepts. How to think correctly about some cases we are presented can
be a matter that requires considerable reflection. When a concept, like that
of justification, is interconnected with our thinking in a wide variety of
domains, it becomes an extremely complex matter to map out the conceptual
connections and at the same time sidestep all the confusing factors.
(2007, 149)
Kauppinen largely concurs:
When philosophers claim that according to our intuitions, Gettier cases are
not knowledge, they are not presenting a hypothesis about gut reactions to
counterfactual scenarios but, more narrowly, staking a claim of how
competent and careful users of the ordinary concept of knowledge would
pre-theoretically classify the case in suitable conditions. The claim, then, is
not about what I will call surface intuitions but about robust intuitions.
(2007, 97)
Liao presents the argument from "robust intuitions" (without embracing it)
as follows:
[S]ome might think that one should distinguish between surface intuitions,
which are "first-off" intuitions that may be little better than mere guesses;
and robust intuitions, which are intuitions that a competent speaker might
have under sufficiently ideal conditions such as when they are not biased.
(2008, 256)
Horvath presents the reflection defense (without embracing it) as follows:
[T]he existing studies only aim at spontaneous responses to hypothetical cases
(...). The opposing claim (...) is that what we actually rely on in philosophy
are reflective intuitions, which are, it is suggested, of a much better epistemic
quality than the typically spontaneous (and unreflective) intuitive responses
of the folk (...). But if the "intuitions" (...) really have to be understood as
"reflective intuitions," then the available experimental studies do not contribute
much to its support, or so the objection goes. (2010, 453)
Finally, Nado (2015) also discusses the reflection defense, connecting it to
the place of expertise in philosophical methodology (see also Swain et al.,
2008, section 3; Bengson, 2013; Gerken and Beebe, 2016).
The basic idea contained in these passages is rather straightforward:
Philosophers who use the method of cases are only interested in judgments
generated by careful reflection about the cases themselves and the concepts
we deploy in response to these cases, and whatever it is that experimental
philosophers have been studying, they have not been studying those kinds of
things. Thus, experimental studies revealing that unreflective judgments are
susceptible to a host of irrelevant factors do nothing to disqualify reflective
judgments from playing a role in philosophical argumentation.
In more detail, the reflection defense begins with a necessary condition
for the philosophical relevance of judgments about thought experiments:
These judgments are philosophically relevant only when they result from
careful reflection (Premise 1). It then makes a claim about experimental-
philosophy studies: These studies do not examine judgments that result
from careful reflection (Premise 2). It concludes that experimental-philosophy
findings are not philosophically relevant.
The two premises of the reflection defense call for clarification. First, and
least important, Premise 2 can be formulated in several different ways. The
weakest formulation would merely assert that experimental philosophers
have not clearly demonstrated that their studies examine the right kind of
judgment; for all experimental philosophers have shown, their studies could
bear on the vagaries of unreflective judgments. A stronger formulation
would assert that extant experimental philosophy studies fail to examine
the right kind of judgment, while leaving open the possibility that improved
studies would get at the right kind of judgment. The strongest formulation
would assert that experimental-philosophy studies are necessarily unable to
examine the judgments that result from careful reflection, perhaps because
of what careful reflection involves. Kauppinen comes close to embracing the
strongest reading, asserting that experimental-philosophy studies are necessarily
unable to elicit reflective judgments:
Testing for ideal conditions and careful consideration does not seem to be
possible without engaging in dialogue with the test subjects, and that, again,
violates the spirit and letter of experimentalist quasi-observation. ( . . . )
We can imagine a researcher going through a test subject's answers together
with her, asking for the reasons why she answered one way rather than
another (. . .). But this is no longer merely "probing" the test subjects. It is not
doing experimental philosophy in the new and distinct sense, but rather a
return to the good old Socratic method. (2007, 106)
The content and plausibility of Premises 1 and 2 also depend on how the
distinction between robust and surface judgments, or, as we will say in the
remainder of this chapter, reflective and unreflective judgments, is characterized.
It is useful to tease apart thin and thick characterizations of this
distinction.³ One end of this continuum is anchored by what we will call the
thin characterization of reflective judgment: A judgment is thinly reflective
just in case it results from a deliberation process involving attention,
focus, cognitive effort, and so on (the type of domain-general psychological
resources that careful and attentive thinking requires), and unreflective
otherwise. We suspect that the thin characterization of reflection is similar
to both lay and psychological conceptions of reflection (e.g., Paxton et al.,
2012). Horvath (2010) and Nado (2015) also seem to understand reflection
thinly.
³ Weinberg and Alexander (2014) provide an overview of the different conceptions of
"intuition" used in current metaphilosophical debates. Both the thin and thicker characterizations
of reflective judgment discussed below count as thick conceptions on their way of carving
up the landscape.
Thicker conceptions of reflective judgments add requirements to the thin
conception of reflective judgments. For Kauppinen, for example, reflective
judgments are the products of the kind of dialogical activity central to the
Socratic method:
[T]here is no way for a philosopher to ascertain how people would respond
in such a situation without (. . .) entering into dialogue with them, varying
examples, teasing out implications, presenting alternative interpretations
to choose from to separate the semantic and the pragmatic, and so on. I will
call this approach the Dialogue Model of the epistemology of folk concepts.
(2007, 109)
Ludwig is also interested only in thick reflective judgments, and on his view
reflective judgments are based solely on conceptual competence:
Conducting and being the subject of a thought experiment is a reflective
exercise. It requires that both the experimenter and the subject understand
what its point is. As it is a reflective exercise, it also presupposes that the
subject of the thought experiment is able to distinguish between judgments
solely based on competence (or recognition of the limits of competence) in
deploying concepts in response to the described scenario. (2007, 135)
Ludwig clarifies what he means by "conceptual competence" in a footnote:
Failing to draw a distinction between unreflective judgments based on
empirical beliefs and judgments based solely on competence in the deployment
of concepts not uncommonly leads to a failure to appreciate the
special epistemic status of the latter, the special role that first person
investigation of them plays in the acquisition of a priori knowledge, and
the stability of the judgments which are reached on this basis. (2007, 136)
That is, on his view, reflective judgments are epistemically analytic (that is,
entertaining the propositions they express can be sufficient for their
justification).
These are just some ways that we might think about reflective judgment
(for another proposal, see Hannon, 2018); what's important for our purposes
is just that the reflection defense will take different forms depending
on which characterization of reflection is involved. It is beyond the scope of
a single chapter to address all the possible variants in depth, and so we will
focus here only on a version of the reflection defense that appeals to the thin
conception of reflection presented above. While this means that we will be
leaving thicker versions of the defense to the side for now, we think that
there are good reasons for focusing on a thin version of the reflection
defense. First, this version is the most easily tractable by means of experimental
tools, the tools we intend to deploy in what follows. There is a
wealth of tools in psychology and behavioral economics to single out judgments
that result from reflective deliberation thinly understood, and we
can use these to assess the reflection defense. Second, versions of the
reflection defense that appeal to thick characterizations of reflection face
problems of their own.⁴ First, thicker versions of the reflection defense
face what we will call a "descriptive-inadequacy" problem: the thicker the
notion of reflection appealed to, the less likely it is that philosophers'
judgments in usual philosophical debates result from a reflective deliberation
so understood. To illustrate, consider Ludwig's claim that answers to
philosophical cases must be judgments based solely on competence.
Although we will not argue for this claim here, we doubt that the judgments
elicited by cases in philosophy are typically of this kind; many of
them do not seem to express analytic propositions at all (Williamson, 2007;
Cappelen, 2012; Machery, 2017). Second, thicker versions of the reflection
defense face what we will call a "stipulation" problem: Characterizations of
reflective judgments should not make it the case by stipulation that experimental
philosophers' findings happen to bear only on unreflective judgments.
Stipulative victories are no victories at all, and it should be an empirical
question whether reflective judgments suffer from the vagaries evidenced by
fifteen years of experimental philosophy. To illustrate, when Ludwig proposes
that reflective judgments are solely based on conceptual competence, he
makes it the case by sheer stipulation that a large part of experimental
philosophy, which examines the influence of pragmatic factors on judgments
about thought experiments, happens to be studying unreflective judgments.
A more satisfying strategy, we propose, would specify "reflection" so as to
allow for the empirical study of whether reflective judgments are immune to
the influence of pragmatic considerations.
⁴ Weinberg and Alexander (2014) also propose a set of conditions that must be met by
anyone attempting to argue that experimental philosophers simply have not been studying the
"right kind" of judgments or intuitions. Among those conditions is one that they call the
"current practice" condition; failure to meet their current practice condition is very similar to
what we will call the "descriptive-inadequacy" problem below.
3. Addressing the Reflection Defense
Our goal in this chapter is to assess a presupposition of the reflection
defense: the Influence and Alignment Assumption, that is, the idea that,
when people consider a thought experiment reflectively, they would tend to
judge differently than in the conditions standardly employed by experimental
philosophers, and their responses would be more in line with what
philosophical orthodoxy considers correct. For instance, while many people
may judge that, in a fake barn case, the character knows that she is seeing a
real barn under standard experimental-philosophy conditions (Colaço et al.,
2014), they would come to the opposite conclusion if they considered the
fake barn case reflectively, or so proponents of the reflection defense assume.
To determine whether the judgments made in response to a case result
from the process of careful reflection (thinly understood), we looked at two
distinct types of properties: dispositional qualities pertaining to the subject
responding to thought experiments and circumstances pertaining to the
process of deliberation itself; careful reflection can be fostered either by an
inherent inclination to engage in careful analytic thinking or by appropriate
conditions of deliberation. So, our strategy was to examine whether
the judgments of people disposed to make reflective judgments or the
judgments of people primed to engage in deliberation differ from the
judgments made under conditions standardly employed by experimental
philosophers.
One way to measure people's disposition to reflection is the Need for
Cognition (NFC) test (Cacioppo and Petty, 1982; Cacioppo et al., 1986).
Some individuals are naturally drawn to complex analytic-thinking tasks
and might thus manifest the necessary care and reflection required for
reflective judgments. An alternative measure that targets much the same
dispositional quality is the Cognitive Reflection Test (Frederick, 2005; Toplak
et al., 2011), or CRT for short. Previous empirical studies using the NFC and
CRT (Weinberg et al., 2012; Gerken and Beebe, 2016) found little support for
the reflection defense; neither high NFC nor high CRT scores correlated
with decreased sensitivity to distortive factors such as contextual priming,
print font, or presentation order.⁵
⁵ Pinillos et al. (2011) use the CRT as a proxy to measure "general intelligence," and find that
those who display higher general intelligence are less likely to exhibit the "Knobe Effect" (124).
However, as long as one does not defend a bias account of the Knobe Effect, according to which
people's judgments of intentionality are systematically distorted by outcome valence, the
findings of Pinillos and colleagues do not constitute evidence for the reflection defense.
In our experiments, we employed a third standard psychological questionnaire
to measure people's disposition to reflection, namely the Rational-Experiential
Inventory or REI (Epstein et al., 1996; Pacini and Epstein, 1999). Subjects on
the "rational" end of the spectrum typically manifest an increased ability to
think logically and analytically; those on the "experiential" end of the spectrum
manifest a stronger "reliance on and enjoyment of feelings and intuitions
in making decisions" (Pacini and Epstein, 1999, 974). Differently put,
"rational" subjects are more prone to analytic cognition, "experiential" subjects
to more intuition-driven, cognitively less effortful cognition.
Consistent with the studies cited above, we proposed to operationalize the
distinction between reflective and unreflective judgments in terms of the
rational/experiential distinction developed by Epstein and colleagues. If
people who are reflective as measured by yet a third standard psychological
measure (in addition to the NFC scale and the CRT already used by
Gonnerman et al., 2011; Weinberg et al., 2012; and Gerken and Beebe,
2016) do not differ from people who are unreflective, then this would be
evidence that reflection does not change the judgments people make in
response to cases.
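To make this operationalization concrete, here is a minimal Python sketch of how an REI-based split between more and less reflective participants can be computed. The item-to-subscale assignment and the reverse-keyed items below are placeholder assumptions for illustration only; the actual keying of the 10-item inventory follows Epstein et al. (1996), and the ratings are simulated.

```python
import numpy as np

# Hypothetical keying (illustration only, not the actual REI-10 key).
RATIONAL_ITEMS = [0, 1, 2, 3, 4]      # assumed: items tapping analytic ("rational") thinking
EXPERIENTIAL_ITEMS = [5, 6, 7, 8, 9]  # assumed: items tapping intuitive ("experiential") thinking
REVERSE_KEYED = [1, 3]                # assumed reverse-keyed items on a 1-5 Likert scale

def rei_score(ratings):
    """Higher score = stronger preference for analytic over intuitive thinking."""
    r = np.asarray(ratings, dtype=float)
    r[REVERSE_KEYED] = 6 - r[REVERSE_KEYED]       # reverse-code on a 1-5 scale
    return r[RATIONAL_ITEMS].mean() - r[EXPERIENTIAL_ITEMS].mean()

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(179, 10))      # fake Likert responses, for illustration
scores = np.array([rei_score(row) for row in ratings])
high_rei = scores >= np.median(scores)            # median split: "more reflective" half
print(f"{high_rei.sum()} of {len(scores)} participants classified as high REI")
```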
Naturally, it could be that the Rational-Experiential Inventory fails to
really measure people's tendency to engage in reflective deliberation, even
thinly understood, or that these people fail to act on their tendency in our
studies. To address these concerns, we needed to look at other ways of
determining whether people are reporting reflective judgments. A second
way to distinguish reflective from unreflective judgments draws on the
circumstances that lead people to engage in reflection when making judgments
about philosophical cases. Kauppinen (2007, 104) highlights the
importance of such circumstances: Reflective judgment "can take hard
thinking and time, and the attempt could be thwarted by passions or
loss of interest," while there is a general requirement to think through the
implications of individual judgements; "a hasty judgement . . . will not count
as one's robust intuition about the case."⁶ We attempted to encourage
careful reflective processes by means of four standard experimental manipulations
familiar from social psychology and experimental economics: forced
delay, financial incentive, response justification via provision of reasons, and
priming of analytic cognition.
⁶ Discussion of the circumstances that lead to reflection is also present in Dewey's (1910)
five steps of reflective thought: He notes that one must take time to deliberate on a case, rather
than prematurely accepting the conclusion at which one arrives (73-4).
In the forced-delay condition, participants were encouraged to read the
vignette slowly and carefully, and to think about possible variations of the
scenario. They could only proceed to a screen registering their answer
after a certain delay, which varied from 40 to 60 seconds depending on the
word count of the vignettes. Delay manipulations are frequently used in
social-psychological research; the speed/accuracy trade-off is one of the
most well-studied and pervasive effects in human judgment, perception,
and decision making. Slower responses tend to correlate positively with
improved accuracy and are less susceptible to biases or other distorting
factors.⁷ Forced delay has been applied in various kinds of experiments.
Rand et al. (2012), for example, compare people's level of altruistic behavior
in a one-shot public goods game with and without a time delay, stating that
decisions made after a delay "are expected to be driven more by reflection"
(428). Rand and colleagues find that people become less altruistic when
their decision is delayed, and conclude that "intuition supports cooperation in
social dilemmas, and that reflection can undermine these cooperative
impulses" (427). In their fourth experiment, Pizarro et al. (2003) do not
impose a delay on participants' answers, but they asked participants in
the rational-instructions condition to make these judgments "from (. . .)
a deliberative perspective (i.e., my most rational, objective judgment is
that . . .)" (657), which is similar to the instructions we used. Pizarro and
colleagues found that the moral assessment of causally deviant acts
changes when people are asked to judge from this rational, objective
perspective.
⁷ See Garrett (1922); Hick (1952); Ollman (1966); Schouten and Bekker (1967); Pachella
(1973); Wickelgren (1977); Ratcliff and Rouder (1998); Forstmann et al. (2008).
In the financial-incentive condition, participants were promised double
compensation in case they "got the answer right," which was intended to
encourage careful reflection. All participants in this condition received extra
compensation independently of the answer chosen. Hertwig and Ortmann
(2001) survey studies in experimental economics that invoke financial
incentives and conclude that in certain areas, "in particular, research on
judgement and decision making" (395), such incentives lead to "convergence
of the data toward the performance criterion and reduction of the
data's variance." The authors recommend that financial incentives, which
are common practice in experimental economics, be used more widely in
psychological studies so as to obtain more reliable and robust data. Camerer
and Hogarth's (1999) literature review also shows that while incentivizing
participants financially does not always improve the rational standing of
their decision and judgment, it actually improves it in tasks similar to
making a judgment in response to a thought experiment.
In the reasons condition, the vignette and questions were preceded by a
screen which instructed participants that they would have to provide
detailed explanations of their answers. The aim of this manipulation was
to foster an increased sensitivity to the rational justification of the
chosen response. A large literature suggests that, in many cases, increasing
accountability by asking participants to justify their judgment or decision
improves the rational standing of these judgments and decisions. For instance,
Koriat et al. (1980) find that requiring participants "to list reasons for and
against each of the alternatives prior to choosing an answer" reduces the
overconfidence bias (see also, e.g., the reduction of the sunk-cost fallacy in
Simonson and Nye, 1992). In their important literature review, Lerner and
Tetlock (1999) conclude that "[w]hen participants expect to justify their
judgments [. . .] [they tend to] (a) survey a wider range of conceivably relevant
cues; (b) pay greater attention to the cues they use; (c) anticipate counter
arguments, weigh their merits relatively impartially, and factor those that pass
some threshold of plausibility into their overall opinion or assessment of the
situation; and (d) gain greater awareness of their cognitive processes by
regularly monitoring the cues that are allowed to influence judgment and
choice" (263).⁸
⁸ Increasing accountability, e.g., by reason giving, can also aggravate, rather than attenuate,
certain biases in judgment and decision making (Lerner and Tetlock, 1999). It would be
interesting to see how advocates of the reflection defense respond to such findings.
A final condition made use of analytic priming: Before receiving the
vignettes and questions, participants had to solve a simple mathematical
puzzle, a standard procedure to trigger analytic cognition. To our knowledge,
the puzzle we used has not been employed in the social-psychological
literature, but the procedure of triggering analytic cognition by means of a
mathematical problem is standard practice. Paxton et al. (2012) and Pinillos
et al. (2011) use the CRT, which consists of three simple mathematical
puzzles with counterintuitive answers, to prime reflection and reasoning in
their participants before giving them some trolley-style moral dilemmas. In a
study regarding different explanations of the contrast-sensitivity of knowledge
ascriptions, Gerken and Beebe (2016) also employ the CRT. On their view, the
contrast effect of knowledge ascription is due to a bias in focus on selective bits
of evidence. High CRT scores, they hypothesized in ways consistent with the
reflection defense, should correlate with lesser susceptibility to the bias, but
they failed to find any such correlation.
All four manipulations were independent ways to elicit reflection during
the process of deliberation. The control condition, in which the vignette and
questions were presented without further ado, was intended to be similar to
the conditions of past empirical research by experimental philosophers,
which has allegedly failed to elicit reflective judgments.
Finally, reaction time was measured for all five conditions to explore
whether people who answered more slowly, presumably because they
reflected more before reporting a judgment, answered differently from
those who answered more quickly (and probably unreflectively).
Our choice of cases was guided by the following four considerations. First,
they should have received widespread attention. Second, there should be
relatively little controversy among professional philosophers about what the
"correct" response is. As the advocates of the reflection defense make plain,
not just any change in judgment is welcome: They expect
reflection to foster increased alignment with the responses favored by
professional philosophers, at least if there is a consensus. Third, in order
to assess whether encouraging reflection leads people to give responses
aligned with those of philosophers, the cases must have elicited some disagreement
among lay people (in light of past research). Finally, the cases must be drawn
from several areas of philosophy. Overall, we chose four scenarios comprising
influential classics and more recent cases. Since all of them are rather
well known, we will confine ourselves to brief summaries here.
The first vignette was a Gettier case, an adaptation of Russell's (1948)
well-known Clock scenario: Wanda reads the time off a clock at the train
station. This clock has been broken for days, yet happens to display the
correct time when Wanda looks at it. Philosophers by and large agree that
Wanda does not know what time it is (Sartwell, 1992 is an exception), and
Machery et al. (2018) show that lay people are divided about this case:
A surprisingly large proportion ascribe knowledge in this case.
A second vignette focused on the thesis that knowledge entails belief.
Myers-Schulz and Schwitzgebel (2013) have reported astonishing evidence
according to which people are sometimes willing to ascribe knowledge
without ascribing belief (see also Murray et al., 2013). We used Myers-Schulz
and Schwitzgebel's scenario, which is an adaptation of Radford's
(1966) famous Queen Elizabeth example. Kate has studied hard for her
history exam; when she faces a question about the year of Queen
Elizabeth's death, she blanks, despite the fact that she has prepared the
answer and recited it to a friend. Eventually, Kate settles without much
conviction on a precise year, 1603, which is the correct response. In a
between-subjects design, participants in the first condition were
asked whether Kate believed Elizabeth died in 1603; in the second condition,
they were asked whether Kate knew Elizabeth died in 1603. Most philosophers
hold that knowledge entails belief (but see Radford, 1966; Williams,
1973), but Myers-Schulz and Schwitzgebel (2013) as well as Murray et al.
(2013) suggest that many lay people are willing in some circumstances to
ascribe knowledge while denying belief.
The third experiment explored the "epistemic side-effect effect," or
ESEE. Beebe and Buckwalter (2010) report that knowledge ascriptions
regarding side effects are sensitive to the latter's general desirability.
Beebe (2013) has produced similar data for belief ascriptions.⁹
We used a scenario from Beebe and Jensen (2012) inspired by Knobe's
(2003) influential case: The CEO of a movie studio is approached by his vice-president,
who suggests implementing a new policy. The new policy would
increase profits and make the movies better or worse from an artistic
standpoint. The CEO replies that he does not care about the artistic qualities
of the movies; the policy is implemented and the vice-president's predictions
are borne out. The question asked whether the CEO knew or believed the
new policy would make the films better or worse from an artistic standpoint.
To our knowledge, few philosophers, if any, think that the proper application
of the concepts of knowledge and belief is sensitive to desirability; by
contrast, the extensive body of research on the ESEE suggests that for many
lay people the ascription of knowledge and belief is sensitive to this factor.
The fourth and final vignette was an adaptation of Kripke's (1972) Gödel
case, drawn from Machery et al. (2004). John has learned in school that a
man called "Gödel" proved the incompleteness theorem, but it turns out that
the proof was in fact accomplished by Gödel's friend, Schmidt. The question
asked whether the name "Gödel" refers to the man who proved the incompleteness
theorem or the man who got hold of the manuscript and claimed
credit for it. Nearly all philosophers share Kripke's judgment that "Gödel"
refers to the man originally called "Gödel" in the scenario, but extensive
research suggests that many Americans report the opposite judgment
(Machery et al., 2004, 2010, 2017).
⁹ See also Beebe and Jensen (2012); Beebe and Shea (2013); Dalbauer and Hergovich (2013);
Buckwalter (2014); Turri (2014); Beebe (2016); Kneer (2018).
4. Experiment 1
4.1 Participants and materials
Participants were recruited on Amazon Mechanical Turk in exchange for a
small compensation (0.2 USD).¹⁰ Data sets from 60 participants failing a
general attention test or a vignette-specific comprehension test were discarded.
The final sample consisted of 179 respondents (male: 44.7%; age
M=35 years, SD=12 years; age range: 18-69).
The first experiment used the Clock vignette; participants were randomly
assigned to one of the five conditions described above: Control, Delay (40
seconds), Incentive, Reasons, Priming. Response times were collected for all
five conditions. Having responded to the target questions, all participants
completed a 10-item version of Epstein's Rational-Experiential Inventory
and a demographic questionnaire.
¹⁰ For all five experiments, the compensation was doubled for those participants in the
financial-incentive condition (independently of whether their response actually fit the
philosophical consensus or not).
4.2 Results and discussion
4.2.1 Main results
A logistic regression was performed to ascertain the effect of condition on
the likelihood that participants judge that the character does not have knowledge.
The logistic regression model was not statistically significant, χ²(4)=4.35,
p=.36. The model only explained 3.2% (Nagelkerke R²) of the variance
in participants' answers and correctly classified only 58.7% of the data points.
With standard assumptions of α=.05 and a moderate effect size (w=.3), the
power of our χ²-test is very high (>.91); power remains high (>.7) for smaller
effect sizes (w ≥ .23), but is low for small effect sizes (Faul et al., 2007).
Figure 9.1 presents the proportion of "does not know" answers for the five
conditions. Hence, we failed to find any evidence that encouraging careful
reflection makes a difference to people's judgments about the Clock case.
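For readers who want to see the general shape of this kind of analysis, the following Python sketch runs a logistic regression of the binary answer on condition, a likelihood-ratio test for the overall effect of condition, and a power calculation for a chi-square test with four degrees of freedom at a moderate effect size (in Cohen's framework the noncentrality parameter is λ = N·w², here 179 × .09 ≈ 16.1). The data, variable names, and model specification are illustrative assumptions, not the authors' materials.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2
from statsmodels.stats.power import GofChisquarePower

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "condition": rng.choice(["control", "delay", "incentive", "reasons", "priming"], size=179),
    "denies_knowledge": rng.integers(0, 2, size=179),  # 1 = judges "does not know" (fake data)
})

# Logistic regression of the binary answer on condition; the overall effect of
# condition is tested with a likelihood-ratio chi-square on 4 degrees of freedom.
full = smf.logit("denies_knowledge ~ C(condition)", data=df).fit(disp=0)
null = smf.logit("denies_knowledge ~ 1", data=df).fit(disp=0)
lr_stat = 2 * (full.llf - null.llf)
p_value = chi2.sf(lr_stat, df=4)

# Power of a chi-square test with df = 4 (5 conditions), N = 179, alpha = .05, w = .3.
power = GofChisquarePower().solve_power(effect_size=0.3, nobs=179, alpha=0.05, n_bins=5)
print(f"LR chi2(4) = {lr_stat:.2f}, p = {p_value:.3f}; power at w = .3: {power:.2f}")
```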
4.2.2 Response time
We also examined whether people who answer more slowly answer differently,
excluding participants in the Delay condition. Averaging across the
four other conditions, we did not find any evidence that slower participants
answer differently (r(141)=.02, p=.85).¹¹ Figure 9.2 reports the proportion of
answers in line with philosophers' consensual judgment for the 50% faster
and 50% slower participants in the Clock case ("Does not know") and two
other cases with categorical data from the following experiments.
[Figure 9.1: Percentages of participants who deny knowledge in the 5 conditions of Experiment 1 (Control, Delay, Incentive, Reasons, Priming). Bars: 95% confidence intervals.]
[Figure 9.2: Percentage of participants responding "She does not know" in Experiment 1 (Gettier case), "She knows" in the knowledge condition of Experiment 3, "She believes" in the belief condition of Experiment 3, and giving a Kripkean response in Experiment 5, split by slower vs. faster responders. Bars: 95% confidence intervals.]
¹¹ The results are similar if one excludes the reaction times two standard deviations below
and above the mean RT (r(141)=.07, p=.55).
The results are similar when one looks at each condition (including
Delay) separately (Control: r(39)=.04, p=.80; Delay: r(38)=.24, p=.15;
Incentive: r(36)=.16, p=.36; Reasons: r(32)=.04, p=.83; Priming: r(34)=.05,
p=.79). Thus, we failed to find any evidence that people who answer
more slowly, possibly because they reflect about the case, answer differently.
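The response-time analyses reported here and in the later experiments are, in effect, point-biserial correlations between a binary answer and a continuous response time, which can be computed with Pearson's r. A minimal sketch on simulated data (variable names are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
response_time = rng.lognormal(mean=3.0, sigma=0.5, size=143)  # seconds (fake data)
denies_knowledge = rng.integers(0, 2, size=143)               # 1 = "does not know" (fake data)

# Point-biserial correlation = Pearson's r between a 0/1 variable and a continuous one.
r, p = pearsonr(denies_knowledge, response_time)
print(f"r({len(response_time) - 2}) = {r:.2f}, p = {p:.2f}")

# Robustness check in the spirit of footnote 11: drop RTs more than 2 SD from the mean.
keep = np.abs(response_time - response_time.mean()) <= 2 * response_time.std()
r2, p2 = pearsonr(denies_knowledge[keep], response_time[keep])
```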
4.2.3 Analytic thinking
In addition, we examined whether people who report a preference for
analytic thinking answer differently. Averaging across the five conditions,
we did not find any evidence that REI scores predict participants' response
to the Clock case (r(179)=.12, p=.12). Figure 9.3 reports the proportion of
answers in line with philosophers' judgment for the 50% most reflective and
50% least reflective participants for the Clock case, as well as two cases used
in the other experiments with categorical data. The results are similar when
one looks at each condition separately (Control: r(39)=.18, p=.27; Delay:
r(38)=.31, p=.06; Incentive: r(36)=.05, p=.78; Reasons: r(32)=.16, p=.37;
Priming: r(34)=.03, p=.85). So, there is no evidence that people who have a
preference for thinking answer differently from people who don't have such
a preference.
[Figure 9.3: Comparison of more reflective (high REI) and less reflective (low REI) participants responding "She does not know" in Experiment 1 (Gettier case), "She knows" in the knowledge condition of Experiment 3, "She believes" in the belief condition of Experiment 3, and giving a Kripkean response in Experiment 5. Bars: 95% confidence intervals.]
Note that in contrast to other Gettier cases (Machery et al., 2017), lay
people tend not to share philosophers' judgment that the protagonist in a
Clock case does not know the relevant proposition. This is in line with
previous studies examining the Clock case (Machery et al., 2018).
5. Experiment 2: Follow-Up to Experiment 1
While we failed to find any significant result with our manipulations, two of
the manipulations of Experiment 1 seemed to lead participants to agree
more with philosophers: asking participants to provide reasons for their
answer and providing monetary incentives to think things through in detail.
To explore these results further, we replicated the Incentive, Reasons, and
Control conditions of Experiment 1 with a larger sample size.
5.1 Participants and materials
Participants were recruited on Amazon Mechanical Turk in exchange for a
small compensation (0.3 USD). Datasets from 16 participants who failed the
attention check or answered the comprehension question incorrectly were
removed. Our final sample consisted of 264 respondents (male: 39.0%; age
M=40 years, SD=13 years; age range: 19-73). Participants were randomly
assigned to the Control, Incentive, or Reasons conditions. The vignette,
instructions, and procedure were otherwise identical to those of Experiment 1.
5.2 Results and discussion
5.2.1 Main results
Participants in the three conditions answered differently (χ²(2, 264)=8.2,
p=.017),¹² but the two manipulations did not lead participants to agree with
philosophers about the Clock case; rather, they led them to judge that the character
in the Clock case knows that it is 3:00 p.m. Figure 9.4 visualizes the results.
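The test reported above compares answer frequencies across the three conditions, i.e., a chi-square test of independence on a 3 × 2 table of condition by answer. A minimal sketch with made-up cell counts (only the total, N = 264, matches the study):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Control, Incentive, Reasons; columns: "knows", "does not know".
# Cell counts are illustrative only; they are not the study's data.
table = np.array([[40, 53],
                  [60, 33],
                  [50, 28]])
chi2_stat, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}, N = {table.sum()}) = {chi2_stat:.1f}, p = {p:.3f}")
```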
One may be surprised by the difference between the results in the
Incentive and Reasons conditions in this study and in Experiment 1. We
do not have a ready explanation, except that it may simply be random
sampling variation.
¹² Control vs. Incentive: χ²(1, 186)=8.1, p=.004; Control vs. Reasons: χ²(1, 171)=2.1, p=.15.
5.2.2 Response time
We also examined whether people who answer more slowly answer differently.
Averaging across the three conditions, we did not find any evidence
that slower participants do so (r(264)=.02, p=.77; see Figure 9.2).¹³ The
results are similar when one looks at each condition separately (Control:
r(91)=.05, p=.64; Incentive: r(93)=0.0, p=.99; Reasons: r(78)=.10, p=.38).
Thus, as was the case in Experiment 1, we failed to find any evidence that
people who answer more slowly answer differently.
5.2.3 Analytic thinking
In addition, we examined again whether people who report a preference for
thinking answer differently. Averaging across the three conditions, we
did not find any evidence that REI scores predict participants' response to
the Clock case (r(264)=.09, p=.16; see Figure 9.3). The results are similar
in the Incentive condition (r(93)=.06, p=.56) and the Reasons condition
(r(78)=.11, p=.32). By contrast, participants in the control condition with
higher REI scores (participants who report a taste for thinking) were more
likely to disagree with philosophers and judge that the character in the
Gettier case knows that it is 3:00 p.m. (r(93)=.21, p=.05). Again, there is
little evidence that people who have a preference for thinking answer
differently from people who don't have such a preference; and when they
do, the evidence suggests they tend to disagree more with philosophers.
Differently put, the influence of increased reflection is limited, and where it
does produce a difference, it decreases alignment with textbook epistemology.
[Figure 9.4: Percentages of participants who judge that Wanda does not know the time in the 3 conditions of Experiment 2 (Control, Incentive, Reasons). Bars: 95% confidence intervals.]
¹³ The results are similar if one excludes the reaction times two standard deviations below
and above the mean RT (r(258)=.01, p=.92).
6. Experiment 3: Knowledge and Belief
So far, the results suggest that reflective judgments do not foster increased
alignment with philosophical orthodoxy. In fact, reflection does not have
much of an influence in the first place. But our results so far are limited to a
single case drawn from one area of philosophy (the Clock case in epistemology).
The following studies examine whether our findings generalize to
other thought experiments and other areas of philosophy, starting with
another case in epistemology. Experiment 3 focuses on the question of
whether knowledge entails belief. The issue was first raised by Radford
(1966), whose central thought experiment, Queen Elizabeth (described in
Section 3), was previously tested by Myers-Schulz and Schwitzgebel (2013).
6.1 Participants and materials
Participants were recruited on Amazon Mechanical Turk in exchange for a
small compensation (0.2 USD). Datasets from 233 participants who failed
the attention check or answered the comprehension question incorrectly
were removed. Our final sample consisted of 385 respondents (male: 35.6%;
age M=35 years, SD=16 years; age range: 18-83).
Our study had a 5×2 between-subjects design. Participants were randomly
assigned to one of the ten conditions invoking five manipulations (Control,
Delay, Incentive, Reasons, and Priming) and two epistemic states (knowledge
and belief). Participants in the Knowledge conditions had to decide
whether the character in the vignette knew that Queen Elizabeth died in
1603, participants in the Belief conditions whether she believed it. The
instructions and procedures were identical to those of Experiment 1. The
only difference consisted in the delay in the Delay condition. Participants
had to wait 60 seconds before they could register their response, which we
estimated was twice as long as it would take to read the case leisurely.
6.2 Results and discussion
6.2.1 Main results
A logistic regression was performed to ascertain the effect of our manipulations
and of the Knowledge vs. Belief factor on the probability that
participants judge that the character knows or believes that Queen Elizabeth
died in 1603. The logistic regression model was statistically significant, χ²(4)=112.5,
p<.001. It explained 33.8% (Nagelkerke R²) of the variance in participants'
answers and correctly classified 74.5% of the data points. The Knowledge
vs. Belief factor was statistically significant: Participants were significantly
less likely to answer that the character believes that Queen Elizabeth died in
1603 than they were likely to answer that she knows that Queen Elizabeth
died in 1603 (Wald=85.9, p<.001). By contrast, the manipulations were not
statistically significant (Wald=1.1, p=.86). With standard assumptions of α=.05
and a moderate effect size (w=.3), the power of our χ²-test is very high (>.99);
power remains high (>.7) for small to moderate effect sizes (w ≥ .16), but is low
for small effect sizes (Faul et al., 2007). Figure 9.5 presents the proportion of
"knows" and "believes" answers for the five conditions.
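The two-factor analysis can be sketched as follows on simulated data: a logistic regression with the manipulation (five levels) and the question type (knowledge vs. belief) as categorical predictors, followed by Wald tests on the individual terms. Variable names and data are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 385
df = pd.DataFrame({
    "manipulation": rng.choice(["control", "delay", "incentive", "reasons", "priming"], size=n),
    "state": rng.choice(["knowledge", "belief"], size=n),
    "ascribes": rng.integers(0, 2, size=n),  # 1 = "she knew" / "she believed" (fake data)
})

# Logistic regression with both factors as categorical predictors.
model = smf.logit("ascribes ~ C(manipulation) + C(state)", data=df).fit(disp=0)
print(model.summary())           # per-coefficient Wald z-tests
print(model.wald_test_terms())   # joint Wald test for each factor
```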
In our experiment, we replicated the results reported by Myers-Schulz
and Schwitzgebel (2013), which cast doubt on the entailment thesis (but see
Rose and Schaffer, 2013; Buckwalter et al., 2015). We failed to find any
evidence that compelling people to take their time in answering, telling
them in advance that they will have to justify their answers, paying them
to be accurate, or priming them to embrace an analytic cognitive style make
any difference in their ascription of either knowledge or belief to the
character in Schwitzgebel and Myers-Schulz's case.
[Figure 9.5: Percentages of knowledge ascription ("She knew") and belief ascription ("She believed") in the 5 conditions of Experiment 3 (Control, Delay, Incentive, Reasons, Priming). Bars: 95% confidence intervals.]
6.2.2 Response time
We also examined whether people who answer more slowly answer differently,
excluding participants in the Delay condition. Averaging across the
four other conditions, we did not find any evidence that in the Knowledge
condition or in the Belief condition slower participants answer differently
(respectively, r(125)=.02, p=.81 and r(191)=.02, p=.75; see Figure 9.2).¹⁴ The
results are largely similar when one looks at each condition separately (Belief
condition: Control: r(45)=.112, p=.44; Delay: r(44)=.26, p=.09; Incentive:
r(52)=.14, p=.33; Reasons: r(44)=.17, p=.28; Priming: r(50)=.001, p=.99;
Knowledge condition: Control: r(38)=.18, p=.28; Delay: r(25)=.26, p=.21;
Incentive: r(28)=.29, p=.14; Reasons: r(31)=.06, p=.77; Priming: r(28)=.16,
p=.41). Thus, we failed to find any evidence that people answer
differently when, on their own, they take their time in considering
Schwitzgebel and Myers-Schulz's case and in providing an answer.
6.2.3 Analytic thinking
Finally, we examined whether people who report a preference for thinking
analytically answer differently. In the Knowledge condition, averaging across
the five conditions, we did not find any evidence that REI scores predict
participants' response to the target question (r(150)=.07, p=.42; see
Figure 9.3). In the Belief condition, averaging across the five conditions,
we did not find any evidence that REI scores predict participants' response
(r(235)=.06, p=.37; see Figure 9.3). For the individual conditions, none of the
Bonferroni-corrected p-values attained significance (all ps>.1). The results are
largely similar when one looks at uncorrected p-values. Belief condition:
Control: r(45)=.01, p=.93; Delay: r(44)=.33, p=.03; Incentive: r(52)=.02,
p=.90; Reasons: r(44)=.08, p=.62; Priming: r(50)=.01, p=.92. Knowledge
condition: Control: r(38)=.09, p=.61; Delay: r(25)=.20, p=.34; Incentive:
r(28)=.37, p=.05; Reasons: r(31)=.02, p=.91; Priming: r(28)=.17, p=.40.
In two sub-conditions (Incentive/Knowledge and Delay/Belief), the
uncorrected p-values just about reach significance. Given that none of the
overall p-values, nor any of the individual Bonferroni-corrected p-values,
attain significance, this clearly does not constitute systematic evidence that
people drawn to more analytic thinking answer differently from those who
are not so disposed.
¹⁴ The results are similar if one excludes the reaction times two standard deviations below
and above the mean RT (r(145)=.07, p=.40 and r(288)=.09, p=.19).
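A Bonferroni correction over the ten per-condition tests amounts to comparing each p-value to .05/10 (equivalently, multiplying each p-value by 10 and capping at 1). A minimal sketch using the uncorrected p-values listed above:

```python
from statsmodels.stats.multitest import multipletests

# Uncorrected p-values for the ten per-condition correlations reported above
# (Belief: Control, Delay, Incentive, Reasons, Priming; then Knowledge, same order).
p_values = [0.93, 0.03, 0.90, 0.62, 0.92, 0.61, 0.34, 0.05, 0.91, 0.40]
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print(p_corrected)    # each p-value multiplied by 10 (capped at 1); none falls below .05
print(reject.any())   # False: no correlation survives the correction
```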
7. Experiment 4: Epistemic Side-Effect Effect
Experiment 4 investigates further whether lay people's reflective judgments
in reaction to epistemological cases vary from the judgments reported so far
by experimental philosophers. In this case we focused on the asymmetric
ascriptions of knowledge and belief regarding differently desirable side
effects. The findings of previous studies by Beebe and Buckwalter (2010)
and Beebe (2013) are just as much at odds with standard epistemological
doctrine as those reported by Myers-Schulz and Schwitzgebel (2013).
7.1 Participants and materials
Participants were recruited on Amazon Mechanical Turk in exchange for a
small compensation (0.2 USD). The datasets of 237 participants who failed
the attention check or answered the comprehension question incorrectly
were removed. Our final sample consisted of 701 respondents (male: 41.7%;
age M=35 years, SD=12 years; age range: 18-73).
Our study used the Movie Studio scenario (described in Section 3), and
was a 5×2×2 between-subjects design. Each participant was assigned to one
of the 20 conditions differing with respect to manipulation (Control, Delay,
Incentive, Reasons, Priming), desirability of the side effect (better movies,
worse movies), and epistemic state (knowledge, belief). Answers were collected
on a 7-point Likert scale; participants reported to what extent they
agreed or disagreed that the protagonist believed or knew that the newly
adopted policy would make the movies better or worse from an artistic
standpoint. The instructions and procedures were identical to those of
Experiment 1. The only difference consisted in the delay in the Delay
condition. Participants had to wait 40 seconds before they could register
their response, which we estimated was twice as long as it would take to read
the case leisurely.
7.2 Results and discussion
7.2.1 Main results
An ANOVA with the five manipulations, the Better vs. Worse factor, and
the Knowledge vs. Belief factor was performed to ascertain the effect of our
manipulations on the probability that participants judge that the character
knows or believes that the movies were made worse or better. The Better vs.
Worse factor was significant (F(1, 681)=262.76, p<.001, η²=.28), as was the
Knowledge vs. Belief factor (F(1, 681)=159.87, p<.001, η²=.07). Participants
are more likely to ascribe knowledge and belief in the Worse condition than
in the Better condition, and they are more likely to ascribe knowledge than
belief. By contrast, our manipulations did not produce any significant effect
(F(1, 681)=.35, p=.85). With standard assumptions of α=.05 and a moderate
effect size (f=.25), the power of an F-test is very high (>.99); assuming a
small effect size (f=.10), power (.75) is still high (Faul et al., 2007). Figure 9.6
presents the means of the "knows" and "believes" answers in the worse and
better conditions for each of the five manipulations.
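This ANOVA can be sketched as follows on simulated data; a full-factorial 5 × 2 × 2 between-subjects model with 701 participants has 20 cells and therefore 701 − 20 = 681 residual degrees of freedom, matching the denominator reported above. The data, formula, and eta-squared computation are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)
n = 701
df = pd.DataFrame({
    "manipulation": rng.choice(["control", "delay", "incentive", "reasons", "priming"], size=n),
    "valence": rng.choice(["better", "worse"], size=n),
    "state": rng.choice(["knowledge", "belief"], size=n),
    "rating": rng.integers(1, 8, size=n),   # 7-point agreement scale (fake data)
})

# Full-factorial between-subjects ANOVA: 20 cells, so 701 - 20 = 681 residual df.
model = smf.ols("rating ~ C(manipulation) * C(valence) * C(state)", data=df).fit()
table = anova_lm(model, typ=2)
table["eta_sq"] = table["sum_sq"] / table["sum_sq"].sum()  # eta-squared (share of total variance)
print(table)
```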
Thus, we replicated Beebe and Buckwalter's (2010) and Beebe's (2013)
findings: The desirability of an action influences the ascription of knowledge
and belief. In addition, we failed to find any evidence that compelling people
to take their time in answering, telling them in advance that they will have to
justify their answers, paying them to be accurate, or priming them to
embrace a reflective cognitive style make any difference in their answers,
and it is likely that if there were small or moderate effects to be found, we
would have found them.
[Figure 9.6: Mean agreement (1-7) with the claim that the director knew or believed the movies would become better or worse from an artistic standpoint in the 20 conditions of Experiment 4. Bars: 95% confidence intervals.]
7.2.2 Response time
We also examined whether people who answer more slowly answer differently,
excluding participants in the Delay condition. Averaging across the
four other conditions, we did not find any evidence that they do (Harm and
Knowledge conditions: r(142)=.06, p=.51; Help and Knowledge conditions:
r(136)=.02, p=.80; Harm and Belief conditions: r(145)=.05, p=.56; Help
and Belief conditions: r(140)=.07, p=.43).¹⁵ Figures 9.7a (belief) and 9.7b
(knowledge) report the scatterplots for these four conditions.
[Figure 9.7a: Scatterplot and regression lines for the ascription of belief in Experiment 4 as a function of participants' response time, by outcome (better vs. worse). Data points more than 2 SD above the mean are excluded.]
¹⁵ The results are similar if one excludes the reaction times two standard deviations below
and above the mean RT (Help and Knowledge conditions: r(134)=.07, p=.42; Harm and Belief
conditions: r(140)=.07, p=.39; Help and Belief conditions: r(139)=.07, p=.39), except for the
Harm and Knowledge conditions: r(139)=.20, p=.02.
In sum, we failed to find any evidence that people who, on their own, take
their time in considering the relevant case and in answering judge differently
from people who don't.
7.2.3 Analytic thinking
Finally, we examined whether people who report a preference for thinking
answer differently. Averaging across the four other conditions, we did not
find any evidence that participants with higher REI scores answer differently
(Harm and Knowledge conditions: r(231)=.09, p=.23; Help and
Knowledge conditions: r(119)=.02, p=.77; Harm and Belief conditions:
r(180)=.02, p=.77; Help and Belief conditions: r(177)=.05, p=.53).
Figures 9.8a (belief) and 9.8b (knowledge) report the scatterplots for these
four conditions.
[Figure 9.7b: Scatterplot and regression lines for the ascription of knowledge in Experiment 4 as a function of participants' response time, by outcome (better vs. worse). Data points more than 2 SD above the mean are excluded.]
So, there is no evidence that people who have a preference for analytic thinking answer differently from people who don't have such a preference.
8. Experiment 5: The Gödel Case
Our final experiment examined whether the findings reported so far generalize to another area of philosophy: the philosophy of language. Following Kripke (1972), most philosophers assume that in the Gödel case the proper name 'Gödel' refers to the man who stole the theorem. Previous work suggests, however, that for a substantial proportion of Americans (between 25% and 40%), the Gödel case elicits judgments more in line with the descriptivist theory of reference (Machery et al., 2004, 2010, 2017). Experiment 5 examined whether lay people agree more with philosophers when they report their reflective judgment.
[Figure 9.8a. Scatterplot and regression lines for the ascription of belief in Experiment 4 as a function of participants' REI scores, by outcome (Better vs. Worse).]
8.1 Participants and materials
Participants were recruited on Amazon Mechanical Turk in exchange for a small compensation (0.5 USD). The datasets of 80 participants who failed the attention check, answered the comprehension question incorrectly, or attempted to complete the survey multiple times (as evidenced by their IP address) were removed. Our final sample consisted of 274 respondents (male: 54.4%; age M=43 years, SD=14 years; age range: 21–79).
All participants were randomly assigned to one of five conditions: Control, Delay, Incentive, Reasons, or Priming. The instructions and procedures were identical to those of Experiment 1. Participants had to decide whether the protagonist in the case is talking about the man who stole the theorem (Kripkean answer) or about the man who discovered the theorem (descriptivist answer). The only difference consisted in the delay in the Delay condition. Participants had to wait 60 seconds, which we estimated was twice as long as it would take to read the Gödel case leisurely.
[Figure 9.8b. Scatterplot and regression lines for the ascription of knowledge in Experiment 4 as a function of participants' REI scores, by outcome (Better vs. Worse).]
8.2 Results and discussion
8.2.1 Main results
A logistic regression was performed to ascertain the effect of our manipulations on the probability that participants judge that the character is talking about the character originally called 'Gödel' when he uses the proper name 'Gödel'. The logistic regression model was not statistically significant, χ²(4)=5.10, p=.28. The model explained 2.5% (Nagelkerke R²) of the variance in participants' answers and correctly classified 60.9% of the data points. The power of the χ² test, assuming a moderate effect size (w=.3), was very high (.99); power remains high (>.7) for small to moderate effect sizes (w≥.19), but is low for small effect sizes (Faul et al., 2007). We also note that all the manipulations decreased the proportion of Kripkean responses, although not significantly so. Figure 9.9 presents the proportion of the 'Gödel' answer for the five conditions.
[Figure 9.9. Percentages of Kripkean responses in the five conditions of Experiment 5 (Control, Delay, Incentive, Reasons, Priming). Note: bars: 95% confidence intervals.]
Thus, we failed to find any evidence that compelling people to take their time in answering, telling people in advance that they will have to justify their answers, paying them to be accurate, or priming them to embrace a reflective analytic style improved people's responses to the Gödel case, and it is likely that if there were a moderate or even a small to moderate effect to be found, we would have found it.
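The logistic regression and the accompanying power check can be reproduced along the following lines (a sketch, not the authors' analysis script: the data frame and its columns condition and kripkean are hypothetical, the chapter's power values come from G*Power, and Nagelkerke's R² is derived here from the model log-likelihoods):

```python
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.stats.power import GofChisquarePower

def analyze_godel(df):
    """Logistic regression of the binary Kripkean response on condition,
    plus Nagelkerke's R-squared and a power check for a chi-square test."""
    model = smf.logit("kripkean ~ C(condition)", data=df).fit()
    n = len(df)

    # Likelihood-ratio test of the condition effect (chi-square with 4 df for 5 conditions).
    print(f"chi2({model.df_model:.0f}) = {model.llr:.2f}, p = {model.llr_pvalue:.2f}")

    # Nagelkerke's R2: the Cox-Snell R2 rescaled to a 0-1 range.
    cox_snell = 1 - np.exp((2 / n) * (model.llnull - model.llf))
    nagelkerke = cox_snell / (1 - np.exp((2 / n) * model.llnull))
    print(f"Nagelkerke R2 = {nagelkerke:.3f}")

    # Power of a chi-square test with df = 4 (five cells) for a moderate effect (w = .3).
    power = GofChisquarePower().power(effect_size=0.3, nobs=n, alpha=0.05, n_bins=5)
    print(f"power (w = .3) = {power:.3f}")
```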
8.2.2 Response time
We also examined whether people answer differently when they answer more slowly, excluding participants in the Delay condition. Averaging across the four other conditions, we did not find any evidence that they do (r(219)=.007, p=.92; see Figure 9.2).¹ None of the uncorrected (and a fortiori Bonferroni-corrected) p-values for the individual conditions attained significance: Control: r(54)=.10, p=.48; Delay: r(55)=.12, p=.39; Incentive: r(59)=.07, p=.62; Reasons: r(53)=.045, p=.76; Priming: r(53)=.08, p=.57. Thus, we failed to find any evidence that people who, on their own, take their time in considering the Gödel case and in answering answer differently from people who don't.
8.2.3 Analytic thinking
In addition, we examined whether people who report a preference for
thinking are more likely to agree with philosophers. Averaging across
the ve conditions, we did not nd any evidence that REI scores
predict participantsresponse to the Gödel case (r(274)=.04, p=.512;
see Figure 9.3). None of the Bonferroni-corrected p-values for the indi-
vidual conditions attained signicance (all ps>.05). Except for the priming
condition, the same held for uncorrected p-values: Control: r(54)=.14,
p=.32; Delay: r(55)=.23, p=.10; Incentive: r(59)=.04, p=.75; Reasons:
r(53)=.11 p=.45; Priming: r(53)=.33, p=.01. So, there is no systematic
evidence that people who have a preference for analytic thinking
agree more with philosophers about the Gödel case than people who
dont have such a preference. While suggestive, the correlation in the
Priming condition should not give solace to philosophers since, if it isnt
a mere accident, it goes in the wrong direction: People who are less
reective are more likely to give the response in line with philosophers
judgments.
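The per-condition correlations and the Bonferroni correction used here and in Section 8.2.2 can be sketched as follows (per_condition is a hypothetical mapping from condition names to paired arrays of predictors, response times or REI scores, and responses; it is not the chapter's actual data structure):

```python
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

def corrected_correlations(per_condition):
    """Pearson correlations within each condition, with Bonferroni-corrected p-values."""
    names, pvals = [], []
    for name, (predictor, response) in per_condition.items():
        r, p = pearsonr(predictor, response)
        names.append(name)
        pvals.append(p)
        print(f"{name}: r({len(predictor) - 2}) = {r:.2f}, uncorrected p = {p:.2f}")
    # Bonferroni correction across the five condition-level tests.
    reject, p_corrected, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
    for name, p_c, rej in zip(names, p_corrected, reject):
        print(f"{name}: Bonferroni-corrected p = {p_c:.2f}, significant: {rej}")
```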
¹ The results are similar if one excludes the reaction times two standard deviations below and above the mean RT (r(213)=.003, p=.97).
9. Discussion
9.1 Meta-philosophical implications of the experimental studies
The reflection defense assumes that increased reflection influences people's responses to philosophical cases and improves them, in the sense of bringing them more into alignment with philosophical orthodoxy. Since experimental philosophers do not ordinarily encourage the extensive reflection characteristic of the philosophical method, the data thus collected (or so the argument goes) is of no use. We put the empirical adequacy of the reflection defense to the test with respect to four well-known thought experiments.¹ For each, philosophers agree as to what constitutes the correct response.
Focusing on a thin conception of reflection and reflective judgment, we have examined two types of factors that might be conducive to reflective deliberation: the circumstances under which the deliberative process takes place, and individual dispositions to engage in careful reflection. As regards the former, we adapted a variety of standard manipulations from social psychology and experimental economics to encourage diligent reflection: time delay, financial incentives, reason specification, and analytic priming. Out of the 18 conditions with manipulations contrasted with the respective control conditions across five experiments, we could only detect a significant difference (or some sign of "influence") in two comparisons: In Experiment 2, financial incentives and reason specification somewhat changed responses vis-à-vis the control condition, yet in both cases the increased reflection produced results that were less in alignment with philosophical theory. As regards circumstances, then, the Influence and Alignment assumption has proven a nonstarter: In nearly all conditions we failed to detect an influence of reflection in the first place, and in the few cases where an influence was detected, it decreased alignment.
Concerning individual dispositions to engage in reflection, we contrasted the responses of participants who, of their own accord, spent more time on the task with those of participants who responded quickly. We couldn't detect a significant difference between slow and fast responses for a single condition of any of the five experiments. We also explored whether people with a penchant for more analytic thinking (i.e., subjects on the 'rational' end of the REI) respond differently from those who tend toward a more intuitive thinking style (those on the 'experiential' end of the REI spectrum). Averaging across conditions, we did not find a significant difference in any of the five experiments. The Bonferroni-corrected p-values for each of the 23 conditions were also nonsignificant. In short, participants who have a natural disposition to engage in more analytic thinking responded the same as those who do not. These results are consistent with the findings reported in previous studies, which attempted to measure the disposition to engage in reflection by means of the NFC inventory or the CRT.
¹ We also note, though do not elaborate on this point, that we replicated all the original experimental-philosophy studies, in line with Cova et al. (2021).
Taking stock: In a series of five experimental studies with a total of over 1800 individual subjects, we found that neither a disposition to engage in reflection nor circumstantial factors conducive to reflective judgment bring folk judgments into alignment with philosophical orthodoxy. In nearly all cases tested, they fail to have any impact at all. Our studies had sufficient power to detect medium-sized effects with a very high probability, and small to medium effects with high probability. We cannot exclude the possibility of small effects induced by reflection. Note, however, that even if those were to be found, it's far from clear that this would make the reflection defense any more convincing. Take, for instance, Radford's unconfident examinee case, where in the control condition we found more than 80% of the participants to ascribe knowledge, while a mere 30% ascribed belief. The difference, defying orthodox epistemology, constitutes a large effect (h=1.15). Or consider the well-documented epistemic side-effect effect: In the control conditions, the effect size of the divergence between positively and negatively valenced outcomes was very large for both belief (d=1.01) and knowledge (d=1.02). Now assume it could be shown that extensive reflection produces small effects in line with philosophical orthodoxy, e.g., increasing epistemic state ascriptions somewhat in the positively valenced Knobe-type cases. The epistemic side-effect effect would still be of at least moderate size, and would more likely than not remain large. The overall conclusion, that folk judgments frequently differ strongly from philosophical consensus and that extensive reflection does not bring the two into alignment, remains the same.
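For readers who want to check the arithmetic behind these effect sizes, Cohen's h for the difference between two proportions is the difference of their arcsine-transformed values; with the rounded figures quoted above it comes out slightly below the h=1.15 computed from the exact data:

```python
import numpy as np

def cohens_h(p1, p2):
    """Cohen's h: effect size for the difference between two proportions,
    using the arcsine transformation phi = 2 * arcsin(sqrt(p))."""
    return 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))

# Roughly 80% knowledge ascriptions vs. 30% belief ascriptions in the control condition.
print(round(cohens_h(0.80, 0.30), 2))   # about 1.06; by convention, h > 0.8 is a large effect
```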
At this point, there are two responses available to proponents of the reflection defense: First, they could argue that the reason we did not find any effect is that our manipulations are poor means of leading people to engage in sufficiently reflective deliberation about philosophical cases even when reflection is thinly construed. Second, they could argue that the thin conception of reflective judgment that we have been working with here is not what they had in mind. We do not find either response compelling. Let's start with the first kind of response. Combined with earlier studies, we now have eight different ways of inducing reflection thinly construed, and none of them seems to support the central presuppositions of the reflection defense, which are either that reflection sufficiently immunizes judgments about philosophical cases from sensitivity to allegedly irrelevant factors, or at least that it changes our judgments about those cases.
The second kind of response is no more compelling than the first. Of course, it is perfectly possible that reflection more thickly construed might lead people to change their judgments about philosophical cases, and we happily admit that we have done nothing to address this possibility. Having said that, it should be obvious that one cannot rebut an empirical challenge to the way philosophers standardly use the method of cases by appealing to some unspecified account of reflection. A convincing response to the experimental challenge must explain not only what properties the judgments studied by experimental philosophers apparently lack, but also why these properties are important to the way philosophers standardly employ the method of cases. And here it is important that proponents of the reflection defense do not rely on a bait-and-switch strategy. If the reflection defense is deemed plausible and appealing at all, it is largely, we submit, because the notion of reflection is characterized thinly: Philosophical arguments, we agree, do not appeal to 'gut reactions' to cases or to 'shots from the hip' in response to these, but rather to careful and reflective judgments. However, the intuitive plausibility and appeal of the reflection defense thinly understood do not transfer to versions of the argument that appeal to thicker characterizations of reflection. If it's plausible that only careful, slow, reflective judgments about cases are philosophically relevant, is it equally plausible that only epistemically analytic judgments about cases are philosophically relevant? Surely not. For one, many deny that any judgment is epistemically analytic (e.g., Williamson, 2007; Machery, 2017). And even if some are, it is not evident at all that the judgments made in response to cases are epistemically analytic. What's the upshot? Proponents of the reflection defense who appeal to thicker characterizations of reflection can't simply trade upon the initial plausibility and appeal of the reflection defense when reflection is characterized thinly, on pain of engaging in bait-and-switch. What is required is a detailed characterization of reflection understood in some more substantial way, and clear arguments to the effect that the method of cases demands this type of thick reflection.
Until now, proponents of the reflection defense have not provided compelling arguments for thicker characterizations of the notion of reflection. Furthermore, we submit, compelling arguments will be hard to find. As noted in Section 2, an adequate characterization of reflection must be consistent with the way philosophers use thought experiments (the descriptive-inadequacy problem), but the thicker the characterization, the more likely it is that it will fall prey to the descriptive-inadequacy problem (Cappelen, 2012; Machery, 2017, chapter 1). For instance, Kauppinen's dialogical conception of reflection fails to capture how philosophers usually judge in response to cases. There is no doubt that, as Kauppinen insists, philosophers compare cases and try to identify ways in which particular cases are alike or differ, but they typically do not do this in the process of making a judgment about these cases. Rather, having made judgments about several cases, they try to identify potential reasons that explain their pattern of judgments. When Gettier (1963) proposes his ten-coin case, he does not compare it to other cases to conclude that the agent does not know that he has ten coins in his pocket. Nor do his readers. Rather, once we have judged in response to several cases, we compare those judgments to identify the reasons that explain the relevant pattern of judgments. It is thus implausible that only judgments understood along Kauppinen's dialogical conception of reflection are philosophically relevant.
Furthermore, a thick conception of reflection must not imply by stipulation that the research done by experimental philosophers is not relevant for philosophical methodology (the stipulation problem). Put differently, simply stipulating a concept of reflection according to which the kinds of biases identified by experimental philosophers cannot arise is unhelpful. Instead, it must be shown by means of arguments or empirical evidence that reflection, understood in some thicker manner, results in judgments that do not fall prey to such biases, or at least do so at a much lower rate.
9.2 Why doesn't reflection influence judgment about cases?
It is surprising that people who are disposed to engage in careful analytic thinking and people who are primed to engage in reflective deliberation do not judge differently in response to cases than people who read cases under conditions that are standardly used by experimental philosophers. Why is that? There are at least two answers to this question.
First, it could be that, contrary to what proponents of the reflection defense assume (Premise 2 of the argument sketched in Section 2), participants in experimental-philosophy studies are already engaged, by themselves, in reflective deliberation when they respond to cases under conditions standardly used by experimental philosophers. If this were the case, then it would be unsurprising that priming people to be reflective or looking at people disposed to careful thinking would not make any difference, as we found. While some subjects may engage in reflective deliberation on their own under conditions standardly used by experimental philosophers, we doubt that this is the case for all subjects, and while this explanation may be partly correct, it is incomplete. The reason is that many participants respond rather quickly to cases, too quickly for them to have had the time to engage in careful, reflective deliberation.
The second explanation of our surprising result is more radical: Typically, reflection does not change the judgments made in response to cases (for a similar view about moral judgments, see Haidt 2001, and for a discussion of the limits of reflection, see Kornblith 2010). Rather, it merely leads people to find reasons for the judgments they made unreflectively in response to these cases. People who are inclined to engage in analytic thinking are merely better at finding arguments for the judgments they make about cases; people who are primed to engage in reflection are primed to find reasons for their judgments about cases. If finding arguments or justification is the product of reflection, then it is unsurprising that priming people to be reflective or looking at people disposed to careful thinking would not make any difference, as we found.
One may wonder why reflection does not change judgments about philosophical cases much, while it appears to influence judgment in the social-psychological and behavioral-economic literature, allowing people to overcome their spontaneous answers. We propose to explain the difference between our findings and the past research on reflection as follows. The judgments people make in the types of situations examined by social psychologists and behavioral economists are frequently mistaken by people's own lights; when this happens, they tend to change their responses when given an opportunity to reflect. By contrast, when judgments are made with confidence and are not erroneous by participants' own lights, reflection does not influence judgment much; rather, it leads people to think of arguments for the judgments independently made.
The explanation proposed in this section, we submit, is the deepest reason why the reflection defense fails. It misunderstands the role of reflection. It assumes that reflection would lead us to judge differently in response to cases instead of, as our results suggest, prompting us to explore reasons, arguments, and justifications for our judgments, while leaving the judgments themselves unchanged.
10. Conclusion
In this chapter, we have addressed the reflection defense put forward in response to the challenge against the use of cases inspired by experimental philosophy. Our experiments provide no systematic evidence that reflection, thinly understood, leads people to respond differently to cases than they do under standard experimental-philosophy conditions. This finding undermines the view according to which reflective and unreflective judgments about cases differ. Instead, we would like to suggest, people's responses to thought experiments typically express deep-seated judgments, and reflection merely bolsters these judgments by pushing people to explore potential reasons for them. While it is possible that the reflection defense might appeal to some thicker conception of reflection, we see little reason for optimism. Given that both the expertise defense and the reflection defense have so far proven inadequate, we conclude that philosophers should take the experimentalists' challenge against the use of cases seriously.
Acknowledgments
For very helpful feedback, we would like to thank Justin Sytsma, Joe Ulatowski, Jonathan Weinberg, the editors, several anonymous referees, and the audiences at the Society for Philosophy and Psychology Meeting (2016), the Buffalo Annual Experimental Philosophy Conference (2016), XPhi under Quarantine (2020), and the Guilty Minds Lab Zurich (2020). While working on this chapter, Markus Kneer was supported by an SNSF Ambizione Grant (#PZ00P1_179912); Joshua Alexander was supported by a Templeton Foundation Grant (#15628).
References
Alexander, J. (2012). Experimental Philosophy: An Introduction. Cambridge: Polity.
Alexander, J. (2016). Philosophical expertise. In J. Sytsma and W. Buckwalter (eds.), A Companion to Experimental Philosophy (pp. 555–67). Malden, MA: Wiley-Blackwell.
Alexander, J. and Weinberg, J. M. (2007). Analytic epistemology and experimental philosophy. Philosophy Compass, 2(1), 56–80.
Beebe, J. R. (2013). A Knobe effect for belief ascriptions. Review of Philosophy and Psychology, 4(2), 235–58.
Beebe, J. R. (2016). Do bad people know more? Interactions between attributions of knowledge and blame. Synthese, 193(8), 2633–57.
Beebe, J. R. and Buckwalter, W. (2010). The epistemic side-effect effect. Mind & Language, 25(4), 474–98.
Beebe, J. R. and Jensen, M. (2012). Surprising connections between knowledge and action: the robustness of the epistemic side-effect effect. Philosophical Psychology, 25(5), 689–715.
Beebe, J. R. and Shea, J. (2013). Gettierized Knobe effects. Episteme, 10(3), 219–40.
Bengson, J. (2013). Experimental attacks on intuitions and answers. Philosophy and Phenomenological Research, 86(3), 495–532.
Buckwalter, W. (2014). The mystery of stakes and error in ascriber intuitions. In J. Beebe (ed.), Advances in Experimental Epistemology (pp. 145–74). New York: Bloomsbury Academic.
Buckwalter, W., Rose, D., and Turri, J. (2015). Belief through thick and thin. Noûs, 49(4), 748–75.
Cacioppo, J. T. and Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1), 116–31.
Cacioppo, J. T., Petty, R. E., Kao, C. F., and Rodriguez, R. (1986). Central and peripheral routes to persuasion: an individual difference perspective. Journal of Personality and Social Psychology, 51(5), 1032–43.
Camerer, C. F. and Hogarth, R. M. (1999). The effects of financial incentives in experiments: a review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cappelen, H. (2012). Philosophy Without Intuitions. Oxford: Oxford University Press.
Colaço, D., Buckwalter, W., Stich, S., and Machery, E. (2014). Epistemic intuitions in fake-barn thought experiments. Episteme, 11(2), 199–212.
Colaço, D. and Machery, E. (2017). The intuitive is a red herring. Inquiry, 60(4), 403–19.
Cova, F. et al. (2021). Estimating the reproducibility of experimental philosophy. Review of Philosophy and Psychology, 12, 9–44.
Dalbauer, N. and Hergovich, A. (2013). Is what is worse more likely? The probabilistic explanation of the epistemic side-effect effect. Review of Philosophy and Psychology, 4(4), 639–57.
Deutsch, M. E. (2015). The Myth of the Intuitive: Experimental Philosophy and Philosophical Method. Cambridge, MA: MIT Press.
Dewey, J. (1910). How We Think. Boston: D.C. Heath and Company.
Epstein, S., Pacini, R., Denes-Raj, V., and Heier, H. (1996). Individual differences in intuitive-experiential and analytical-rational thinking styles. Journal of Personality and Social Psychology, 71(2), 390–405.
Faul, F., Erdfelder, E., Lang, A. G., and Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–91.
Forstmann, B. U., Dutilh, G., Brown, S., Neumann, J., Von Cramon, D. Y., Ridderinkhof, K. R., and Wagenmakers, E.-J. (2008). Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences, 105(45), 17538–42.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Garrett, H. E. (1922). A Study of the Relation of Accuracy to Speed (Vol. 8). New York: Columbia University.
Gerken, M. and Beebe, J. R. (2016). Knowledge in and out of contrast. Noûs, 50(1), 133–65.
Gettier, E. L. (1963). Is justified true belief knowledge? Analysis, 23(6), 121–3.
Gonnerman, C., Reuter, S., and Weinberg, J. M. (2011). More oversensitive intuitions: print fonts and 'could choose otherwise'. Paper presented at the 108th Annual Meeting of the American Philosophical Association, Central Division, Minneapolis, MN.
Haidt, J. (2001). The emotional dog and its rational tail: a social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–34.
Hannon, M. (2018). Intuitions, reflective judgments, and experimental philosophy. Synthese, 195(9), 4147–68.
Hertwig, R. and Ortmann, A. (2001). Experimental practices in economics: a methodological challenge for psychologists? Behavioral and Brain Sciences, 24(3), 383–403.
Hick, W. E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, 4(1), 11–26.
Horvath, J. (2010). How (not) to react to experimental philosophy. Philosophical Psychology, 23(4), 447–80.
Kauppinen, A. (2007). The rise and fall of experimental philosophy. Philosophical Explorations, 10(2), 95–118.
Kneer, M. (2018). Perspective and epistemic state ascriptions. Review of Philosophy and Psychology, 9(2), 313–41.
Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63(279), 190–4.
Koriat, A., Lichtenstein, S., and Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107–18.
Kornblith, H. (2010). What reflective endorsement cannot do. Philosophy and Phenomenological Research, 80(1), 1–19.
Kripke, S. A. (1972). Naming and necessity. In D. Davidson and G. Harman (eds.), Semantics of Natural Language (pp. 253–355). Dordrecht: Springer.
Lerner, J. S. and Tetlock, P. E. (1999). Accounting for the effects of accountability. Psychological Bulletin, 125(2), 255–75.
Liao, S. M. (2008). A defense of intuitions. Philosophical Studies, 140(2), 247–62.
Ludwig, K. (2007). The epistemology of thought experiments: first person versus third person approaches. Midwest Studies in Philosophy, 31(1), 128–59.
Machery, E. (2011). Thought experiments and philosophical knowledge. Metaphilosophy, 42(3), 191–214.
Machery, E. (2012). Expertise and intuitions about reference. THEORIA. Revista de Teoría, Historia y Fundamentos de la Ciencia, 27(1), 37–54.
Machery, E. (2017). Philosophy Within Its Proper Bounds. Oxford: Oxford University Press.
Machery, E., Deutsch, M., Sytsma, J., Mallon, R., Nichols, S., and Stich, S. P. (2010). Semantic intuitions: reply to Lam. Cognition, 117, 361–6.
Machery, E., Mallon, R., Nichols, S., and Stich, S. P. (2004). Semantics, cross-cultural style. Cognition, 92(3), B1–B12.
Machery, E., Stich, S., Rose, D., Chatterjee, A., Karasawa, K., Struchiner, N., Usui, N., and Hashimoto, T. (2017). Gettier across cultures. Noûs, 51(3), 645–64.
Machery, E., Stich, S., Rose, D., Chatterjee, A., Karasawa, K., Struchiner, N., Sirker, S., Usui, N., and Hashimoto, T. (2018). Gettier was framed! In S. Stich, M. Mizumoto, and E. McCready (eds.), Epistemology for the Rest of the World (pp. 123–48). Oxford: Oxford University Press.
Malmgren, A. S. (2011). Rationalism and the content of intuitive judgements. Mind, 120(478), 263–327.
Mizrahi, M. (2015). Three arguments against the expertise defense. Metaphilosophy, 46(1), 52–64.
Murray, D., Sytsma, J., and Livengood, J. (2013). God knows (but does God believe?). Philosophical Studies, 166(1), 83–107.
Myers-Schulz, B. and Schwitzgebel, E. (2013). Knowing that P without believing that P. Noûs, 47(2), 371–84.
Nado, J. (2015). Intuition, philosophical theorizing, and the threat of skepticism. In E. Fischer and J. Collins (eds.), Experimental Philosophy, Rationalism, and Naturalism: Rethinking Philosophical Method (chapter 9). New York: Routledge.
Nado, J. (2016). The intuition deniers. Philosophical Studies, 173(3), 781–800.
Ollman, R. (1966). Fast guesses in choice reaction time. Psychonomic Science, 6(4), 155–6.
Pachella, R. G. (1973). The interpretation of reaction time in information processing research. Michigan University Ann Arbor Human Performance Center (No. TR-45).
Pacini, R. and Epstein, S. (1999). The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. Journal of Personality and Social Psychology, 76(6), 972–87.
Paxton, J. M., Ungar, L., and Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–77.
Pinillos, N. Á., Smith, N., Nair, G. S., Marchetto, P., and Mun, C. (2011). Philosophy's new challenge: experiments and intentional action. Mind & Language, 26(1), 115–39.
Pizarro, D. A., Uhlmann, E., and Bloom, P. (2003). Causal deviance and the attribution of moral responsibility. Journal of Experimental Social Psychology, 39(6), 653–60.
Radford, C. (1966). Knowledge: by examples. Analysis, 27(1), 1–11.
Rand, D. G., Greene, J. D., and Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–30.
Ratcliff, R. and Rouder, J. N. (1998). Modeling response times for two-choice decisions. Psychological Science, 9(5), 347–56.
Rose, D. and Schaffer, J. (2013). Knowledge entails dispositional belief. Philosophical Studies, 166(1), 19–50.
Russell, B. (1948). Human Knowledge: Its Scope and Its Limits. London: George Allen & Unwin.
Sartwell, C. (1992). Why knowledge is merely true belief. The Journal of Philosophy, 89(4), 167–80.
Schouten, J. and Bekker, J. (1967). Reaction time and accuracy. Acta Psychologica, 27, 143–53.
Schwitzgebel, E. and Cushman, F. (2012). Expertise in moral reasoning? Order effects on moral judgment in professional philosophers and non-philosophers. Mind & Language, 27(2), 135–53.
Schwitzgebel, E. and Cushman, F. (2015). Philosophers' biased judgments persist despite training, expertise and reflection. Cognition, 141, 127–37.
Simonson, I. and Nye, P. (1992). The effect of accountability on susceptibility to decision errors. Organizational Behavior and Human Decision Processes, 51(3), 416–46.
Stich, S. P. and Machery, E. (forthcoming). Demographic differences in philosophical intuition: a reply to Joshua Knobe. Review of Philosophy and Psychology.
Strevens, M. (2019). Thinking Off Your Feet: How Empirical Psychology Vindicates Armchair Philosophy. Cambridge, MA: Harvard University Press.
Swain, S., Alexander, J., and Weinberg, J. M. (2008). The instability of philosophical intuitions: running hot and cold on Truetemp. Philosophy and Phenomenological Research, 76(1), 138–55.
Toplak, M. E., West, R. F., and Stanovich, K. E. (2011). The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition, 39(7), 1275–89.
Turri, J. (2014). Knowledge and suberogatory assertion. Philosophical Studies, 167(3), 557–67.
Weinberg, J. M. and Alexander, J. (2014). Intuitions through thick and thin. Intuitions, 187–231.
Weinberg, J. M., Alexander, J., Gonnerman, C., and Reuter, S. (2012). Restrictionism and reflection: challenge deflected, or simply redirected? The Monist, 95(2), 200–22.
Weinberg, J. M., Gonnerman, C., Buckner, C., and Alexander, J. (2010). Are philosophers expert intuiters? Philosophical Psychology, 23(3), 331–55.
Wickelgren, W. A. (1977). Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41(1), 67–85.
Williams, B. (1973). Deciding to believe. In B. Williams (ed.), Problems of the Self (pp. 136–51). Cambridge: Cambridge University Press.
Williamson, T. (2007). The Philosophy of Philosophy. Oxford: Blackwell.