Article

P-curving x-phi: Does experimental philosophy have evidential value?


Abstract

In this article, we analyse the evidential value of the corpus of experimental philosophy (x-phi). While experimental philosophers claim that their studies provide insight into philosophical problems, some philosophers and psychologists have expressed concerns that the findings from these studies lack evidential value. Barriers to evidential value include selection bias (i.e., the selective publication of significant results) and p-hacking (practices that increase the odds of obtaining a p-value below the significance level). To find out whether the significant findings in x-phi papers result from selection bias or p-hacking, we applied a p-curve analysis to a corpus of 365 x-phi chapters and articles. Our results suggest that this corpus has evidential value, although there are hints of p-hacking in a few parts of the x-phi corpus.


... statistics (Colombo, Duev, Nuijten, & Sprenger, 2018) and the literature generally lacks publication bias and shows evidential value (Stuart, Colaço, & Machery, 2019). A noteworthy episode in the history of experimental philosophy is the study by Machery, Mallon, Nichols, and Stich (2004). ...
... This triggered great unrest in the community, a flurry of papers, the development of new tools, and the birth of a new academic field: meta-science. Experimental philosophy, too, was investigated for the replicability of its results (Cova et al., 2018), errors in reported statistics (Colombo et al., 2018), and the evidential value of those statistics (Stuart et al., 2019). Although much has happened in the intervening years, there is still room for improvement. ...
... In an effort to improve the replicability and cross-disciplinary comparability of this line of research on semantic intuitions, we encourage researchers to consider quality assessment schemes from other disciplines (e.g., in medicine), in addition to other checks on research quality, like high-powered replication studies (e.g., Cova et al., 2018), checking for statistical reporting errors (e.g., Colombo et al., 2018), and meta-science tools for uncovering publication bias and questionable research practices (e.g., Stuart et al., 2019). ...
Preprint
The content of this dissertation spans four years of work, which was carried out in the Netherlands (Tilburg University and University of Amsterdam) and Italy (University of Turin). It is part of the ERC project “Making Scientific Inference More Objective” led by professor Jan Sprenger, for which philosophy of science and empirical research were combined. The dissertation can be summarized as a small set of modest attempts to contribute to improving scientific practice. Each of these attempts was geared towards either increasing understanding of a particular problem or making a contribution to how science can be practiced. The general focus was on philosophical nuance while remaining methodologically practicable. The five papers contained in this dissertation are both methodologically and philosophically diverse. The first three (Chapters 2 through 4) are more empirical in nature and are focused on understanding and evaluating how science is practiced: a meta-analysis of semantic intuitions research in experimental philosophy; a systematic review on essay literature on the null hypothesis significance test; and an experiment on how teams of statisticians analyze the same data. The last two (Chapters 5 and 6) are focused on the improvement of scientific practice by providing tools for the improvement of empirical research with a strong philosophical foundation: a practicable and testable definition of scientific objectivity and a Bayesian operationalization of Popper’s concept of a severe test.
... It might therefore be that experimental philosophers are especially adept at avoiding hasty generalizations. Indeed, research found that many x-phi studies were more replicable (Cova et al. 2021), contained fewer statistical reporting inconsistencies (Colombo et al. 2017), and were less affected by common questionable research practices (e.g., p-hacking) than psychology studies (Stuart et al. 2019). Consequently, it has been suggested that experimental philosophers may be "more sensitive to certain methodological questions, such as what counts as strong evidence for a given claim" (Cova et al. 2021, 31). ...
Article
Full-text available
Scientists may sometimes generalize from their samples to broader populations when they have not yet sufficiently supported this generalization. Do such hasty generalizations also occur in experimental philosophy? To check, we analyzed 171 experimental philosophy studies published between 2017 and 2023. We found that most studies tested only Western populations but generalized beyond them without justification. There was also no evidence that studies with broader conclusions had larger, more diverse samples, but they nonetheless had higher citation impact. Our analyses reveal important methodological limitations of many experimental philosophy studies and suggest that philosophical training may not protect against hasty generalizations.
... In any case, factors such as sample size, respect for experimental protocols, and the need to pre-register studies became common in the methodological debates on experimental philosophy. And, in fact, experimental philosophy as a discipline fares rather well with respect to the general replicability of its results (Stuart et al., 2019; Cova et al., 2019). From this generally optimistic outlook, one should not conclude, however, that the field is devoid of methodological pitfalls (Colombo et al., 2018). ...
... Experimental philosophers rarely sample from different populations, and typically extrapolate on the basis of convenience samples from online pools of participants examined in a single language. To provide evidence for this point, I randomly selected 10% of the articles (36 articles including 88 studies) on the list of experimental-philosophy studies developed by Stuart et al. (2019), which includes all the original experimental philosophy studies we could locate until 2017 (available at https://osf.io/2z87f/). I coded whether the samples in each of the 88 studies were intentionally drawn from different populations (e.g., different countries, different religious groups, different genders, different age groups, etc.; Y/N/unspecified), and for what purpose (i.e., which variation was hypothesized), whether they were made of students (Y/N/unspecified), whether they were drawn from online pools of participants (Y/N/unspecified), and whether they were a community sample (Y/N/unspecified). ...
... In addition, some of the studies reviewed above exhibit methodological weaknesses, such as low power (Stanley, Carter, and Doucouliagos 2018), unrepresentative samples (Henrich, Heine, and Norenzayan 2010), and a failure to sufficiently motivate their research question(s) by theory (Muthukrishna and Henrich 2019). Moreover, there is some evidence of questionable research practices in experimental moral psychology more broadly (Stuart, Colaço, and Machery 2019), and of publication bias for some of the research programs reviewed above in particular (Landy and Goodwin 2015;McDonald et al. 2021). Thus, even among results that have not yet been challenged, there is a good chance that some of them will not hold up in the future (cf. ...
... Researchers will need to resist the urge to selectively report results or to "torture" their data in an effort to yield significant findings (i.e., "p-hacking") (6). This is especially true as the field begins to establish itself. ...
Article
Full-text available
Commentary on Aftab's "Experimental Philosophy of Psychiatry" for the Association for the Advancement of Philosophy and Psychiatry (AAPP) Bulletin, 28(1).
... Several substantial methodological criticisms of experimental philosophy's modus operandi have been raised before, for example by Polonioli (2017), Stuart et al. (2019), and Woolfolk (2013). Specifically, Woolfolk (2013) has argued convincingly that one of the main challenges that experimental philosophy faces is the credibility of self-report questionnaires. ...
Article
Full-text available
A key challenge in experimental social science research is the incentivisation of subjects such that they take the tasks presented to them seriously and answer honestly. If subject responses can be evaluated against an objective baseline, a standard way of incentivising participants is by rewarding them monetarily as a function of their performance. However, the subject area of experimental philosophy is such that this mode of incentivisation is not applicable, as participant responses cannot easily be scored along a true-false spectrum by the experimenters. We claim that experimental philosophers' neglect of incentivisation mechanisms in their surveys and experiments, and their claims that such mechanisms are unimportant, have plausibly led to poorer data quality and worse conclusions overall, potentially threatening the research programme of experimental philosophy in the long run. As a solution to this, we propose the adoption of the Bayesian Truth Serum, an incentive-compatible mechanism used in economics and marketing, designed for eliciting honest responding in subjective data designs by rewarding participant answers that are surprisingly common. We argue that the Bayesian Truth Serum (i) adequately addresses the issue of incentive compatibility in subjective data research designs and (ii) that it should be applied to the vast majority of research in experimental philosophy. Further, we (iii) provide an empirical application of the method, demonstrating its qualified impact on the distribution of answers on a number of standard experimental philosophy items, and outline guidance for researchers aiming to apply this mechanism in future research by specifying the additional costs and design steps involved.
... Sytsma & Livengood, 2016), due to the increasing professionalization of experimental philosophy. Ironically, experimental philosophy has turned out to be in even better shape than many subfields of psychology itself (for example, social psychology), which suffer from questionable research practices and replication failure (Colombo, Duev, Nuijten, & Sprenger, 2018; Cova et al., 2018; Stuart et al., 2019). ...
Article
Full-text available
In this paper, we first briefly survey the main responses to the challenge that experimental philosophy poses to the method of cases, given the common assumption that the latter is crucially based on intuitive judgments about cases. Second, we discuss two of the most popular responses in more detail: the expertise defense and the mischaracterization objection. Our take on the expertise defense is that the available empirical data do not support the claim that professional philosophers enjoy relevant expertise in their intuitive judgments about cases. In contrast, the mischaracterization objection seems considerably more promising than its largely negative reception has suggested. We argue that the burden of proof is thus on philosophers who still hold that the method of cases crucially relies on intuitive judgments about cases. Finally, we discuss whether conceptual engineering provides an alternative to the method of cases in light of the challenge from experimental philosophy. We argue that this is not clearly the case, because conceptual engineering also requires descriptive information about the concepts it aims to improve. However, its primarily normative perspective on our concepts makes it largely orthogonal to the challenge from experimental philosophy, and it can also benefit from the empirical methods of the latter.
Chapter
The Cambridge Handbook of Moral Psychology is an essential guide to the study of moral cognition and behavior. Originating as a philosophical exploration of values and virtues, moral psychology has evolved into a robust empirical science intersecting psychology, philosophy, anthropology, sociology, and neuroscience. Contributors to this interdisciplinary handbook explore a diverse set of topics, including moral judgment and decision making, altruism and empathy, and blame and punishment. Tailored for graduate students and researchers across psychology, philosophy, anthropology, neuroscience, political science, and economics, it offers a comprehensive survey of the latest research in moral psychology, illuminating both foundational concepts and cutting-edge developments.
Article
Norm violations have been demonstrated to impact a wide range of seemingly non-normative judgments. Among other things, when agents' actions violate prescriptive norms they tend to be seen as having done those actions more freely, as having acted more intentionally, as being more of a cause of subsequent outcomes, and even as being less happy. The explanation of this effect continues to be debated, with some researchers appealing to features of actions that violate norms, and other researchers emphasizing the importance of agents' mental states when acting. Here, we report the results of two large-scale experiments that replicate and extend twelve of the studies that originally demonstrated the pervasive impact of norm violations. In each case, we build on the pre-existing experimental paradigms to additionally manipulate whether the agents knew that they were violating a norm while holding fixed the action done. We find evidence for a pervasive impact of ignorance: the impact of norm violations on non-normative judgments depends largely on the agent knowing that they were violating a norm when acting. Moreover, we find evidence that the reduction in the impact of normality is underpinned by people's counterfactual reasoning: people are less likely to consider an alternative to the agent's action if the agent is ignorant. We situate our findings in the wider debate around the role of normality in people's reasoning.
Article
Full-text available
Philosophers and scientists refer to the special character of phenomenal consciousness, something supposedly obvious to all conscious persons. However, we had no empirical evidence about the folk view of consciousness until the first studies were carried out in the experimental philosophy of consciousness. According to the leading interpretation of these results, laypersons—people without academic knowledge about consciousness—do not notice the phenomenal aspect of consciousness. The aim of the article is to answer the question of whether we can trust these results. I show that there are serious doubts about the validity of the experimental philosophy of consciousness research. As a result, the leading interpretation should be rejected, and the question about the folk nature of the concept of consciousness must be regarded as open.
Article
I am grateful for Joshua Alexander and Jonathan Weinberg’s, Avner Baz’s and Max Deutsch’s insightful comments on Philosophy Within Its Proper Bounds. I have learned a lot thinking about them, identifying points of convergence and places where differences remain unbridgeable and trying to address the most pressing criticisms. In what follows, I will engage their commentaries in turn.

1. In defence of reliability

From its very beginning, experimental philosophy has been thought to raise a challenge to some common ways of doing philosophy, in particular to what is now known as ‘the method of cases’, but for a long time the exact structure of this argument has been unclear and as a result hard to evaluate. One of the motivations behind Philosophy Within Its Proper Bounds was to articulate and defend the strongest argument possible bringing experimental-philosophy results to bear on the validity of the method of cases. The first section of Chapter 3, ‘Fooled by Cognitive Artifacts’, presents an argument schema (The Master Argument) and proposes a particular instantiation of this argument that appeals to reliability (instead of other candidates such as hopefulness and calibration): in short, judgements elicited by philosophical cases (henceforth, ‘case judgements’) cannot be trusted because they are unreliable.
Article
This article responds to Chris Crandall's and John Symons's critical discussions of Philosophy Within Its Proper Bounds. I examine the significance of experimental-philosophy research for philosophy and for psychology and discuss the methodological shortcomings of experimental philosophy. I also consider how we can come to know metaphysical necessities of philosophical importance and defend a pragmatist take on conceptual engineering.
Chapter
Full-text available
Philosophical conceptual analysis is an experimental method. Focusing on this helps to justify it against the skepticism of experimental philosophers who follow Weinberg, Nichols & Stich (2001). To explore the experimental aspect of philosophical conceptual analysis, I consider a simpler instance of the same activity: everyday linguistic interpretation. I argue that this, too, is experimental in nature. And in both conceptual analysis and linguistic interpretation, the intuitions considered problematic by experimental philosophers are necessary but epistemically irrelevant. They are like variables introduced into mathematical proofs which drop out before the solution. Or better, they are like the hypotheses that drive science, which do not themselves need to be true. In other words, it does not matter whether or not intuitions are accurate as descriptions of the natural kinds that undergird philosophical concepts; the aims of conceptual analysis can still be met.
Article
Full-text available
Experimental philosophy (x-phi) is a young field of research in the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science.
Article
Full-text available
The idea behind ego depletion is that willpower draws on a limited mental resource, so that engaging in an act of self-control impairs self-control in subsequent tasks. To present ego depletion as more than a convenient metaphor, some researchers have proposed that glucose is the limited resource that becomes depleted with self-control. However, there have been theoretical challenges to the proposed glucose mechanism, and the experiments that have tested it have found mixed results. We used a new meta-analytic tool, p-curve analysis, to examine the reliability of the evidence from these experiments. We found that the effect sizes reported in this literature are possibly influenced by publication or reporting bias and that, even within studies yielding significant results, the evidential value of this research is weak. In light of these results, and pending further evidence, researchers and policymakers should refrain from drawing any conclusions about the role of glucose in self-control.
Article
Full-text available
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, and reporting of results. Data are useful for educational purposes.
Article
Full-text available
When studies examine true effects, they generate right-skewed p-curves, distributions of statistically significant results with more low (.01s) than high (.04s) p values. What else can cause a right-skewed p-curve? First, we consider the possibility that researchers report only the smallest significant p value (as conjectured by Ulrich & Miller, 2015), concluding that it is a very uncommon problem. We then consider more common problems, including (a) p-curvers selecting the wrong p values, (b) fake data, (c) honest errors, and (d) ambitiously p-hacked (beyond p < .05) results. We evaluate the impact of these common problems on the validity of p-curve analysis, and provide practical solutions that substantially increase its robustness.
Article
Full-text available
This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package "statcheck." statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called "co-pilot model," and to use statcheck to flag possible inconsistencies in one's own manuscript or during the review process.
Article
Full-text available
Empirically analyzing empirical evidence: One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study. Science, this issue: 10.1126/science.aac4716
Article
Full-text available
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as "p-hacking," occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
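The meta-analytic test for p-hacking described above exploits the shape of the p-value distribution just below .05: a true effect makes smaller p-values more common, so an excess of values crowding the (.045, .05) bin relative to the (.04, .045) bin is a signature of p-hacking. A minimal sketch of that bin-comparison idea, assuming a simple one-sided binomial test (the published method involves additional corrections, and the bin counts below are hypothetical):

```python
from scipy import stats

def phacking_bin_test(n_upper, n_lower):
    """Simplified bin-comparison test for p-hacking. n_upper counts
    significant p-values in (.045, .05); n_lower counts those in
    (.04, .045). Under a true effect the upper bin should be the rarer
    one, so an overrepresented upper bin suggests p-hacking. Returns the
    one-sided binomial p-value for that overrepresentation."""
    result = stats.binomtest(n_upper, n_upper + n_lower, 0.5,
                             alternative="greater")
    return result.pvalue

# Hypothetical literature: 30 p-values in (.045, .05) vs 10 in (.04, .045).
print(phacking_bin_test(30, 10) < 0.05)  # True: consistent with p-hacking
```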
Article
Full-text available
Because scientists tend to report only studies (publication bias) or analyses (p-hacking) that “work,” readers must ask, “Are these effects true, or do they merely reflect selective reporting?” We introduce p-curve as a way to answer this question. P-curve is the distribution of statistically significant p values for a set of studies (ps < .05). Because only true effects are expected to generate right-skewed p-curves—containing more low (.01s) than high (.04s) significant p values—only right-skewed p-curves are diagnostic of evidential value. By telling us whether we can rule out selective reporting as the sole explanation for a set of findings, p-curve offers a solution to the age-old inferential problems caused by file-drawers of failed studies and analyses.
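The diagnostic logic of p-curve can be illustrated with a small simulation: when a real effect exists, the statistically significant p-values pile up near .01, whereas under a null effect with selective reporting they are roughly uniform between 0 and .05. The sketch below is illustrative only; the effect size, sample size, and simple binning are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def significant_pvalues(effect, n_per_group=30, n_studies=5000):
    """Simulate two-sample t-tests and keep only p < .05 results,
    mimicking a literature shaped by selective reporting."""
    ps = []
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        p = stats.ttest_ind(a, b).pvalue
        if p < 0.05:
            ps.append(p)
    return np.array(ps)

def pcurve_bins(ps):
    """Bin significant p-values into [.00-.01), [.01-.02), ..., [.04-.05)."""
    counts, _ = np.histogram(ps, bins=[0, .01, .02, .03, .04, .05])
    return counts

true_effect = pcurve_bins(significant_pvalues(effect=0.5))
null_effect = pcurve_bins(significant_pvalues(effect=0.0))

# A true effect yields a right-skewed p-curve: far more p-values in the
# lowest bin than the highest; the null p-curve is approximately flat.
print(true_effect, null_effect)
```

The published p-curve method tests this skew formally (e.g., via a test against the uniform distribution) rather than by eyeballing bin counts.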
Article
Full-text available
Experimental philosophy is a new interdisciplinary field that uses methods normally associated with psychology to investigate questions normally associated with philosophy. The present review focuses on research in experimental philosophy on four central questions. First, why is it that people's moral judgments appear to influence their intuitions about seemingly nonmoral questions? Second, do people think that moral questions have objective answers, or do they see morality as fundamentally relative? Third, do people believe in free will, and do they see free will as compatible with determinism? Fourth, how do people determine whether an entity is conscious?
Article
Full-text available
In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
Article
Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without requiring access to nonsignificant results. It capitalizes on the fact that the distribution of significant p values, p-curve, is a function of the true underlying effect. Researchers armed only with sample sizes and test results of the published findings can correct for publication bias. We validate the technique with simulations and by reanalyzing data from the Many-Labs Replication project. We demonstrate that p-curve can arrive at conclusions opposite that of existing tools by reanalyzing the meta-analysis of the “choice overload” literature.
Article
A substantial number of studies have been published over the last decade, claiming that transcranial direct current stimulation (tDCS) can influence performance on cognitive tasks. However, there is some skepticism regarding the efficacy of tDCS, and evidence from meta-analyses is mixed. One major weakness of these meta-analyses is that they only examine outcomes in published studies. Given biases towards publishing positive results in the scientific literature, there may be a substantial "file-drawer" of unpublished negative results in the tDCS literature. Furthermore, multiple researcher degrees of freedom can also inflate published p-values. Recently, Simonsohn, Nelson and Simmons (2014) created a novel meta-analytic tool that examines the distribution of significant p-values in a literature, and compares it to expected distributions with different effect sizes. Using this tool, one can assess whether the selected studies have evidential value. Therefore, we examined a random selection of studies that used tDCS to alter performance on cognitive tasks, and tDCS studies on working memory in a recently published meta-analysis (Mancuso et al., 2016). Using a p-curve analysis, we found no evidence that the tDCS studies had evidential value (33% power or greater), with the estimate of statistical power of these studies being approximately 14% for the cognitive studies, and 5% (what would be expected from randomly generated data) for the working memory studies. It is likely that previous tDCS studies are substantially underpowered, and we provide suggestions for future research to increase the evidential value of future tDCS studies.
Article
In a well-known article, Carney, Cuddy, and Yap (2010) documented the benefits of “power posing”. In their study, participants (N=42) who were randomly assigned to briefly adopt expansive, powerful postures sought more risk, had higher testosterone levels, and had lower cortisol levels than those assigned to adopt contractive, powerless postures. In their response to a failed replication by Ranehill et al. (2015), Carney, Cuddy, and Yap (2015) reviewed 33 successful studies investigating the effects of expansive vs. contractive posing, focusing on differences between these studies and the failed replication, to identify possible moderators that future studies could explore. But before spending valuable resources on that, it is useful to establish whether the literature that Carney et al. (2015) cited actually suggests that power posing is effective. In this paper we rely on p-curve analysis to answer the following question: Does the literature reviewed by Carney et al. (2015) suggest the existence of an effect once we account for selective reporting? We conclude not. The distribution of p-values from those 33 studies is indistinguishable from what is expected if (1) the average effect size were zero, and (2) selective reporting (of studies and/or analyses) were solely responsible for the significant effects that are published. Although more highly powered future research may find replicable evidence for the purported benefits of power posing (or unexpected detriments), the existing evidence is too weak to justify a search for moderators or to advocate for people to engage in power posing to better their lives.
Article
In this article, we discuss critically some of the key themes in Max Deutsch’s excellent book, The Myth of the Intuitive. We focus in particular on the shortcomings of his historical analysis (a missed opportunity, by our lights), on the claim that philosophers present arguments in support of the judgments elicited by thought experiments, and on the claim that experimental philosophy is only relevant for the methodology of philosophy if thought experiments elicit intuitions.
Article
We performed an exhaustive meta-analysis of 73 peer-reviewed journal articles on syntactic priming from the seminal Bock (1986) paper through 2013. Extracting the effect size for each experiment and condition, where the effect size is the log odds ratio of the frequency of the primed structure X to the frequency of the unprimed structure Y, we found a robust effect of syntactic priming with an average weighted odds ratio of 1.67 when there is no lexical overlap and 3.26 when there is. That is, a construction X which occurs 50% of the time in the absence of priming would occur 63% of the time if primed without lexical repetition and 77% of the time if primed with lexical repetition. The syntactic priming effect is robust across several different construction types and languages, and we found strong effects of lexical overlap on the size of the priming effect as well as interactions between lexical repetition and temporal lag and between lexical repetition and whether the priming occurred within or across languages. We also analyzed the distribution of p-values across experiments in order to estimate the average statistical power of experiments in our sample and to assess publication bias. Analyzing a subset of experiments in which the primary result of interest is whether a particular structure showed a priming effect, we did not find evidence of major p-hacking, and the studies appear to have acceptable statistical power: 82%. However, analyzing a subset of experiments that focus not just on whether syntactic priming exists but on how syntactic priming is moderated by other variables (such as repetition of words in prime and target, the location of the testing room, and the memory of the speaker), we found that such studies are, on average, underpowered, with an estimated average power of 53%.
Using a subset of 45 papers from our sample for which we received raw data, we estimated subject and item variation and give recommendations for appropriate sample size for future syntactic priming studies.
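The odds-ratio arithmetic in the abstract above (an odds ratio of 1.67 turning a 50% baseline into roughly 63%, and 3.26 turning it into roughly 77%) can be checked with the standard conversion from odds to probability. A minimal sketch, not taken from the paper; the function name is ours:

```python
def apply_odds_ratio(p_base, odds_ratio):
    """Probability implied by multiplying baseline odds by an odds ratio."""
    odds = p_base / (1 - p_base)          # convert probability to odds
    primed_odds = odds * odds_ratio       # apply the priming effect
    return primed_odds / (1 + primed_odds)  # convert back to probability

# Baseline 50% use of a construction X:
apply_odds_ratio(0.5, 1.67)  # ≈ 0.63, no lexical overlap
apply_odds_ratio(0.5, 3.26)  # ≈ 0.77, with lexical overlap
```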
Book
A defense of traditional philosophical method against challenges from practitioners of “experimental philosophy.” In The Myth of the Intuitive, Max Deutsch defends the methods of analytic philosophy against a recent empirical challenge mounted by the practitioners of experimental philosophy (xphi). This challenge concerns the extent to which analytic philosophy relies on intuition—in particular, the extent to which analytic philosophers treat intuitions as evidence in arguing for philosophical conclusions. Experimental philosophers say that analytic philosophers place a great deal of evidential weight on people's intuitions about hypothetical cases and thought experiments. Deutsch argues forcefully that this view of traditional philosophical method is a myth, part of “metaphilosophical folklore,” and he supports his argument with close examinations of results from xphi and of a number of influential arguments in analytic philosophy. Analytic philosophy makes regular use of hypothetical examples and thought experiments, but, Deutsch writes, philosophers argue for their claims about what is true or not true in these examples and thought experiments. It is these arguments, not intuitions, that are treated as evidence for the claims. Deutsch discusses xphi and some recent xphi studies; critiques a variety of other metaphilosophical claims; examines such famous arguments as Gettier's refutation of the JTB (justified true belief) theory and Kripke's Gödel Case argument against descriptivism about proper names, and shows that they rely on reasoning rather than intuition; and finds existing critiques of xphi, the “Multiple Concepts” and “Expertise” replies, to be severely lacking. Bradford Books imprint
Article
In this article, we present evidence that in four different cultural groups that speak quite different languages (Brazil, India, Japan, and the USA) there are cases of justified true beliefs that are not judged to be cases of knowledge. We hypothesize that this intuitive judgment, which we call “the Gettier intuition,” may be a reflection of an underlying innate and universal core folk epistemology, and we highlight the philosophical significance of its universality.
Article
In “Normativity and Epistemic Intuitions” (NEI), Weinberg, Nichols and Stich famously argue from empirical data that East Asians and Westerners have different intuitions about Gettier-style cases. We attempted to replicate their study of the Gettier Car Case. Our study used the same methods, with the case taken verbatim, but sampled an East Asian population 2.5 times larger than NEI's 23 participants. We found no evidence of a cross-cultural difference in intuitions about the case. Taken together with the failures of both existing replication studies (Nagel et al. 2013; Seyedsayamdost 2014), our data provide strong evidence that the purported cross-cultural difference in intuitions about Gettier-style cases does not exist.
Article
The field of experimental philosophy has received considerable attention, essentially for producing results that seem highly counter-intuitive and at the same time call into question some of the fundamental methods used in philosophy. One of the earlier influential papers that gave rise to the experimental philosophy movement, ‘Normativity and Epistemic Intuitions’ by Jonathan M. Weinberg, Shaun Nichols and Stephen Stich (2001), reported that respondents displayed different epistemic intuitions depending on their ethnic background as well as their socioeconomic status. These findings, if robust, would have important implications for philosophical methodology in general and epistemology in particular. Because of the important implications of its findings, Weinberg et al. (2001) has been very influential (currently with more than four hundred citations) and the subject of extensive debate. Despite the paper's significance and all the debate it has generated, there had been no attempt to replicate its experiments. We collected data from four different sources (two online and two in-person) to replicate the experiments. Despite using several different data sets and, in various cases, larger sample sizes, we failed to detect significant differences between the above-mentioned groups. Our results suggest that epistemic intuitions are more uniform across ethnic and socioeconomic groups than Weinberg et al. (2001) indicates. Given our data, we believe that the claim of differences in epistemic intuitions among ethnic and socioeconomic groups, advanced by Weinberg et al. (2001) and accepted by many researchers, needs to be corrected.
Article
To address the underrepresentation of women in philosophy effectively, we must understand the causes of the early loss of women. In this paper we challenge one of the few explanations that has focused on why women might leave philosophy at early stages. Wesley Buckwalter and Stephen Stich (2014, Experimental philosophy. Oxford: Oxford University Press) offer some evidence that women have different intuitions than men about philosophical thought experiments. We present some concerns about their evidence and discuss our own study, in which we attempted to replicate their results for 23 different responses (intuitions or judgments) to 14 scenarios (thought experiments). We also conducted a literature search to see whether other philosophers or psychologists have tested for gender differences in philosophical intuitions. Based on our findings, we argue that it is unlikely that gender differences in intuitions play a significant role in driving women from philosophy.
Article
This article offers a critique of research practices typical of experimental philosophy. To that end, it presents a review of methodological issues that have proved crucial to the quality of research in the biobehavioral sciences. It discusses various shortcomings in the experimental philosophy literature related to (1) the credibility of self‐report questionnaires, (2) the validity and reliability of measurement, (3) the adherence to appropriate procedures for sampling, random assignment, and handling of participants, and (4) the meticulousness of study reporting. It argues that the future standing of experimental philosophy will hinge upon improvements in research methods.
Article
The use of power to infer null hypotheses from negative results has recently come under severe attack. In this article, I show that the power of a test can justify accepting the null hypothesis. This argument also gives us a new powerful reason for not treating p-values and power as measures of the strength of evidence.
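The argument that high power can license accepting the null hypothesis turns on ordinary power calculations. As a hedged illustration (a textbook normal approximation for a two-sample test, not the article's own example; the function name and numbers are ours), power can be sketched as:

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test to detect
    a standardized effect size d with n_per_group subjects per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # critical value
    noncentrality = d * (n_per_group / 2) ** 0.5      # shift under the alternative
    return NormalDist().cdf(noncentrality - z_crit)

# With 64 subjects per group, power to detect a medium effect (d = 0.5)
# is roughly 0.80, so a non-significant result from such a test carries
# real evidential weight against a medium-sized effect.
power_two_sample(0.5, 64)
```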
Article
In their paper titled “Gender and philosophical intuition,” Buckwalter and Stich (forthcoming) argue that the intuitions of women and men differ significantly on various types of philosophical questions. Furthermore, men's intuitions, so the authors claim, are more in line with traditionally accepted solutions of classical problems. This inherent bias, so the argument goes, is one of the factors that leads more men than women to pursue degrees and careers in philosophy. These findings have received a considerable amount of attention and the paper is to appear in the second edition of Experimental Philosophy edited by Knobe and Nichols (2013), which itself is an influential outlet. Given the exposure of these results, we attempted to replicate three of the classes of questions that Buckwalter and Stich review in their paper and for which they report significant differences. We failed to replicate the results using several different sources for data collection (one being identical to the original procedures). Given our results, we do not believe the outcomes from Buckwalter and Stich (forthcoming) that we examined for this paper to be robust. That is, men and women do not seem to differ significantly in their intuitive responses to these philosophical scenarios.
Article
Despite well-established results in survey methodology, many experimental philosophers have not asked whether and in what way conclusions about folk intuitions follow from people’s responses to their surveys. Rather, they appear to have proceeded on the assumption that intuitions can be simply read off from survey responses. Survey research, however, is fraught with difficulties. I review some of the relevant literature—particularly focusing on the conversational pragmatic aspects of survey research—and consider its application to common experimental philosophy surveys. I argue for two claims. First, that experimental philosophers’ survey methodology leaves the facts about folk intuitions massively underdetermined; and second, that what has been regarded as evidence for the instability of philosophical intuitions is, at least in some cases, better accounted for in terms of subjects’ reactions to subtle pragmatic cues contained in the surveys.
Article
Experimental philosophy seeks to examine empirically various factual issues that, either explicitly or implicitly, lie at the foundations of philosophical positions. A study of this genre (Miller & Feltz, 2011) was critiqued. Questions about the study were raised and broader issues pertaining to the field of experimental philosophy were discussed.
Machery, E., S.P. Stich, D. Rose, A. Chatterjee, K. Karasawa, N. Struchiner, S. Sirker, N. Usui and T. Hashimoto. 2017. Gettier across cultures. Noûs 51: 645-64.
O'Neill, E. and E. Machery. 2014. Experimental philosophy: what is it good for? In Current Controversies in Experimental Philosophy, eds. E. Machery and E. O'Neill, vii-xxix. New York: Routledge.
Williamson, T. 2010. Philosophy vs. imitation psychology. New York Times, August 19. <http://www.nytimes.com/roomfordebate/2010/08/19/x-phis-new-take-on-old-problems/philosophy-vs-imitation-psychology> last accessed 31 January 2019.
Simmons, J.P., L.D. Nelson and U. Simonsohn. 2011. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22: 1359-66.