Uri Simonsohn's research while affiliated with Universitat Ramon Llull and other places

Publications (73)

Article
Meta-analysts’ practice of transcribing and numerically combining all results in a research literature can generate uninterpretable and/or misleading conclusions. Meta-analysts should instead critically evaluate studies, draw conclusions only from those that are valid and provide readers with enough information to evaluate those conclusions.
Article
We identify 15 claims Pham and Oh (2020) make to argue against pre‐registration. We agree with 7 of the claims, but think that none of them justify delaying the encouragement and adoption of pre‐registration. Moreover, while the claim they make in their title is correct ‐ pre‐registration is neither necessary nor sufficient for a credible science –...
Article
In this article, we (1) discuss the reasons why pre‐registration is a good idea, both for the field and for individual researchers, (2) respond to arguments against pre‐registration, (3) describe how to best write and review a pre‐registration, and (4) comment on pre‐registration’s rapidly accelerating popularity. Along the way, we describe the (bi...
Article
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Empirical results hinge on analytical decisions that are defensible, arbitrary and motivated. These decisions probably introduce bias (towards the narrative put forward by the authors), and they certainly involve variability not reflected by standard errors. To address this source of noise and bias, we introduce specification curve analysis, which...
Article
Recent public backlash to corporate experimentation was likely caused by the policies the experiments contained rather than a more general “experiment aversion.”
Article
Several researchers have relied on, or advocated for, internal meta-analysis, which involves statistically aggregating multiple studies in a paper to assess their overall evidential value. Advocates of internal meta-analysis argue that it provides an efficient approach to increasing statistical power and solving the file-drawer problem. Here we sho...
Article
p-curve, the distribution of significant p-values, can be analyzed to assess whether the findings have evidential value, that is, whether p-hacking and file-drawering can be ruled out as the sole explanations for them. Bruns and Ioannidis (2016) have proposed that p-curve cannot examine evidential value with observational data. Their discussion confuses false-positiv...
Article
Abundant research has shown that people fail to disregard to-be-ignored information (e.g., hindsight bias, curse of knowledge), which has contributed to the popular notion that people are unwillingly and unconsciously affected by information. Here we provide evidence that, instead, people simply do not want to ignore such information. The findings:...
Article
We describe why we wrote "False-Positive Psychology," analyze how it has been cited, and explain why the integrity of experimental psychology hinges on the full disclosure of methods, the sharing of materials and data, and, especially, the preregistration of analyses.
Article
In 2010–2012, a few largely coincidental events led experimental psychologists to realize that their approach to collecting, analyzing, and reporting data made it too easy to publish false-positive findings. This sparked a period of methodological reflection that we review here and call Psychology’s Renaissance. We begin by describing how psycholog...
Article
We define transactions as weird when they include unexplained features, that is, features not implicitly, explicitly, or self-evidently justified, and propose that people are averse to weird transactions. In six experiments, we show that risky options used in previous research paradigms often attained uncertainty via adding an unexplained transacti...
Article
In a well-known article, Carney, Cuddy, and Yap (2010) documented the benefits of “power posing”. In their study, participants (N=42) who were randomly assigned to briefly adopt expansive, powerful postures sought more risk, had higher testosterone levels, and had lower cortisol levels than those assigned to adopt contractive, powerless postures. I...
Article
Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some...
Preprint
The Transparency and Openness Promotion (TOP) Committee met in November 2014 to address one important element of the incentive systems - journals’ procedures and policies for publication. The outcome of the effort is the TOP Guidelines. There are eight standards in the TOP guidelines; each moves scientific communication toward greater openness. Thes...
Article
I agree with Schwarz & Clore on the importance of considering differences between original and replication studies when interpreting replication failures. I disagree on the proposition that without manipulation checks replications cannot be statistically analyzed as such, and disagree on their approach to considering hypotheses for why a replicatio...
Article
When studies examine true effects, they generate right-skewed p-curves, distributions of statistically significant results with more low (.01s) than high (.04s) p values. What else can cause a right-skewed p-curve? First, we consider the possibility that researchers report only the smallest significant p value (as conjectured by Ulrich & Miller,...
Article
Perspectives, Scientific Integrity: Self-correction in science at work. By Bruce Alberts, Ralph J. Cicerone, Stephen E. Fienberg, Alexander Kamb, Marcia McNutt, Robert M. Nerem, Randy Schekman, Richard Shiffrin, Victoria Stodden,...
Article
Transparency, openness, and reproducibility are readily recognized as vital features of science (1, 2). When asked, most scientists embrace these features as disciplinary norms and values (3). Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case...
Article
This article introduces a new approach for evaluating replication results. It combines effect-size estimation with hypothesis testing, assessing the extent to which the replication results are consistent with an effect size big enough to have been detectable in the original study. The approach is demonstrated by examining replications of three well...
Article
Empirical results often hinge on data analytic decisions that are simultaneously defensible, arbitrary, and motivated. To mitigate this problem we introduce Specification-Curve Analysis, which consists of three steps: (i) identifying the set of theoretically justified, statistically valid, and non-redundant analytic specifications, (ii) displaying...
Article
Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without requiring access to nonsignificant results. It capitalizes on the fact that the distribution of significant p values, p-curve, is a function of the tru...
Article
There is growing appreciation for the advantages of experimentation in the social sciences. Policy-relevant claims that in the past were backed by theoretical arguments and inconclusive correlations are now being investigated using more credible methods. Changes have been particularly pronounced in development economics, where hundreds of randomize...
Article
Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without requiring access to nonsignificant results. It capitalizes on the fact that the distribution of significant p-values, p-curve, is a function of the tru...
Article
I discuss points of agreement and disagreement with Francis (2013), and argue that the main lesson from his numerous one-off publication bias critiques is that developers of new statistical tools ought to anticipate their potential misuses and develop safeguards to prevent them.
Article
I argue that requiring authors to post the raw data supporting their published results has the benefit, among many others, of making fraud much less likely to go undetected. I illustrate this point by describing two cases of suspected fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of...
Article
Because scientists tend to report only studies (publication bias) or analyses (p-hacking) that "work," readers must ask, "Are these effects true, or do they merely reflect selective reporting?" We introduce p-curve as a way to answer this question. P-curve is the distribution of statistically significant p values for a set of studies (ps < .05). Be...
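The idea that the shape of p-curve is informative can be illustrated with a small simulation. The sketch below is not the authors' p-curve procedure; it is a minimal illustration under stated assumptions: each study is a two-group comparison analyzed with a two-sided z-test with known unit variance, and the function names (`two_sided_p`, `simulate_p_curve`) and parameter values are hypothetical. It keeps only the significant results (ps < .05) and reports the share falling in each .01-wide bin.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_curve(effect_size, n_per_group, n_studies, seed=0):
    """Return the share of significant p values in the bins
    [0, .01), [.01, .02), ..., [.04, .05).

    Assumption (not from the source): each study is a two-group z-test
    with known unit variance, so the test statistic is distributed
    N(effect_size * sqrt(n_per_group / 2), 1).
    """
    rng = random.Random(seed)
    shift = effect_size * math.sqrt(n_per_group / 2)
    bins = [0] * 5
    for _ in range(n_studies):
        p = two_sided_p(rng.gauss(shift, 1.0))
        if p < 0.05:  # p-curve only uses statistically significant results
            bins[min(int(p / 0.01), 4)] += 1
    total = sum(bins)
    if total == 0:
        return [0.0] * 5
    return [count / total for count in bins]


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    print("no true effect:", simulate_p_curve(0.0, 50, 50000, seed=1))
    print("true effect   :", simulate_p_curve(0.5, 50, 50000, seed=1))
```

Under these assumptions, the significant p values from studies of a null effect spread roughly evenly across the five bins, while those from studies of a true effect pile up near zero, which is the right skew the abstract refers to.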
Article
When does a replication attempt fail? The most common standard is: when it obtains p>.05. I begin here by evaluating this standard in the context of three published replication attempts, involving investigations of the embodiment of morality, the endowment effect, and weather effects on life satisfaction, concluding the standard has unacceptable pr...
Article
Many professionals, from auditors, venture capitalists, and lawyers, to clinical psychologists and journal editors, divide continuous flows of judgments into subsets. College admissions interviewers, for instance, evaluate but a handful of applicants a day. We conjectured that in such situations, individuals engage in narrow bracketing, assessing e...
Article
In this presentation, we discussed how researchers' commitment to avoid p-hacking will affect their research lives. One conclusion is that most experimental research cannot be successful without at least 50 observations per condition.
Article
We revisit a recent failure-to-replicate the finding that arbitrary anchors influence monetary valuations. Though in the replication the point estimate is indeed not significantly different from zero, it is also not significantly different from a sizable effect. This is partially explained by a high share of valuations near $0 in the replication, c...
Article
Francis (2012a, 2012b, 2012c, 2012d, 2012e, in press) attacks individual papers through critiques that apply faulty logic to analyses ironically biased by cherry picking. However well intentioned, the critiques are probably counterproductive to their stipulated goal and certainly unfair to the targeted authors. © The Author(s) 2012.
Article
One year after publishing "False-Positive Psychology," we propose a simple implementation of disclosure that requires but 21 words to achieve full transparency. This article is written in a casual tone. It includes phone-taken pictures of milk-jars and references to ice-cream and sardines.
Article
Uri Simonsohn explains how he uncovered wrongdoing in psychology research.
Article
In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an...
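The claim that flexibility in data collection inflates false-positive rates can be checked with a short simulation. The sketch below is not the article's own simulation; it is a minimal illustration that covers only one form of flexibility (testing once, then adding observations and testing again if the first result is not significant) and assumes a two-sided z-test with known unit variance. The function names and default sample sizes are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_stat(a, b):
    """z statistic for two equal-sized groups with known unit variance."""
    n = len(a)
    return (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)


def false_positive_rates(n_start=20, n_extra=10, n_sims=20000,
                         alpha=0.05, seed=0):
    """Simulate studies in which the null hypothesis is true.

    Returns (fixed_rate, flexible_rate):
      fixed_rate    - test once, after n_start observations per group
      flexible_rate - test after n_start; if not significant, collect
                      n_extra more per group and test again
    """
    rng = random.Random(seed)
    fixed_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_start)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_start)]
        if two_sided_p(z_stat(a, b)) < alpha:
            fixed_hits += 1
            flexible_hits += 1
            continue
        # Not significant yet: add observations and test a second time.
        a += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
        b += [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
        if two_sided_p(z_stat(a, b)) < alpha:
            flexible_hits += 1
    return fixed_hits / n_sims, flexible_hits / n_sims


if __name__ == "__main__":
    fixed, flexible = false_positive_rates()
    print(f"single planned test : {fixed:.3f}")
    print(f"test, add data, test: {flexible:.3f}")
```

Because the second test is run only when the first one fails, the flexible rate can never fall below the fixed rate; under these assumptions the single planned test stays near the nominal .05, while the two-look procedure exceeds it.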
Article
Abundant research has documented that incidental factors (cognitive and emotional) can influence both judgment and choice. This paper tested such influences in the field, by assessing the impact of weather during the visit of prospective undergraduate students, on their decision to enroll in the visited school. As expected, cloudiness during their...
Article
In Simonsohn (2011) I reported the results from 14 studies that suggest all existing evidence of implicit egotism in marriage, job, and location decisions is spurious. Lack of diligence by Pelham and colleagues explains in great part why the confounds behind their findings were not addressed in time. They almost never included controls, were dismis...
Article
Implicit egotism is the notion that major life decisions are influenced by name-similarity. This paper revisits the evidence for the most systematic test of this hypothesis. Anseel & Duyck (2008) analyzed data from 1/3 of all Belgian employees and found that a disproportionate fraction of them shared their initial with their employer. Using a datas...
Article
Although experimental studies have documented systematic decision errors, many leading scholars believe that experience, competition, and large stakes will reliably extinguish biases. We test for the presence of a fundamental bias, loss aversion, in a high-stakes context: professional golfers' performance on the PGA Tour. Golf provides a natural se...
Article
In 2007, Consumer Reports released, and two weeks later retracted, a flawed report on the safety of infant car seats. Analyzing data from 5471 online auctions for car seats ending before, during, and after the information was considered valid, this article shows that (1) consumers responded to the new information and, more surprisingly, (2) they pr...
Article
Three articles published in the Journal of Personality and Social Psychology have shown that a disproportionate share of people choose spouses, places to live, and occupations with names similar to their own. These findings, interpreted as evidence of implicit egotism, are included in most modern social psychology textbooks and many university cour...
Article
Where do people's reference points come from? We conjectured that round numbers in performance scales act as reference points and that individuals exert effort to perform just above rather than just below such numbers. In Study 1, we found that professional baseball players modify their behavior as the season is about to end, seeking to finish with...
Article
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, an...
Article
Computing the average ‘effect’, the difference in means of the dependent variable between conditions, is by far the most common way of analyzing experimental data. In many settings, however, such comparison is incomplete if not misleading with respect to the question of interest. An alternative is to examine the share of subjects exhibiting an effe...
Article
This paper introduces the notion of 'psychology-compatible' to refer to elicitation mechanisms that do not interfere with the psychological processes behind the phenomenon being studied. Its importance is demonstrated with one of the most commonly used mechanisms in experimental economics, the multiple-price-list, where subjects answer yes/no to th...
Article
Do firms neglect competition when making entry decisions? This paper addresses this question analyzing the time of day at which eBay sellers set their auctions to end. Consistent with competition neglect, it is found that (i) a disproportionate share of auctions end during peak bidding hours, (ii) such hours exhibit lower selling rates and prices,...
Article
Does current utility bias predictions of future utility for high stakes decisions? Here I provide field evidence consistent with such Projection Bias in one of life's most thought-about decisions: college enrolment. After arguing and documenting with survey evidence that cloudiness increases the appeal of academic activities, I analyse the enrolmen...
Article
In 2007, Consumer Reports released, and two weeks later retracted, a flawed report on the safety of infant carseats. Analyzing data from 5,471 online auctions for carseats ending before, during and after the information was considered valid, I find that (1) consumers responded to the new information, and, more interestingly, that (2) they promptly ce...
Article
Why would people pay more for a $50 gift certificate than for the opportunity to receive a gift certificate worth either $50 or $100, with equal probability? This article examines three possible mechanisms for this recently documented uncertainty effect (UE): First, awareness of the better outcome may devalue the worse one. Second, the UE may have...
Article
The internet and other large textual databases contain billions of documents: is there useful information in the number of documents written about different topics? We propose, based on the premise that the occurrence of a phenomenon increases the likelihood that people write about it, that the relative frequency of documents discussing a phenomeno...
Article
Why do different people give to different causes? We show that the sympathy inherent to a close relationship with a victim extends to other victims suffering from the same misfortunes that have afflicted their friends and loved ones. Both sympathy and donations are greater among those related to a victim, and they are greater among those in a commu...
Article
We document that eBay bidders exhibit a biased preference for auctions with more bids, even if these are non-diagnostic of quality, creating an incentive for sellers to lower starting prices to attract early bids. We find that lowering starting prices succeeds in increasing the likelihood that an auction will receive additional bids, conditioning o...
Article
The internet contains billions of documents. We study if there is useful information in the frequency with which different topics are written about. Based on the premise that the occurrence of an event increases its textual frequency, we assess whether internet document-frequency can capture cross-sectional variation in the occurrence-frequency of...
Article
Forgoing immediate revenue in order to provide customers with a more satisfying experience may be an optimal decision for a firm if dissatisfaction reduces customer lifetime value. This paper estimates the long-run effects of paying late fees, a leading cause of dissatisfaction in the video-rental market, on lifetime value. Using administrative dat...
Article
Abundant experimental research has documented that incidental primes and emotions are capable of influencing people's judgments and choices. This paper examines whether the influence of such incidental factors is large enough to be observable in the field, by analyzing 682 actual university admission decisions. As predicted, applicants' academic at...
Article
Standard economic models assume that the weight given to information from different sources depends exclusively on its diagnosticity. In this paper we study whether the same piece of information is weighted more heavily simply because it arose from direct experience rather than from observation. We investigate this possibility by conducting repeate...
Article
Previous experimental research has shown that people's decisions can be influenced by options they have encountered in the past. This paper uses PSID data to study this phenomenon in the field, by observing how long people commute after moving between cities. It is found, as predicted, that (i) people choose longer commutes in a city they have just...
Article
Based on contrast effects studies from psychology, we predicted that movers arriving from more expensive cities would rent pricier apartments than those arriving from cheaper cities. We also predicted that as people stayed in their new city they would get used to the new prices and would readjust their housing expenditures countering the initial im...
Article
We hypothesized that on-line auction bidders would herd behind other bidders even when observed choices did not reveal private information. A model that inserts bidders engaging in this type of non-rational herding into a competitive market shows that, in equilibrium, (some) sellers set low starting-price in order to attract low valuation bidders w...
Article
This paper studies the impact of personally knowing a victim on social preferences for the welfare of other victims of the same misfortune. We begin with the analysis of a survey of volunteers, which shows that they tend to volunteer for organizations that target misfortunes previously suffered by their friends and relatives. In order to control fo...
Article
Standard models of choice assume that the weight given to information from different sources depends exclusively on its diagnosticity. In this paper we study whether directly experienced information influences behavior more strongly than vicariously obtained information. We conducted two experiments that, unlike prior studies, maintain content, for...

Citations

... One such approach is preregistration, which is a method to increase research transparency by documenting research decisions on a public, third-party repository prior to data collection. If done correctly, preregistration can prevent the cherry-picking of analyses and/or data transformations/cleaning choices that yield more desirable results, a behavior known as "p-hacking" (Simmons et al., 2020; see also Moore, 2016). Given that such behaviors are often done unintentionally, we prefer framing the key benefit of preregistration as reducing the number of "researcher degrees of freedom" that could unintentionally or intentionally be exploited to achieve spurious positive, or more desirable, results. ...
... We would also like to point out that, except for Gianelli and Dalla Volta (2015), none of the analyzed studies has been (at least, explicitly) preregistered. Preregistration does not automatically make an investigation better, but it is a practice that prevents p-hacking and HARKing (Simmons et al., 2021) and also allows other researchers to evaluate science in a more transparent way (Lakens, 2019). Therefore, another recommendation we make for future work in this field is to preregister hypotheses, sample size, analysis plan, and any other experimenter degrees of freedom before carrying out data collection (for details, see Simmons et al., 2021), which will control for indiscriminate flexibility and facilitate a less-biased evaluation of outcomes (Lakens, 2019;Simmons et al., 2021). ...
... Mislavsky et al. (8) found that participants rated low-stakes corporate A/B tests as no worse than their least-preferred policy. These authors also claimed (9) that some of Meyer et al.'s (3) data lacked evidence for experiment aversion. ...
... Thus, given the rich higher-order model in Thijssen et al. (2020), we first perform a replication of the original study using the full baseline data. Second, we evaluate their core analyses using the multiverse (Steegen et al., 2016) and specification curve analyses (Simonsohn et al., 2020) to investigate how the use of different environmental and puberty variables in the ABCD study data may impact interpretations and conclusions pertaining to the investigated brain-behavior associations. While the replication provides evidence of how effects replicate across research teams and different sets of ABCD data, the specification curve provides a reference for how alternative definitions of key variables that represent stressful experiences in the environment may impact the underlying conclusions. ...
... This conflict has important implications for research and policy. Objecting to an experiment only because one objects to one or both policies the experiment contains (8,9) does not necessarily constitute a judgment anomaly. If that were the only reason why people object to experiments, then policy makers could theoretically forestall backlashes to A/B testing by only comparing policies that people like, as Dietvorst et al. (10) suggest. ...
... We conducted a meta-analysis of the three studies to evaluate the robustness of the truth effect in the instructed conditions. As all studies were pre-registered, and these are the only three studies in this line of research so far, the internal meta-analysis is justified (see Vosgerau et al., 2019). We combined the data from the three experiments, including only the instructed and the experienced-only conditions, but not the experience plus tag condition introduced in Experiment 2. ...
... (Retrieved: 08.09.2020) concerns, because in a case of no publication bias detection, the population effect size can be overestimated. The harmful consequences of publication bias in scientific research are a) the exploitation of degrees of freedom (df) [52], b) population effect overestimation c.f. [53], c) researcher behavior striving to obtain statistically significant results [54], commonly known as p-hacking [55], d) selective reporting of papers with statistically significant findings, termed as file-drawering [54], [56], and e) p-backing, defined as the process of evaluating different analyses on the same data set, then selectively interpreting only those with statistically significant evidence. [57] III. ...
... An exploration looking at non-historical judgment may examine the consequences of exposure to and consideration of extraneous, uninformative information followed by a straightforward directive to ignore the displayed value and a subsequent prompt to respond by making an estimation. Dietvorst and Simonsohn (2019) demonstrated that subjects will reliably choose to use the to-be-ignored information despite overt bias; however, they can be convinced to ignore it under some circumstances, thus reducing the impact of the biasing effect. If not all bits of information are equally likely to be ignored, what sets them apart? ...
... Meta-analysis seems promising in its attempts to use existing data to generate an overall estimate of effects. But meta-analysis risks reifying selectively reported findings (12), is vulnerable to biased selection (13), is indifferent to the quality of the original evidence (14), and answers a question that no one was asking, i.e., what the average effect of dissimilar studies is (15). These are both useful approaches, but they have limitations. ...
... SD = 0.80) were recruited via Prolific Academic and paid for their time. We aimed to ensure a minimum of 50 participants in each of the four experimental conditions (Simmons et al., 2018). The study was only available to UK nationals and people who spoke English as their first language. ...