RESEARCH ARTICLE
Publication bias in the social sciences since
1959: Application of a regression discontinuity
framework
Julia Jerke 1‡*, Antonia Velicu 2, Fabian Winter 2, Heiko Rauhut 2‡*

1 Swiss National Science Foundation, Bern, Switzerland
2 Department of Sociology, University of Zurich, Zurich, Switzerland

‡ JJ is lead and first author with the largest contribution. HR is the PI.
* jerke@soziologie.uzh.ch (JJ); rauhut@soziologie.uzh.ch (HR)
Abstract
While publication bias has been widely documented in the social sciences, it is unclear whether the problem has worsened over the last decades due to an increasing pressure to publish. We provide an in-depth analysis of publication bias over time by creating a unique data set consisting of 12340 test statistics extracted from 571 papers published between 1959 and 2018 in the Quarterly Journal of Economics. We further develop a new methodology to test for discontinuities at the thresholds of significance. Our findings reveal, first, that, in contrast to our expectations, publication bias was already present many decades ago, but, second, that bias patterns notably changed over time. In particular, we observe a transition from bias at the 10 percent to bias at the 5 percent significance level. We conclude that these changes are influenced by increasing computational possibilities as well as changes in the acceptance rates of scientific top journals.
1 Introduction
The increasing marketization of science has created a breeding ground for biases in the scientific literature [1]. Due to the introduction of success criteria such as the journal impact factor and the h-index to measure academic performance, as well as increasingly exclusive acceptance rates for manuscripts in scientific top journals, incentives to follow individual self-interest have grown, leading to a situation where researchers face a highly competitive ‘publish or perish’ system. Successful careers increasingly require the constant production of original, novel and innovative contributions, which are expected to be published in highly ranked journals and become highly cited [2–4]. On the one hand, more pressure is likely to prompt researchers to work on more selective, more surprising results, which are also more likely to represent selective outliers or biased or even tweaked findings. On the other hand, journals aim to establish impact and reputation by publishing papers that are likely to receive a high number of citations. Given that a paper’s quality is often confused with the statistical significance of the reported results, this may lead editors and reviewers to prefer hypothesis-confirming results [5,6]. Closely related to this is a phenomenon called “winner’s curse” in
auction theory [7]. New significant results on the most original hypotheses win the race for access to prestigious journals, but at the same time overestimate the strength of significant effects. Strong evidence for this thesis is that replications, even when they can reproduce an original finding, systematically find weaker effects than those reported in the original study [8]. As a result, the scientific literature is biased towards overly positive findings, resulting in an unnatural over-representation of significant findings, one manifestation of the so-called publication bias.
Publication bias can be described as a bias in favor of predominantly publishing significant, hypothesis-conforming results. It has been shown that manuscripts reporting positive or significant results have a higher publication probability, independently of their methodological quality [9]. In particular, this can imply that a paper reporting significant estimates is published, whereas a paper of equal theoretical, conceptual and methodological quality with insignificant results is rejected or not even submitted. Despite the wide range of definitions and perspectives formulated by different authors, they largely agree that the published literature may not be representative of the universe of performed research, creating a non-representative accumulation of significant results in the published state of the art.
The presence of a substantial publication bias has been clearly documented in many disciplines, including the social sciences [10–14]. Most of the evidence is indirect. For example, funnel plots in meta-analyses provide information on variance and mean values of effects as a function of the standard error. In a remarkable meta-analysis, Doucouliagos and Stanley [15] constructed a funnel plot for 1424 studies on the effects of minimum wage laws on unemployment. Rarely is there an opportunity to observe publication bias directly. Exceptions are the studies by Turner et al. [16] and Franco et al. [13], who managed to look directly into the file drawer of pharmacological and sociological studies, respectively. Because the studies were preregistered, the published and non-published results could be compared. It turned out that almost all significant results were published, while non-significant results remained in the file drawer.
Even though the problem of publication bias has received increasing attention (evident, for example, in the growing number of studies dealing with it), it remains unclear whether publication bias is actually a new phenomenon or whether it has already existed for some time. Overall, the evidence in this respect is mixed. Several studies indicate that the scientific literature already suffered from a substantial over-representation of significant results several decades ago [10,17–19]. More recent studies, however, suggest that the bias might have increased over time [20,21]. Other studies, in contrast, indicate that bias patterns did not change throughout the last years [22], or that the bias even decreased recently [23].
We address this gap in the literature by presenting the longest and most fine-grained data set on publication bias in the social sciences, more specifically in the economic discipline. Our sample comprises all studies published between 1959 and 2018 in the Quarterly Journal of Economics and therefore guarantees a complete account of the last sixty years. To investigate publication bias, we focused on quantitative contributions and, first, identified all articles that reported an empirical study in which one or more hypotheses were tested. In a second step, we extracted statistical test results from all eligible articles. Overall, we collected 12340 test statistics (z or t) from 571 papers published between 1959 and 2018. We will examine the empirical distribution of the test statistics for discontinuities at the common levels of significance and ask whether the patterns change over time.
Our study thereby also contributes to the methodological literature. We propose a novel, more fine-grained and more efficient procedure for detecting publication bias than those used in previous studies. In general, we test whether significant results cluster on the right-hand side of the common significance levels in the empirical distribution of test statistics. We,
therefore, make use of the principle of regression discontinuity for our specific case. We apply a procedure proposed by McCrary [24] and adapt it to testing for discontinuities in the distribution of test statistics. More precisely, we investigate whether the density of results just above the thresholds of significance significantly exceeds the density just below these cutoffs by comparing the left-hand and right-hand limits of a weighted regression at the threshold. In this way, our proposed method provides a more fine-grained picture of publication bias.
In the remainder, we proceed as follows: In the next section, we provide an in-depth analysis of how the cult-like fixation on statistical significance has led to misconceptions and eventually widespread biases in the scientific literature. We further review previous evidence on changes in publication bias over time. We describe our sampling procedure and prominent sample characteristics in the third section. The fourth section is dedicated to deriving our methodology for detecting discontinuities in the empirical distribution of test statistics. We present our cross-sectional and longitudinal results on publication bias in sections five and six. Finally, we conclude our investigation with a broad discussion of our findings and concluding remarks regarding policy changes that have the potential to mitigate publication bias in the future.
2 Statistical significance, misconceptions and publication bias
2.1 How statistical significance testing became a trigger for publication bias
The concept of statistical significance is often misunderstood and mistakenly regarded as an indicator of whether a theory or hypothesis is true or false [25]. However, not being able to reject the null hypothesis does not imply its confirmation, and it particularly does not justify any conclusions about the alternative hypothesis. The same applies if the null hypothesis is rejected; it does not follow that the alternative hypothesis is true. This misconception is joined by another widespread but wrong belief: results that do not confirm the alternative hypothesis are less valuable for scientific progress as they do not establish new “truths”. Several studies indeed show that scientists often have difficulties with the correct interpretation of p values and statistical significance [26–28]. Further, an analysis of articles published in the European Sociological Review indicates that a substantial proportion of published research is prone to a misuse of statistical null hypothesis testing [29]. For instance, in roughly half of the articles non-significant effects are interpreted as zero effects. In summary, statistical significance often seems to be erroneously confused with importance, and significance testing too often boils down to a binary decision about the existence of an effect instead of a discussion of its magnitude [30].
Criticism of the concept of statistical significance was voiced soon after it gained currency. For instance, [31, p. 474] stated: “[. . .] tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as these are contradicted by the data; but [. . .] they are never capable of establishing them as certainly true. In fact [. . .] ‘errors of the second kind’ are committed only by those who misunderstand the nature and application of tests of significance.” Selvin [32] even warned of the threat of creating incentives to manipulate statistical analyses. Nevertheless, tests of statistical significance became broadly accepted and are now one of the most commonly applied standards for judging empirical research.
The now heavy use of significance testing has quite clearly contributed to the phenomenon of publication bias and to other far-reaching, unexpected and presumably unintended consequences for what is published in science. For instance, Ioannidis et al. [33] re-examined 159 meta-analyses from various economic areas and demonstrated that the statistical power of the analyses generally did not exceed 18 percent and that 80 percent of the meta-effects are
overestimated by a factor of at least two. They argue that underpowered studies with significant findings often merely report random noise or bias [34,35]. This is also reflected in the overall proportion of significant findings in the scientific literature. Using a simple Bayesian argument, Ioannidis [34] shows that the conditional probability that effects reported as significant are really true can be very small. Under certain conditions, in particular a low a priori probability of the truth of the hypothesis, low statistical power, and the convention of α = 5%, it follows that “most published research findings are false” [34]. The majority of published findings are significant, a pattern that has not changed much over the last decades and that can hardly be attributed to testing only plausible hypotheses [3,10,20,36,37]. In a comparison of publications from different disciplines, Fanelli [3] shows that the share of “positive” results ranges from about 70% in space science to more than 90% in psychology. Thus, we can safely assume an under-representation of non-significant results as a consequence.
2.2 Did increasing publication pressures lead to more publication bias?
Review and problems of previous trend studies
Gerber and Malhotra [9, p. 11] note that “evidence of publication bias has been found throughout the social sciences since the 1950s”. Indeed, there are studies on publication bias that date back several decades. One of the first scientists to draw attention to a potentially biased literature was Sterling [17]. In his study, he found that, out of a sample of roughly 300 psychological articles, more than 97% reported significant results. Some thirty years later, Sterling [37] repeated the study and found comparable patterns, concluding that there had been no changes in publication bias over the years. This conclusion also implies that the tendency towards predominantly significant results already existed in the 1950s. A similar over-representation of significant results was also found in economics [10], in sociology [18] and in medicine [19].
Despite these early signs of publication bias, we hypothesize that the bias has increased notably over the past decades. Two major arguments support this notion. First, academia has experienced an enormous increase in competition. Due to declining acceptance rates for manuscripts in top journals and the increasing measurement of academic success by hard indicators, such as the journal impact factor and h-index, scientists and journals are now under more pressure than ever [38,39]. This might have resulted in ever more original, innovative but presumably also more selective research being submitted to and selected by journals. Second, with the advent and advancement of statistical software, statistical analysis has become much more efficient and can nowadays also be increasingly automated. From a technological and computational perspective, it has therefore become much easier to repeat or tweak analyses until the output matches the desired results [21], a practice often referred to as p-hacking.
The results from empirical studies explicitly investigating the development of publication bias over time are rather mixed. Fanelli [20], for instance, shows that the proportion of significant findings increased remarkably between 1990 and 2007. He analyzes the results of over 4600 papers published in this period and finds that, across various disciplines (including the social sciences), the average prevalence of positive results increased by a factor of 1.22, from roughly 70% to about 86%. Further indicative evidence that publication bias has increased can, for example, be found in the study of Leggett et al. [21], who examine two top psychology journals and find a much higher peak just below p = 0.05 in the year 2005 than in 1965. In contrast, Brodeur et al. [22] find no change in bias patterns in empirical economics over the last fifteen years, and Vivalt [23] even finds a decreasing bias for economic studies reporting randomized controlled trials over the last twenty years.
However, the preceding investigations and comparisons of publication bias over time entail two problems and so far provide only weak evidence for changes in publication bias over time. First, even though there have been studies on publication bias since the 1950s, comparing them with recent investigations amounts to comparing a few single cross-sectional studies from different scientific disciplines. Second, methods to detect publication bias have been substantially refined. Over the last ten years, new strategies were developed and applied. Thus, comparing older with more recent studies implies comparing results obtained with different methods and from non-comparable data from different disciplines. Therefore, the current state of the art in speculating about time trends of publication bias has to be treated very carefully.
Our investigation addresses both objections and can be regarded as the first study of a long, consistent and fine-grained time trend in the social sciences. First, we collected all data within one single scientific journal for a time span of sixty years and thus created a consistent data set that allows for inference on time effects. Second, we carefully hand-selected only coefficients that were related to hypotheses, to reduce bias as much as possible. Lastly, we develop a new method to test for publication bias and apply it to our unique data structure. This large-scale investigation allows us to directly investigate whether and how bias patterns have changed over time.
3 Setting and data
3.1 Journal selection
The setting of our empirical work is the social sciences, and more specifically the economic discipline. This discipline is particularly interesting since it encompasses a broad variety of topics and broadly overlaps with other disciplines such as sociology, political science, psychology, education research and health science. The basis of our analyses is the Quarterly Journal of Economics (QJE). We chose the QJE because it belongs to the elite group of highly ranked top journals in economics. We focus on the group of top journals as we expect them to pioneer general developments in the discipline. Further, they are often the first target journals to which manuscripts are submitted and therefore serve as role models for how and what to publish in a discipline. For example, manuscripts that would have fit the QJE but are rejected due to its low acceptance rate are often submitted elsewhere, making manuscripts in the QJE representative of a larger set of top manuscripts in the field. Further, we conjecture that the top journals in a field have a large influence on methodological conventions, working styles and topics; hence, potentially biased publications there will have a large influence on the topics and methods of upcoming research.
For the sake of conducting an extensive longitudinal analysis, it is not feasible to examine all top journals, forcing us to choose one single journal. Founded in 1886, the QJE is the oldest economics journal, supplying us with a simple decision rule that we expect to be uncorrelated with the features we aim to investigate. That is, we do not expect to observe patterns in the QJE that we would not see in other top journals of the social sciences.
3.2 Sample construction
We apply a narrow interpretation of publication bias and investigate whether an artificial accumulation of statistical results just above the common thresholds of significance can be observed. We therefore analyze the empirical distribution of test statistics published in the QJE. We focus on volumes 73 to 133, yielding an observation period that spans six decades, 1959 to 2018. In line with the outlined purpose of our investigation, we construct two data sets. (1) The first data set contains all articles published in the observation period and the respective article meta information (henceforth called the article data set). (2) The second data set contains the statistical test results extracted from eligible articles (henceforth called the coefficient data set). The coding process took four years. Both data sets were manually compiled in a detailed, effortful and standardized procedure and represent a full sample of the literature published in the QJE between 1959 and 2018. The main part of the work consisted of selecting only those coefficients for which hypotheses were formulated, which required an expert to read all articles.
Article data set. We used the Web of Science Core Collection (WoS) database to construct an initial sample. Overall, the QJE published 2908 items from 1959 to 2018, including notes, comments and replies. The WoS provides basic information such as title, name of author(s), publication year, volume and issue, number of cited references and number of citations since publication. Via a thorough content analysis we categorized all articles as either empirical studies, theoretical contributions, or discussions (notes, comments, replies, etc.).
Coefficient data set. After having constructed the article data set, we extracted test results following an elaborate two-step selection procedure: first, the selection of eligible articles, and, second, the selection of the effects reported therein. To be eligible, articles had to meet three inclusion criteria, which we refer to as the three stages of our first-step selection (see Table 1, panel A).

In the first stage, we explicitly filter articles that report empirical studies and contain appropriate statistical tests. The reason is that we refer to publication bias as a systematic noise induced by preferential selection of significant findings or manipulation of non-significant effects that eventually may result in suspicious patterns in the distribution of published statistical test results. Thus, our operationalization naturally entails the exclusion of theoretical contributions and discussion articles. However, this does not necessarily imply that non-empirical research that develops models or theories is not prone to certain forms of bias. For instance, Paldam [40] refers to t-hacking as a form of bias introduced by authors who carefully tailor theoretical assumptions to arrive at the desired and well publishable conclusion. Certainly interesting, this form of bias is, however, not addressed in this study. In some cases, scientific notes also reported empirical studies. Since the majority of these empirical notes directly referred to an article previously published in the QJE, it is reasonable to assume a higher publication probability for them. Hence, these notes were excluded from the publication bias analysis. However, we included the few cases in which such empirical notes did not refer to articles previously published in the QJE. Overall, 1094 articles reported empirical studies.
Table 1. Selection procedure to construct the coefficient data set.

A. First selection step                                                                 N
Initial sample: All publications (number of observations in the article data set)    2908
Inclusion criteria:
  First stage: Articles that report empirical studies                                 1094
  Second stage: Articles that state and test hypotheses                                775
  Third stage: Articles that report the respective test statistics or standard errors  609

B. Second selection step
Coefficients extracted from third-stage articles                                     13736
Coefficients corresponding to two-sided tests                                        13451
Coefficients with abs(z) < 10                                                        12340

https://doi.org/10.1371/journal.pone.0305666.t001

In the second selection stage, we excluded articles that do not report and test hypotheses in the sense that the authors formulate one or more initial expectations regarding the effect or
relationship they investigate. By this means, we dismiss purely descriptive or explorative articles. In line with other studies, e.g. [14], we strike a balance between the very restrictive procedure of authors such as Gerber and Malhotra [9], who only include articles stating explicit hypotheses, and authors such as Masicampo and Lalande [41], who do not restrict themselves to articles with hypotheses at all and extract any reported test results. Instead, we focus on articles that state either explicit or implicit hypotheses. By this means, we did not only include articles in which the central theoretical assumptions were highlighted with keywords such as “hypothesis”, “prediction”, “proposition” or “expectation”, but also articles missing such keywords where the expected effect could be deduced from the text. This made it necessary for one expert to read all articles in detail, a very time-consuming enterprise that led to a unique data set. In more detail, we first ran a text search with the keyword stems “hypoth*”, “predict*”, “propos*” and “expect*” (stems were used so as not to erroneously miss divergent wordings, e.g. “hypothesis” versus “hypothesize”) to identify explicit hypotheses. Second, we also carefully reviewed articles not containing any of these keywords for implicit hypotheses. In total, 775 articles met the criteria of the first two stages.
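To make the first screening step concrete, the following is a minimal Python sketch of how such a keyword pre-screen could be automated. It assumes that article full texts are available as plain-text strings (a hypothetical setup); it is not the procedure actually used in the paper, which relied on an expert reading every article, with manual review of non-matching articles for implicit hypotheses.

```python
import re

# Keyword stems from the text; truncated stems avoid missing variants
# such as "hypothesis" vs. "hypothesize".
HYPOTHESIS_STEMS = re.compile(r"\b(hypoth|predict|propos|expect)\w*", re.IGNORECASE)

def has_explicit_hypothesis_keywords(full_text: str) -> bool:
    """Flag an article whose text matches any of the keyword stems.

    Articles without a match are not discarded automatically; in the paper
    they were reviewed manually for implicit hypotheses.
    """
    return HYPOTHESIS_STEMS.search(full_text) is not None

# Hypothetical usage with a dict mapping article IDs to raw full texts:
# flagged = {aid: has_explicit_hypothesis_keywords(txt) for aid, txt in articles.items()}
```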
In the third and last stage we further excluded articles that failed to provide test statistics or related estimation results, such as standard errors, sample sizes or p values, that allow their calculation. Additionally, we excluded papers that were based on simulated data and papers that reported non-parametric tests only (such as chi-square tests), hence restricting our data to analyses in which parameters were tested using the Student’s t or the normal distribution.

Overall, the three-stage selection procedure described above reduced the initial 2908 publications to 609 eligible articles. These remaining articles all report empirical studies with hypotheses and appropriate test statistics. After this first-step selection, we performed the second-step selection, that is, the selection of test results to extract from the set of eligible articles in order to construct our coefficient data set (see Table 1, panel B).
The main challenge in selecting appropriate coefficients from the remaining eligible articles is the identification of coefficients that relate to hypotheses. We therefore carefully read the respective results sections and inspected regression tables to clearly link the reported coefficients to the hypotheses. While we applied broad inclusion criteria for what counts as an article with hypotheses, we were relatively restrictive in deciding which coefficients relate to the stated hypotheses.

In particular, we were careful not to extract multiple coefficients from regression models building upon each other, since these are expected to be highly correlated. This prevents a potential inflation of our results on publication bias. As a rule of thumb, we always extracted the coefficients presented in the full model including control variables. The sole exception were coefficients reported in partial models to which the authors explicitly referred as evidence for or against their hypotheses. We did not extract test results with one of the following characteristics: the coefficient is the result of testing identification assumptions (including results from first-stage regressions); the coefficient is linked to a hypothesis expecting a null effect (including randomization checks, placebo tests, falsification checks, etc.); or the coefficient is estimated as part of robustness tests related to prior central results. We excluded test coefficients that were expected to be zero since the term publication bias refers to an over-representation of significant results; coefficients with an expected null effect potentially trigger different mechanisms and strategies that contrast with those for coefficients that are expected to be significantly non-zero. Also, for the purpose of parsimony, we excluded robustness tests, as they generally repeat previous central analyses with divergent specifications and therefore yield results that are often similar to those from earlier analyses; including them would possibly inflate our results. In all our analyses we consider only coefficients
corresponding to two-sided tests and with absolute z values smaller than 10, as extremely large values are excessively influential in our test procedure. In total we collected 12340 test results from 571 articles.
We constructed our coefficient data set using t values. Note, however, that in most cases these were not reported in the paper. We therefore calculated them by dividing the regression coefficient by its standard error. In a few other cases the authors only reported the coefficient and its p value. Using information on the degrees of freedom and the respective distribution function, we could calculate the test statistic in these cases as well.
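A minimal sketch of these two reconstruction rules, assuming scipy is available; the numbers in the usage examples are hypothetical and not from the data set.

```python
from scipy import stats

def t_from_coef_se(coef: float, se: float) -> float:
    """t statistic as the ratio of a regression coefficient to its standard error."""
    return coef / se

def t_from_p(p_value: float, df: float) -> float:
    """Recover the (absolute) t statistic from a reported two-sided p value
    and the degrees of freedom via the inverse Student-t CDF."""
    return stats.t.ppf(1.0 - p_value / 2.0, df)

# Hypothetical examples:
print(t_from_coef_se(0.585, 0.304))   # ~1.92
print(t_from_p(0.05, 120))            # ~1.98
```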
Transformation to z scores and weighting. We transformed all t scores to z scores. This was necessary because the majority of our extracted test results stem from regression analyses with different sample sizes and therefore contain t statistics with varying corresponding degrees of freedom. In most cases the authors did not report the degrees of freedom. We then calculated them as df = N − (number of coefficients in the regression equation, including the constant and possible fixed effects). Since the Student’s t distribution depends on the degrees of freedom, our t statistics would be tested against different theoretical distributions, yielding a variety of corresponding critical cutoff values for significance and making it impossible to investigate whether there is bias at the common levels of significance. For each t statistic we therefore determine the individual p value and calculate the z statistic that would be assigned the same p value. More concretely, we undertake the following transformation:
\[ p(t_{ij}) = 2\bigl(1 - T.\mathrm{dist}[\,t_{ij};\, df_{ij}\,]\bigr) \;\Rightarrow\; z_{ij} = \mathrm{Norm.Inv}\!\left(1 - \frac{p(t_{ij})}{2}\right), \qquad (1) \]

where t_ij is the t statistic from coefficient i of article j and df_ij the respective degrees of freedom; p(t_ij) is the corresponding p value and z_ij the z statistic resulting from the transformation. Following this procedure we obtain a consistent distribution of z statistics, each corresponding to a t statistic from our data set adjusted for the respective degrees of freedom.
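A minimal sketch of the transformation in Eq (1), assuming scipy; taking absolute values is an assumption of this sketch, motivated by the restriction of the data set to two-sided tests.

```python
import numpy as np
from scipy import stats

def t_to_z(t, df):
    """Map each t statistic to the z statistic with the same two-sided p value (Eq 1):
    p = 2 * (1 - T.cdf(|t|, df)),  z = Norm.Inv(1 - p / 2)."""
    t = np.abs(np.asarray(t, dtype=float))
    p = 2.0 * (1.0 - stats.t.cdf(t, df))
    return stats.norm.ppf(1.0 - p / 2.0)

# A t statistic of 2.05 with 30 degrees of freedom maps to a slightly smaller z value:
print(t_to_z(2.05, 30))   # ~1.97
```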
We weight our data based on the number of coefficients in the respective article to equally account for articles reporting few and many coefficients. We perform this weighting based on the arguments of the preceding study by [14]. We construct the weighting variable as the inverse of the number of coefficients extracted from each article; hence, the more coefficients come from one article, the lower their weight. We will report results for raw and imputed data as well as for unweighted and weighted data.
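A short sketch of this weighting scheme, assuming a pandas data frame with one row per extracted coefficient and a hypothetical 'article_id' column identifying the source paper.

```python
import pandas as pd

def add_article_weights(coef_df: pd.DataFrame) -> pd.DataFrame:
    """Weight each coefficient by the inverse of the number of coefficients
    extracted from the same article, so that every article contributes equally."""
    n_per_article = coef_df.groupby("article_id")["article_id"].transform("size")
    return coef_df.assign(weight=1.0 / n_per_article)
```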
3.3 From theory to empirics and towards more complex analysis
Prior to the analyses on publication bias, we will document several important characteristics of
the data that might already bear implications with respect to publication bias. Fig 1 shows how
the ratio of theoretical to empirical contributions changed over time.
We further observe a rise in the complexity of empirical analyses, indicated by an increasing number of coefficients collected from the articles and by increasing sample sizes. In the first decade (1959–1968) we extracted on average 12.4 coefficients per article, and this number rises to 29.4 in the most recent decade (2009–2018). Remember that we only extracted core coefficients that were relevant for the decision for or against the hypothesis of interest. It is therefore plausible to attribute the rise in the number of coefficients per article to more complex and comprehensive analyses. In tandem with the increase in core coefficients per paper, sample sizes have notably increased. Fig 2 shows the distributions of the sample sizes. The left panel of Fig 2 shows the raw distribution of sample sizes over time. We log-transformed the sample size to visually account for very large sample sizes. The roughly linear increase of the logarithmized sample size suggests an exponential increase, which is further
underlined by the median sample sizes shown in the right panel of Fig 2. While the median sample size is 52 before 1990, it increases to 1904 after 1990. The trend towards more complex analyses with larger samples is very likely driven by technological and computational progress and by increasing access to large public data files, but it may also indicate that the expectations placed on manuscripts eligible for top journals have risen, contributing to the notion of an increasing pressure to publish or perish. The positive side of this development is that, with increasing sample sizes, the power of statistical tests has increased.
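The descriptive trend just discussed could be reproduced along the following lines; this is only a sketch with hypothetical column names ('year', 'sample_size'), not the authors' code.

```python
import pandas as pd

def median_sample_size_by_year(coef_df: pd.DataFrame) -> pd.Series:
    """Median sample size per publication year (right panel of Fig 2);
    the log transform in the left panel is for display only."""
    return coef_df.groupby("year")["sample_size"].median()

def median_before_after(coef_df: pd.DataFrame, cut_year: int = 1990):
    """Median sample size before the cut year and from the cut year onwards."""
    before = coef_df.loc[coef_df["year"] < cut_year, "sample_size"].median()
    since = coef_df.loc[coef_df["year"] >= cut_year, "sample_size"].median()
    return before, since
```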
Fig 1. Composition of the QJE over time, 1959–2018. The graph shows the proportional distribution of theoretical and empirical contributions and discussions (e.g. notes, comments, replies) over the observation period. Shown are stacked proportions. https://doi.org/10.1371/journal.pone.0305666.g001

Fig 2. Sample sizes over time. The figure on the left shows sample sizes over time; each dot represents the sample size of a single coefficient of our coefficient sample. The sample size was log-transformed to account for very large sample sizes. The line indicates a locally weighted regression. The figure on the right shows the median sample size for each year. https://doi.org/10.1371/journal.pone.0305666.g002

Despite the increase in the complexity of analyses and in sample sizes, we do not see an increase in the share of significant results (see S1 Fig in the S1 File). Overall, 64% of the coefficients are significant at least at the 10 percent significance level. The 5 percent significance level was met by 56% of the coefficients and the 1 percent significance level was met by 41% of the
coefficients. Only 36% of the coefficients in our coefficient data set are not significant at any conventional significance level. The lack of an increase in significant findings over time is surprising for two reasons. First, such an increase has been previously documented in other studies, e.g. [20]. Second, since the sample size is directly related to the statistical significance of a test statistic, we would have expected the substantial increase in sample size to be associated with increasingly significant results. We performed a robustness check by repeating the visual inspection with the weighted data as well as including the number of coefficients per article in the regression (see S2 Fig and the corresponding regressions in S1 Table in the S1 File), and again do not see an increase over time.
4 A novel method for detecting publication bias
4.1 Challenges in the measurement of publication bias
There does not yet exist a standard procedure for investigating publication bias. This reflects the general difficulty of measuring the absence of negative results based solely on the body of published results. Most current methods for detecting publication bias within samples of heterogeneous effects draw on the empirical distribution of test statistics or p values. They share the key assumption that this distribution should be continuous [21,22,41]. Even though the exact shape of the underlying distribution is unknown, it certainly should not have any jumps or discontinuities, and in particular not at the (virtually arbitrary) common critical thresholds of significance such as 1.64 or 1.96 for the ten percent or five percent level of the standard normal distribution.
One frequently used method to investigate publication bias in samples of highly heterogeneous effects is the caliper test [9,12]. It has meanwhile been applied in various disciplines (e.g., sociology: [12,42,43]; economics: [23,44]; political science: [9]). The method is particularly appealing since it is easy to apply and requires only a few prerequisites. In brief, the test compares the absolute frequency of test values in narrow, equal-sized intervals just above and just below the critical thresholds of significance and evaluates the difference between those two frequencies with a simple binomial test. However, the test has several weaknesses: 1) it is based on only a very small subset of the distribution of test statistics, 2) the results strongly depend on the size of the intervals and it remains unclear how to determine the optimal width, and 3) the test is not immune to low precision of the reported test statistics and to clustering that is independent of publication bias.
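For reference, a minimal sketch of the caliper test described above (in this paper it is used only as a robustness check reported in the S1 File). The caliper width and the one-sided alternative are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def caliper_test(z_values, cutoff=1.96, width=0.10):
    """Count test statistics in equal-sized intervals just above and just below
    the cutoff and evaluate the split with a binomial test against p = 0.5."""
    z = np.abs(np.asarray(z_values, dtype=float))
    above = int(np.sum((z >= cutoff) & (z < cutoff + width)))
    below = int(np.sum((z < cutoff) & (z >= cutoff - width)))
    test = stats.binomtest(above, n=above + below, p=0.5, alternative="greater")
    return above, below, test.pvalue
```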
Subsequently, further methods have been proposed to overcome these weaknesses. For instance, [44] extend the basic caliper test with Monte Carlo simulations to account for low precision of reported test results. [21,41] fit an exponential function to the empirical distribution of p values and conduct residual analyses at the critical threshold of significance. [45] compare published test statistics with critical values that are adjusted for a file-drawer bias. And [14,22] construct a counterfactual distribution of test statistics to estimate the inflation of significant results.
In the following, we propose a novel method to measure publication bias that also overcomes the weaknesses of the caliper test and combines some of the strengths of the above-mentioned advancements. Our approach is based on the principle of regression discontinuity and has several advantageous features that we briefly discuss. First, we use the full range of the distribution of test statistics, giving higher weight to the area near the significance thresholds. Thus, we do not discard any region of the distribution while still placing a particular focus on the region where publication bias is likely to become visible. Second, the procedure makes use of local linear smoothing and therefore, at least to some extent, corrects for clustering due to low rounding precision. For instance, as [14,44] point out, a natural clustering at the value
z = 2 must be expected, since z values equal the ratio of a coefficient and its standard error and since those two values are often reported with low rounding precision. Thus, by applying a smoothing process, we make sure that a discontinuity at z = 1.96 does not just reflect a clustering at z = 2. Third and last, the procedure is very simple to implement and does not require specific prerequisites.
In what follows, we will describe in detail how we set up our test procedure and explain
why it is well suited to investigate publication bias. For the sake of completeness, we will later
also rerun our analyses with one of the most commonly used methods to test for publication
bias, the caliper test, and report the results in the S1 File.
4.2 Regression discontinuity applied to publication bias
We apply a procedure known from the literature on regression discontinuity. Widely used in the social sciences, regression discontinuity is a quasi-experimental design that is usually applied to estimate treatment effects whenever individuals are assigned to a treatment based on an underlying random assignment variable R [46]. That is, those whose value of R exceeds a certain cutoff value receive the treatment, the others do not. The treatment effect can then be estimated by comparing the outcome of interest for individuals just below and just above the cutoff, assuming that they are virtually similar regarding certain basic characteristics as they only marginally differ in R [47].

Comparable to our problem of detecting unnatural jumps in the distribution at certain cutoffs, regression discontinuity requires that there is no unnatural over-representation of cases just above the cutoff value and, accordingly, no under-representation of cases just below the cutoff. Such jumps in the distribution would otherwise pose a major threat to the validity of regression discontinuity, since they may indicate manipulation of the assignment variable R by the observed individuals. For example, if individuals anticipate a potential benefit associated with receiving the treatment, they may manipulate their value of R to get assigned. The estimated treatment effect will then be biased, as the required comparability of individuals near either side of the cutoff is not met. This is also called sorting: individuals who otherwise would not have been assigned to the treatment now receive it, resulting in a discontinuity in the density distribution of the assignment variable at the cutoff.
McCrary [24] introduced an easily applicable procedure that tests for such discontinuities at given cutoff values. We apply this idea to the framework of publication bias. In our case, we do not analyze individuals receiving a treatment, e.g. a social policy benefit once they score higher than a given cutoff, but rather coefficients that receive the ‘treatment’ significance once their corresponding test statistic exceeds the cutoff value of the applied decision rule of statistical significance. For instance, a z statistic exceeding 1.96 (1.64) is considered significant at the five (ten) percent level; a z statistic below those values is considered insignificant. Note that we do not conduct a full-fledged regression discontinuity analysis, as we are not interested in an outcome variable after treatment; rather, we test for discontinuities in the distribution of test statistics by applying the McCrary test.
The McCrary test can be applied in situations where manipulation of R by the individuals is in principle possible and monodirectional [24]. Both conditions are met in our application. Manipulation of the distribution of published test results is possible, for instance, through the preferential publication of positive and significant effects, driven by journals and authors. The same applies to the tweaking of data by researchers to yield significant results. In addition, the manipulation is monodirectional, since the opposite would imply that test statistics are manipulated downwards to be insignificant. Admittedly, in cases where unrelatedness between two concepts is the desired outcome, there may be reasons to manipulate or select coefficients towards non-significance. But, as explained before, we did not extract coefficients for which a zero effect was predicted. Hence, no bias is to be expected from that side.
4.3 Setup for testing discontinuities at the thresholds of significance
The setup for our application of the McCrary test is as follows ([24] provides further mathematical details). The assignment variable R is the test statistic. The cutoff value c, where a discontinuity is expected, is the significance threshold, for instance c = 1.96.
The test requires two steps. In the first step, an undersmoothed histogram showing the plain density distribution of R is constructed; in our case this is simply the empirical distribution of test statistics. In the second step, a weighted local linear regression smooths the histogram: the normalized height of the bins is regressed on the respective bin midpoints. In brief, this means that the height of a particular bin is estimated with a weighted linear regression in which the bins closest to the point of interest receive the most weight. Local smoothing thereby takes place within the range of a cautiously chosen bandwidth h. Continuity at c is then given when the left-hand and the right-hand limits at c are equal, and the discontinuity estimate of interest θ is expressed as the log difference in height:

\[ \theta = \ln \lim_{r \downarrow c} f(r) \;-\; \ln \lim_{r \uparrow c} f(r), \qquad (2) \]

where r is the value that the running variable R takes and f(r) is the respective estimate from the weighted linear regression at r.
As McCrary [24] outlines, however, it is more accurate to perform the local linear smoothing separately for the left-hand side and the right-hand side of c, yielding two separate regressions that can be compared at c:

\[ \hat{\theta} = \ln \hat{f}^{+} - \ln \hat{f}^{-}, \qquad (3) \]

where $\hat{f}^{+}$ is the estimate from the right-hand and $\hat{f}^{-}$ the estimate from the left-hand regression. It can be shown that $\hat{\theta}$ is asymptotically normal and consistent, and the approximate standard error $\hat{\sigma}_{\theta}$ used to compute confidence intervals is given by

\[ \hat{\sigma}_{\theta} = \sqrt{\frac{1}{nh}\,\frac{24}{5}\left(\frac{1}{\hat{f}^{+}} + \frac{1}{\hat{f}^{-}}\right)}, \qquad (4) \]

where n is the sample size and h refers to the chosen bandwidth.
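To make the procedure concrete, the following is a simplified Python sketch of the two steps and of Eqs (3) and (4). It takes the bandwidth and bin size as given rather than selecting them automatically (the automatic selection is described next), and it is not the authors' implementation; established implementations exist, e.g. the DCdensity function in the R package rdd.

```python
import numpy as np

def mccrary_discontinuity(z_values, cutoff=1.96, bandwidth=0.5, bin_size=0.05):
    """Simplified McCrary-style density discontinuity estimate at `cutoff`.

    Step 1: build a finely binned histogram of the test statistics, with the
            cutoff on a bin edge, and normalize bin heights to density scale.
    Step 2: fit weighted (triangular-kernel) local linear regressions of the
            bin heights on the bin midpoints, separately on each side of the
            cutoff, and compare the two boundary estimates on the log scale.
    """
    z = np.asarray(z_values, dtype=float)
    n = z.size

    # Step 1: undersmoothed histogram aligned on the cutoff.
    lo = cutoff - bin_size * np.ceil((cutoff - z.min()) / bin_size)
    hi = cutoff + bin_size * np.ceil((z.max() - cutoff) / bin_size)
    edges = np.arange(lo, hi + bin_size / 2, bin_size)
    counts, _ = np.histogram(z, bins=edges)
    mids = (edges[:-1] + edges[1:]) / 2.0
    heights = counts / (n * bin_size)            # normalized to a density

    # Step 2: local linear fit at the boundary; the intercept at the cutoff
    # is the one-sided density estimate.
    def boundary_density(right):
        mask = (mids >= cutoff) if right else (mids < cutoff)
        x, y = mids[mask] - cutoff, heights[mask]
        w = np.clip(1.0 - np.abs(x) / bandwidth, 0.0, None)   # triangular kernel
        sw = np.sqrt(w)
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        return max(beta[0], 1e-12)

    f_plus, f_minus = boundary_density(True), boundary_density(False)
    theta = np.log(f_plus) - np.log(f_minus)                               # Eq (3)
    se = np.sqrt((1.0 / (n * bandwidth)) * (24.0 / 5.0)
                 * (1.0 / f_plus + 1.0 / f_minus))                          # Eq (4)
    return theta, se

# Example usage (z_sample is an array of z statistics):
# theta_hat, se_hat = mccrary_discontinuity(z_sample, cutoff=1.95996,
#                                           bandwidth=0.5, bin_size=0.05)
# z_stat = theta_hat / se_hat        # compared against H0: theta = 0
```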
For the test to produce valid results, a suitable bandwidth h for the local smoothing must be chosen such that excessive noise is avoided when h is very small, and oversmoothing at the cutoff is avoided when h is too large. The McCrary test is implemented in various statistical programs such as Stata and R, and with it comes an automatic procedure to select an appropriate bandwidth. In the following analyses we refer to this bandwidth as the default bandwidth. The automatic bandwidth selection is based on a two-step procedure. First, a histogram with bin size $b = 2\hat{\sigma} n^{-1/2}$ is created, where $\hat{\sigma}$ is the standard deviation of the sample of test statistics. Second, to compare the observed binned distribution to the distribution one would expect without interference, a fourth-order polynomial regression is estimated on the right-hand and the left-hand side of the cutoff. The optimal bandwidth is then based on a bandwidth selector [48] and is given as the average of a left-hand and a right-hand estimated expression featuring the mean squared error of the polynomial regression, its second derivative and a constant derived from the integrals of the kernel. See [24, p. 705] for the exact calculation formula. To investigate the sensitivity of our results we will additionally provide results with varying bandwidths and determine the optimal bandwidth mainly by relying on visual inspection of the discontinuity plots that we provide together with the quantitative estimation results. The McCrary procedure not only requires a bandwidth for the local smoothing but also a bin size for the underlying histogram. However, McCrary [24] demonstrates that the estimation of $\hat{\theta}$ is generally robust to the choice of the bin size. We therefore refrain from reporting results for varying bin sizes and use the default bin size determined by the algorithm.
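The automatic bin-size rule quoted above is simple enough to state in code; the full bandwidth selector (pilot fourth-order polynomial fit and kernel constants) is not reproduced here. The grid of bandwidths in the commented loop is an arbitrary illustration, reusing the sketch from above.

```python
import numpy as np

def default_bin_size(z_values):
    """Automatic bin size b = 2 * sigma_hat * n**(-1/2) described in the text."""
    z = np.asarray(z_values, dtype=float)
    return 2.0 * z.std(ddof=1) / np.sqrt(z.size)

# Sensitivity check: re-estimate the discontinuity over a grid of bandwidths.
# for h in (0.25, 0.50, 0.75, 1.00):
#     print(h, mccrary_discontinuity(z_sample, cutoff=1.95996, bandwidth=h,
#                                    bin_size=default_bin_size(z_sample)))
```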
Summarizing, the test yields an estimate $\hat{\theta}$ for the discontinuity at a specific cutoff value c in the distribution of test statistics by comparing the right-hand side and left-hand side limits of the distribution at c. Note that the test does not use the raw distribution but rather a smoothed distribution, thereby accounting for noise in the data. It is precisely this aspect that ensures that an observed over-representation of values at the 5 percent significance level, for example, does not simply reflect rounding imprecision at the value z = 2. Since $\hat{\theta}$ is approximately normal, we can test whether it significantly deviates from zero by taking the ratio of $\hat{\theta}$ and its standard error $\hat{\sigma}_{\theta}$ and testing it against H_0: θ = 0 with a z-test. Moreover, as a useful extra, estimating θ allows us to draw conclusions about the actual size of the discontinuity at c. With the transformation

\[ \exp(\hat{\theta}) = \exp\bigl(\ln \hat{f}^{+} - \ln \hat{f}^{-}\bigr) = \exp\!\left(\ln \frac{\hat{f}^{+}}{\hat{f}^{-}}\right) = \frac{\hat{f}^{+}}{\hat{f}^{-}} \qquad (5) \]

we can express the difference in the densities just above and just below the threshold of significance as a percentage value.
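A small numerical illustration of Eq (5) and of the z test on the discontinuity estimate, using the unweighted estimate for the 10 percent cutoff reported in Section 5.2 below.

```python
import numpy as np
from scipy import stats

theta_hat, se_hat = 0.236, 0.059                        # 10 percent cutoff, unweighted data
density_ratio = np.exp(theta_hat)                       # Eq (5): f+/f-, about 1.27
p_value = 2 * (1 - stats.norm.cdf(theta_hat / se_hat))  # z test of H0: theta = 0
print(density_ratio, p_value)                           # ~1.27, p < 0.001
```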
5 Cross-sectional results on publication bias
5.1 Cross-sectional visual analysis of publication bias
We first analyze whether our data generally indicate publication bias, pooling all coefficients over time. We exclude one-tailed test statistics (n = 285) from all following analyses. The most direct analysis of publication bias inspects whether there are unnatural spikes in the distribution of test statistics at the common levels of statistical significance. Fig 3 shows the distribution of test statistics for the full sample. Panel (a) shows the raw and unweighted data. The distribution has a remarkable shape and strikingly resembles the patterns documented by Brodeur et al. [14]. We find a bimodal distribution for which the kernel density estimate indicates local maxima around z = 2.1 and z = 0.5 and a local minimum around z = 1.3. The position of the second local maximum at z = 2.1 indicates a clustering of results that are significant at least at the 5 percent level.

The data show an accumulation of significant findings and an over-representation of significant results just above the thresholds of significance (i.e. publication bias). These observations are consistent with two explanations. First, researchers often test plausible hypotheses supported by theoretical considerations that have a good chance of yielding significant findings. Second, the clustering at z = 2.1 suggests an over-representation of significant results due to preferential selection or manipulation, which corresponds to our narrower interpretation of publication bias. Even though we cannot explicitly test the first explanation, we note that it is unable to fully account for the local minimum at z = 1.2 and the valley between z = 1 and z = 1.6. Such a valley indicates a lack of expected observations and is suggestive of publication bias [14]. One consequence of such a newly formed valley, or crater, is the emergence of two peaks to its left and right, as results are shifted from the valley towards the peaks. This phenomenon is discussed in more detail in Brodeur et al. [14]. Essentially, the valley marks where observations are missing, while the adjacent high-density regions absorb the displaced values, creating the local maxima; the first peak thus largely consists of displaced values and would not exist in this form without the shifts.
Fig 3a contains additional suggestive evidence for publication bias. Recall that integer values might be slightly over-represented due to low rounding precision. Considering that we do not observe such notable peaks at other integer values, the prominent outlier at z = 2 may, however, at least to some degree indicate some form of manipulation such as favourable rounding. For illustration, consider the following example of favourable rounding: a hypothetical regression coefficient of 0.585 with a corresponding standard error of 0.304. Their ratio is 1.92, yielding a test statistic that is not significant at 5 percent. However, rounding the coefficient and its standard error to two decimals yields a ratio of 1.97, which is now seemingly significant at the 5 percent level. Technically, such rounding is mathematically correct. It nevertheless enables researchers to make results appear significant although they are not, without involving actual fabrication of results.
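The rounding example can be reproduced with exact decimal arithmetic (Python's built-in round() would not reproduce it, because 0.585 has no exact binary representation and rounds down to 0.58):

```python
from decimal import Decimal, ROUND_HALF_UP

coef, se = Decimal("0.585"), Decimal("0.304")
print(coef / se)                                     # 1.924...: not significant at 5 percent

coef_2dp = coef.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)   # 0.59
se_2dp = se.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)       # 0.30
print(coef_2dp / se_2dp)                             # 1.966...: appears significant at 5 percent
```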
Fig 3b shows the weighted data and confirms the observations from the unweighted data. Here, the test statistics are weighted by the number of coefficients per article. We again observe a bimodal distribution with a small valley in the area around z = 1.2 and a notable peak at z = 2. Hence, the overall shape is similar to panel (a), except that we observe far fewer non-significant results. This is consistent with the previous observation that articles that report many coefficients also report relatively more non-significant coefficients. Reducing their weight automatically results in a reduced density of coefficients in the non-significant range of the distribution.
Fig 3. Distribution of test statistics with coefficient data. The graphs show the distribution of test statistics for the complete observation period, 1959–2018. In panel (b) the data is weighted by the number of coefficients per article. The red lines indicate, in that order, the values 1.64 (10 percent significance level), 1.96 (5 percent significance level) and 2.58 (1 percent significance level). https://doi.org/10.1371/journal.pone.0305666.g003

5.2 Cross-sectional discontinuity analyses

In this section we present the discontinuity estimates for the full sample. As described in the method section, we apply the McCrary test to estimate the size and significance of the
discontinuity at the common thresholds of significance. The test fits two separate weighted
local linear regressions to the left and the right of the specified cutoff value cand compares the
predicted values for cestimated by the two regressions (see Eq 3). The outcome ^
yis the differ-
ence of the natural logarithms of these values.
Fig 4 plots the results of the discontinuity estimations for unweighted (left column) and weighted data (right column) for the whole sample spanning the sixty years of our observation period. Panels (a) and (b) report the results for the 10 percent significance level (cutoff c = 1.64), panels (c) and (d) report the results for the 5 percent significance level (cutoff c = 1.96) and panels (e) and (f) report the results for the 1 percent significance level (cutoff c = 2.58). Throughout the article we report the cutoff values rounded to two decimals. For our analyses we used more precise values: c = 1.64485 for the 10 percent significance level, c = 1.95996 for the 5 percent significance level and c = 2.57583 for the 1 percent significance level.
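The precise cutoffs are simply the two-sided standard normal critical values. For illustration, a McCrary-type density test can be sketched with the DCdensity function from the R package rdd; the simulated data and the assumed extended-output element names ($theta, $se, $p) are for illustration only, and this is not the exact adapted command used for our analyses:

```r
# install.packages("rdd")
library(rdd)

# Two-sided standard normal critical values, matching the precise cutoffs in the text
cutoffs <- qnorm(c(0.95, 0.975, 0.995))   # 1.64485, 1.95996, 2.57583

# Placeholder data standing in for the extracted test statistics
set.seed(1)
z <- abs(rnorm(5000, mean = 1.5, sd = 1))

# McCrary density test at each cutoff; with ext.out = TRUE the output is a list
# that (as assumed here) exposes the log difference in density heights ('theta'),
# its standard error ('se') and the p value ('p')
results <- lapply(cutoffs, function(cc) {
  out <- DCdensity(z, cutpoint = cc, ext.out = TRUE, plot = FALSE)
  c(theta = out$theta, se = out$se, p = out$p)
})

# Translating an estimate into the over-representation quoted in the text
exp(0.236) - 1   # about 0.27, i.e. roughly 27% higher density just above 1.64
```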
There is a large discontinuity at the 10 percent threshold of statistical significance. This is clearly demonstrated in panel (a). The size of the discontinuity θ̂ is precisely estimated as 0.236 (s.e. = 0.059, p < 0.001). This means that the density is around 27% larger on the right-hand side of the threshold of significance than on the left-hand side (exp(0.236) = 1.27). This shows a clear over-representation of results just above the 10 percent significance level. Further, there is a medium-sized discontinuity at the 5 percent significance level (panel (c)), although the bias is smaller than at the 10 percent level. The discontinuity estimate is θ̂ = 0.123, statistically significant (s.e. = 0.050, p = 0.014), and translates to an over-representation of about 13% for values just above the 5 percent threshold (exp(0.123) = 1.13). The estimation appears unaffected by the peak at z = 2. In contrast, there is no indication of publication bias at the 1 percent level. Panel (e) shows no sign of a discontinuity at the threshold. The discontinuity estimate at the 1 percent threshold is θ̂ = −0.093 with a standard error of s.e. = 0.059 and a p-value of p = 0.116.
We can confirm these results with analyses of the weighted data. As before, we weight by the number of coefficients that we extracted from the same paper. The results are shown in panels (b), (d) and (f) of Fig 4. The discontinuity estimates are 0.313 (s.e. = 0.064, p < 0.001) for the 10 percent significance level, 0.274 (s.e. = 0.047, p < 0.001) for the 5 percent significance level, and -0.025 (s.e. = 0.053, p = 0.631) for the 1 percent significance level. Overall, the estimates are notably larger for weighted than for unweighted data. This again results from the fact that papers with fewer coefficients report more significant coefficients on average and that, by weighting the data, we give more weight to those articles. However, the estimates for unweighted and weighted data are qualitatively comparable: the discontinuity is large to moderate and significant at the 10 and 5 percent significance levels and small and insignificant at the 1 percent significance level.
To sum up, our cross-sectional analyses of publication bias reveal the following patterns. We find a notable and robust over-representation of test statistics just above the 10 percent significance level that can hardly be explained by rounding or by the testing of overly plausible hypotheses. It can, though, be attributed to the preferential selection of significant results by the journal as well as by the authors (in terms of not submitting non-significant results), or to the pushing of results just above the threshold of significance. In addition, there is publication bias at the 5 percent significance level, although it is less pronounced than at the 10 percent level. There is no indication of publication bias at the 1 percent level.
Fig 4. Discontinuity plots, 1959–2018. The graphs show the results of the discontinuity estimation by applying the McCrary algorithm for the 10, 5 and 1 percent significance level for unweighted data (left column) and weighted data (right column), respectively. The graphs plot the distribution (grey circles) and the local linear density estimation (emerald line) with the respective 95% confidence band. Standard errors for θ̂ are reported in parentheses. The estimations use the respective default bandwidths and correspond to the first row, columns (1), (5) and (9), in Table 2 for unweighted data and in S2 Table in S1 File for weighted data.
https://doi.org/10.1371/journal.pone.0305666.g004

5.3 Sensitivity of cross-sectional discontinuity analyses

As described before, choosing an appropriate bandwidth h for estimating the discontinuities can affect the results. All graphs in Fig 4 use the default bandwidth automatically determined by the estimation algorithm. To test the sensitivity of our results to the choice of bandwidth, we repeat the analyses for varying bandwidths and report the results in the first row of Table 2 (the corresponding results for weighted data are provided in S2 Table in the S1 File).
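A minimal sketch of such a bandwidth sweep, reusing the placeholder data and the assumed rdd::DCdensity interface from the sketch above; the bw argument fixes the smoothing bandwidth instead of the automatic default choice:

```r
library(rdd)

set.seed(1)
z <- abs(rnorm(5000, mean = 1.5, sd = 1))   # placeholder data as before

c10 <- qnorm(0.95)                          # cutoff for the 10 percent level
bandwidths <- c(0.8, 1.2, 1.6)

# Discontinuity estimate with fixed bandwidths and with the default bandwidth
# (bw = NULL lets the routine choose its own bandwidth)
theta_at <- function(h = NULL) {
  DCdensity(z, cutpoint = c10, bw = h, ext.out = TRUE, plot = FALSE)$theta
}

round(c(default = theta_at(), sapply(setNames(bandwidths, bandwidths), theta_at)), 3)
```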
Columns (1)-(4) refer to the 10 percent significance level. In column (1) we reproduce the results from the discontinuity plot. The following columns show the results for the bandwidths 0.8, 1.2 and 1.6 (see S3 Fig in S1 File for the corresponding discontinuity plots). The sensitivity results at the 10 percent significance threshold confirm our main analyses. There is a slight increase of θ̂ with increasing bandwidth h. We further see a highly significant discontinuity at the 10 percent significance level for all choices of h, with estimates ranging from 0.21 to 0.37. Inspection of the discontinuity plots suggests that the differences between the estimates result from the right-hand side of the local smoothing. The close-by local maximum naturally has a stronger effect on the discontinuity estimation at c = 1.64 for larger bandwidths. This also reveals a general difficulty in separating the analyses of the different significance levels when bandwidths are large. Since the cutoff values c, for which we estimate the discontinuity, lie close to each other, a discontinuity at one point may affect the estimation at another. For instance, the discontinuity estimate for c = 1.64 may be inflated by the peak at z = 2 and, vice versa, the estimate for c = 1.96 may be deflated by the discontinuity at c = 1.64. However, this can be circumvented by choosing smaller values for the bandwidth h and by critical inspection of the corresponding discontinuity plots.

Table 2. Discontinuity estimates for the common significance levels, full sample and sub-samples.

c = 1.64 (10 percent level), columns (1)-(4):

                  (1) [default bw]          (2) h = 0.8        (3) h = 1.2        (4) h = 1.6
1959–2018         0.236*** (0.059) [0.90]   0.212*** (0.063)   0.305*** (0.051)   0.372*** (0.044)
q1: 1959–1973     0.843** (0.400) [1.25]    0.254 (0.480)      0.820** (0.406)    0.988*** (0.358)
q2: 1974–1988     0.631*** (0.210) [1.40]   0.775*** (0.289)   0.691*** (0.232)   0.590*** (0.193)
q3: 1989–2003     0.079 (0.108) [0.88]      0.080 (0.113)      0.118 (0.093)      0.187** (0.080)
q4: 2004–2018     0.266*** (0.074) [0.93]   0.220*** (0.080)   0.343*** (0.065)   0.421*** (0.056)

c = 1.96 (5 percent level), columns (5)-(8):

                  (5) [default bw]          (6) h = 0.8        (7) h = 1.2        (8) h = 1.6
1959–2018         0.123** (0.050) [1.01]    0.096* (0.056)     0.153*** (0.046)   0.210*** (0.041)
q1: 1959–1973     0.206 (0.293) [1.11]      0.141 (0.327)      0.228 (0.286)      0.431* (0.262)
q2: 1974–1988     0.096 (0.187) [1.43]      -0.061 (0.239)     0.040 (0.202)      0.106 (0.178)
q3: 1989–2003     0.050 (0.092) [1.09]      0.002 (0.108)      0.066 (0.088)      0.136* (0.077)
q4: 2004–2018     0.159** (0.064) [0.95]    0.153** (0.070)    0.200*** (0.058)   0.248*** (0.051)

c = 2.58 (1 percent level), columns (9)-(12):

                  (9) [default bw]          (10) h = 0.8       (11) h = 1.2       (12) h = 1.6
1959–2018         -0.093 (0.059) [0.89]     -0.082 (0.063)     -0.166*** (0.051)  -0.233*** (0.044)
q1: 1959–1973     0.897** (0.422) [0.86]    0.949** (0.447)    0.370 (0.318)      0.064 (0.265)
q2: 1974–1988     -0.278 (0.201) [1.50]     -0.109 (0.274)     -0.209 (0.227)     -0.283 (0.194)
q3: 1989–2003     -0.174* (0.094) [1.11]    -0.177 (0.110)     -0.185** (0.091)   -0.204** (0.079)
q4: 2004–2018     -0.088 (0.074) [0.96]     -0.048 (0.081)     -0.164** (0.065)   -0.249*** (0.055)

N (coefficients / articles): 1959–2018: 12340 / 571; q1: 270 / 28; q2: 683 / 51; q3: 3876 / 198; q4: 7511 / 294.

Note: Each cell reports the result of a single McCrary discontinuity estimation θ̂, with standard errors in parentheses. In columns (1), (5) and (9) the bandwidth determined by default by the local linear smoothing procedure is reported in brackets; in the other columns we vary the bandwidth h to examine the robustness of our results. Columns (1)-(4) refer to the 10 percent significance level (c = 1.64), columns (5)-(8) to the 5 percent significance level (c = 1.96) and columns (9)-(12) to the 1 percent significance level (c = 2.58).
* p<0.10,
** p<0.05,
*** p<0.01.
https://doi.org/10.1371/journal.pone.0305666.t002
The sensitivity analyses at the 5 and the 1 percent significance level also confirm our main results. Table 2, columns (5)-(12), first row, shows the respective results (see also S4 and S5 Figs in the S1 File). The estimates for the 5 percent level are about half the size of those for the 10 percent level and at least marginally significant for all bandwidths. The discontinuity plots show a better fit for smaller bandwidths, and the peak at z = 2 does not seem to substantially affect the estimation. As in our main analyses, we observe no discontinuity for the 1 percent significance level. Either our estimation procedure is unable to detect a discontinuity or there is no discontinuity. Inspection of the discontinuity plots suggests that both explanations may be valid. As for the 10 percent significance level, the close-by local maximum substantially biases the estimation, particularly for larger bandwidths. Since the local maximum is now on the left-hand side of the cutoff, we even observe negative discontinuity estimates. Relying on smaller bandwidths, we see that there is no discontinuity at c = 2.58.
Overall, the sensitivity analyses confirm our main results. We find a notable and robust
over-representation of test statistics just above the 10 percent significance level. There is a
moderate over-representation of significant results at the 5 percent level. There is no indica-
tion of publication bias at the 1 percent level.
6 Longitudinal results on publication bias
Since the main objective of our study is to investigate whether patterns of publication bias have changed over time, we extend the analyses from the previous section and account for the longitudinal nature of our data. For a first inspection, we present results for four separate observation periods. To this end, we divide the 60 years into 15-year intervals. Table 2, rows (q1) to (q4), reports the discontinuity estimates for the four sub-samples for the 10, 5 and 1 percent significance level (the corresponding discontinuity plots for each estimate are provided in S3 to S17 Figs in S1 File).
Our results demonstrate a shift from publication bias at the 10 percent level to publication bias at both the 10 percent and the 5 percent level. Columns (1) to (4) provide results for the 10 percent significance level. The discontinuity is large and significant for the first half of our observation period. With respect to the default bandwidth, the estimates range between 0.63 and 0.84. For the second half of our observation period, in contrast, we observe much smaller discontinuity estimates that are only significant for the most recent period. Overall, this indicates that the bias at the 10 percent significance level was substantial in the sixties through eighties of the last century, decreased in the nineties and increased again in recent years, although to a much lower level than in the first half of the observation period. In columns (5) to (8) we report the results for the 5 percent significance level. In contrast to the 10 percent significance level, the discontinuity is sizeable only for the most recent years. Though substantial, the bias of 0.16 for the 5 percent level in the most recent period is still lower than the bias of 0.27 for the 10 percent level. There is no substantial discontinuity and no significant trend of publication bias at the 1 percent level. Columns (9) to (12) report the results for the 1 percent significance level. The discontinuity is moderate and only marginally significant for the first quarter of our observation period. For the second to fourth quarter, in contrast, the estimated discontinuity is small and negative. As discussed in the previous section, we are less confident about whether we can reliably estimate the discontinuity at the 1 percent significance level.
To fully exploit the longitudinal nature of our data, we generalize our analyses by what we call moving discontinuities. This analysis considers varying interval sizes and boundaries, which is particularly useful considering that we extracted considerably fewer coefficients in earlier years than in recent years. Further, it ensures that our results do not depend on a particular choice of sub-samples.
The details of the procedure are as follows. We compute moving discontinuities comparable to the principle of moving averages:

\begin{aligned}
\hat{\theta}_{1959,k} &= \hat{\theta}(Z_{1959}, Z_{1960}, \ldots, Z_{1959+k-1}) \\
\hat{\theta}_{1960,k} &= \hat{\theta}(Z_{1960}, Z_{1961}, \ldots, Z_{1960+k-1}) \\
&\;\;\vdots \\
\hat{\theta}_{2018-k+1,k} &= \hat{\theta}(Z_{2018-k+1}, Z_{2018-k+2}, \ldots, Z_{2018})
\end{aligned}
\qquad (6)
where Z_y is the set of test statistics extracted in year y. For a given interval length k we first estimate the discontinuity θ̂_{1959,k} for the first k years, 1959 to 1959 + k − 1. We then move the interval by one year and calculate the discontinuity θ̂_{1960,k} for the next k years starting with 1960. Continuing this procedure, we end with 60 − k + 1 discontinuity estimates for 60 − k + 1 corresponding intervals, with neighboring intervals always overlapping by k − 1 years. This procedure, as in moving averages, clearly introduces strong correlation between successive waves (for k = 15, approximately 14/15 of the observations are the same in adjacent waves). As we will explain below, we do not make significance claims for these moving discontinuities and therefore do not consider corrections for auto-correlation, nor do we explicitly model it. Note that we use the default bandwidth determined by the estimation procedure for all analyses, since the results and discontinuity plots reported in the previous section suggest that it generally yields reliable estimates. With this procedure we make sure that our results do not depend on a particular division of our observation period. We further vary the interval length k to check for robustness. For larger k we have higher power and more precise discontinuity estimates, since the single intervals contain more values; on the other hand, we smooth over possible temporal variations. In contrast, smaller k results in a more fine-grained evaluation of temporal changes, but yields less precise estimates. To navigate between the pros and cons of smaller and larger k, we computed moving discontinuities for k = 5, . . ., 20. Since the detected patterns are comparable, we only report the results for k = {10, 15, 20}. Note also that our suggested approach to investigating time trends of publication bias does not depend on the discontinuity method, but can be applied to any publication bias detection method.
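A minimal sketch of the moving-window logic of Eq 6, again relying on the placeholder data and the assumed rdd::DCdensity interface from the earlier sketches rather than on our actual implementation:

```r
library(rdd)

# Placeholder data: one row per extracted test statistic with its publication year
set.seed(1)
stats <- data.frame(
  year = sample(1959:2018, 5000, replace = TRUE),
  z    = abs(rnorm(5000, mean = 1.5, sd = 1))
)

# Moving discontinuities at a given cutoff for window length k (Eq 6):
# estimate the discontinuity on all test statistics from years t, ..., t + k - 1,
# then shift the window forward by one year
moving_discontinuities <- function(stats, cutoff, k) {
  starts <- 1959:(2018 - k + 1)
  theta  <- sapply(starts, function(start_year) {
    z_window <- stats$z[stats$year >= start_year & stats$year <= start_year + k - 1]
    DCdensity(z_window, cutpoint = cutoff, ext.out = TRUE, plot = FALSE)$theta
  })
  data.frame(start = starts, end = starts + k - 1, theta = theta)
}

# Example: 15-year windows at the 5 percent cutoff, as in panel (b) of Fig 5
md <- moving_discontinuities(stats, cutoff = qnorm(0.975), k = 15)
head(md)
```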
We report our key results for k = 15 in Fig 5. All graphs are plotted with 95% confidence bands. The patterns for the moving discontinuities strikingly confirm the results from the discrete analyses, demonstrating a trend from publication bias at the 10 to the 5 percent level. Panels (a), (b) and (c) of Fig 5 show the results for the 10 percent, 5 percent and 1 percent significance level for k = 15. The results therefore directly correspond to Table 2.

Panel (a) demonstrates a decrease of publication bias at the 10 percent level. The discontinuity is very high in the early years of our observation period, with θ̂ hovering around 1. In the first half of the observation period the discontinuity is highly significant for all intervals, indicating a large over-representation of results just above the 10 percent significance level. The period from 1978 to 1992 records a notable drop and is the first interval for which the discontinuity is no longer significant. The discontinuity remains small and insignificant, with values close to zero, until shortly before 2010. After that it slowly increases to a stable and significant discontinuity of around θ̂ = 0.3.
Fig 5. Moving discontinuity plots. The graphs plot moving discontinuity estimates for the 10, 5 and 1 percent significance level for time windows of k = 15 years. Each dot is the result of a single discontinuity estimation. The x-axis marks the end point of the respective interval. For example, the first dot reports the estimate for the interval 1959–1973, the second dot reports the estimate for the interval 1960–1974, and so forth. The outer lines denote the 95% confidence interval. The estimates are reported in Table 2.
https://doi.org/10.1371/journal.pone.0305666.g005
Panel (b) provides the graph for the 5 percent significance level. The discontinuity is insignificant and small for much of the earlier observation period. There is then an increase in publication bias after 2007, and the discontinuity estimates become significant from then on, ranging between 0.1 and 0.2.

Panel (c) shows the discontinuity trends for the 1 percent significance level. There is no substantial trend, and most of the time the discontinuities are insignificant. More specifically, the discontinuity estimates start large but insignificant and quickly drop into the negative range in the eighties. Even though we see some fluctuations after that, the discontinuity remains negative for most of the intervals. Around 2010 the estimates become significant, but for reasons already discussed, we do not suggest over-interpreting this finding.

We have run a number of sensitivity analyses that confirm our main findings. In particular, we have varied the interval length k and repeated the analyses with the caliper test. S18 and S19 Figs in the S1 File report intervals of k = 10 and k = 20. As expected, the plots for 10-year moving discontinuities are more volatile and have wider confidence bands. In contrast, the plots for 20-year moving discontinuities are very smooth and have narrower confidence bands. Both variations replicate the shift from the 10 to the 5 percent level.
7 Discussion
This study contributes to the increasing body of literature that investigates whether and how publication bias has changed over the last decades. Drawing on a sample of more than 600 empirical papers published in the Quarterly Journal of Economics during the last six decades, we test for discontinuities at the common levels of significance and identify several clear patterns.

First, we observe a significant publication bias over the last six decades. Our data indicate a substantial over-representation of statistically significant effects just above the common thresholds of significance in the observed period. We have developed a new method by applying the McCrary test procedure known from regression discontinuity designs to detect discontinuities at the common thresholds of significance, and our findings are robust across several parameter variations.
Second, we observe a substantial change in bias patterns over time. Our observation period can roughly be divided into three phases. In the first phase, spanning the first two quarters, we find a large and stable bias at the 10 percent significance level, indicating the importance of that specific threshold in those years. Around the year 1990, we observe a transition into the second phase, which roughly comprises the third quarter of our observation period. In these years we do not observe any publication bias, neither at the 10 nor at the 5 percent level of significance. Lastly, the third phase spans the most recent quarter of our observation period and shows substantial and robust bias at both the 10 and the 5 percent level of significance. This implies that publication bias moved to some degree from the 10 to the 5 percent level, indicating that the more rigorous threshold of 5 percent became increasingly important, both for researchers seeking to push their results beyond it and for reviewers as a selection criterion for what to recommend for publication and what to reject.
How can these remarkable changes be explained? We first note that the large bias at the 10 percent threshold before 1990 coincides with the low proportion of empirical papers among the published articles. Before 1990, less than 15% of the published articles contained an empirical study. Even though one could argue that only few empirical studies were conducted at that time, the results on publication bias might suggest a different story. Finding large discontinuities, an indicator of publication bias, before 1990 can likewise indicate that empirical papers faced comparably high publication barriers, reflected in a low publication rate for such papers.
This argument is further supported by the absence of bias in the years after 1990. The substantial increase in the proportion of empirical papers after 1990 presumably indicates a higher acceptance rate for such papers and, therefore, temporarily reduced competition. A manifesto published by the editors of the QJE, in which they announced a reorientation of the journal, also suggests that the transition towards more empirical papers reflects an editorial decision on what kind of papers are selected for publication, rather than a change in what kind of papers are submitted by researchers [49]. The large bias before 1990 and the absence of any bias thereafter are therefore consistent with the argument that these patterns result from changes in the publication chances of empirical papers.
The second noteworthy observation is the shift from a publication bias only at the 10 percent significance level in earlier decades to a publication bias at both the 10 and 5 percent level in recent years. From a statistical point of view, the explanation can be found in the sample sizes and degrees of freedom, which have increased sharply over time. The p values are sensitive to changes in sample size, especially for smaller samples, and for a fixed test statistic the p value decreases quasi-exponentially with increasing sample size. Considering that 75% of the coefficients that we extracted from articles before 1990 are linked to a sample size below 350 and that, in contrast, 75% of the coefficients that we extracted after 1990 are linked to a sample size above 368, we believe that the 10 percent significance level may have been more attractive to researchers before 1990 simply because sample sizes and, therefore, statistical power were notably smaller at that time. In other words, the size of the bull's-eye was adjusted to the changing circumstances in such a way that there was a better chance of getting a hit in the form of a significant result. This could also explain why the proportion of significant results remained constant despite increasing sample sizes.
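To illustrate the sensitivity of p values to sample size in the simplest case of a t test with a fixed test statistic (the numbers are purely illustrative and not taken from our data):

```r
# Two-sided p value of a fixed t statistic of 1.70 for increasing sample sizes
# (degrees of freedom approximated as n - 1 for illustration)
t_stat <- 1.70
n      <- c(20, 50, 100, 350, 1000, 10000)

p <- 2 * pt(-abs(t_stat), df = n - 1)
round(setNames(p, n), 4)

# The p value shrinks towards the normal-approximation limit as n grows, so the
# same test statistic sits closer to the 5 percent threshold in larger samples
2 * pnorm(-abs(t_stat))
```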
Lastly, we discuss the overall impact that publication bias may have, or has had, on economics and related social sciences today and in the past. We make two arguments suggesting that the absolute magnitude and influence of biased research has increased. First, even though publication bias was apparently already present and large in earlier decades, it played a minor role since it only affected a small share of economic research. In contrast, the majority of articles in recent decades were of an empirical nature. Although the relative magnitude of the bias may have become smaller, the overall spread of biased research has therefore become larger, making it a more widespread problem than it used to be.
Our second argument goes beyond our findings and addresses general trends that can be observed in highly influential journals. The QJE belongs to the elite group of highly regarded journals, and the constantly increasing number of submissions has pushed their acceptance rates into the single-digit range (see for example [38], and our own analysis in Section 3 of the S1 File), simultaneously increasing the pressure on, and the competition between, researchers who aim for publications in one of these journals. In light of the above discussion, a lower acceptance rate will go hand in hand with greater selectivity in choosing which papers to publish. These highly influential journals and their published contents therefore become more and more relevant for the careers of junior researchers. For example, an increasing number of tenure requirements at universities demand publications in such top journals. These regulations are highly controversial. Heckman and Moktan [50] dubbed the competition for the positional goods of rare places in the five leading journals, the Quarterly Journal of Economics among them, the "tyranny of the top five." However, their leading position makes articles in top journals idealized role models of how and what to publish. As a consequence, more and more researchers orient themselves towards the "biased" literature in these journals when selecting research topics and relating new research to the current state of the art. In this sense, publication bias in top journals can have reinforcing dynamics, influencing a whole series of upcoming publications and research trends based on innovative, but presumably biased, studies.
8 Conclusion
To summarize, our findings draw a picture of an increasing trend from theory to empirics, towards larger studies with more complex statistics and towards more rigorous criteria for statistical significance testing. This increasing competition and demand for elaborate empirics may have led researchers nowadays to push their analyses beyond the 5 percent significance level, while in earlier times pushing beyond the 10 percent level may have been regarded as enough.
Examining how publication bias has evolved and changed in the past suggests several important conclusions about how publication bias might change in the future. We therefore conclude by discussing several policy implications of what can be done to reduce publication bias in the future. There have been a number of suggestions from editors, authors and, more generally, from advocates of the open science movement. They range from interventions (1) during the publication process, by implementing more rigorous statistical conventions and changing the review process, (2) after the publication process, by creating stronger incentives for replications, and (3) before the publication process, by demanding pre-registration.
One reaction to the increasing evidence on bias in empirical research is to demand more rigorous thresholds of statistical significance. The American Sociological Review, one of the top journals in sociology, recommends no longer using the 10 percent significance level [51]; others in interdisciplinary forums (Nature Human Behaviour) go even further, calling for significance levels far below 5 percent [52]. However, the problem may merely be shifted once stricter criteria have been implemented. Our analysis shows that publication bias in economics shifted from biased research at the 10 percent level towards biased research at the 5 percent level during the times when criteria became stricter. If the community now asks for stricter significance levels and they become standard, it is plausible that future biased research will occur at the new levels. Although one may argue that it becomes increasingly harder to push evidence above stricter criteria, stricter significance criteria also require larger samples from all researchers, which once again relativizes the effort required to produce less biased research. Instead of ever stricter significance criteria, reporting effect sizes along with the statistical results may actually help to shift the focus from significance to meaningfulness [53].
As an alternative to stricter criteria for performing statistics, it has been suggested to change the review process. In order to overcome publication bias, a results-free review process has been put forward, and initial trials have already been conducted by several journals [54,55]. The idea is that the evaluation of manuscripts should be independent of the reported statistical results. This may be one promising way to foster the publication of manuscripts independently of significance levels, p values and the like, since it separates the review process from the outcomes of the study. However, first implementations of this approach yielded mixed results. Separating the conceptual part of manuscripts from their statistical results raises new problems such as a shift in incentives and increased costs, e.g. reduced transparency, since information about data and results is withheld from editors and reviewers, or a flood of null findings, since researchers may be encouraged to open their "file drawers" [54,56].
Changes after the publication process have also been suggested. In particular, creating stronger incentives for replications may change what will be published in the future, since replications enhance the likelihood of error detection. Consequently, false-positive and inflated estimates might undergo a correction process in the literature. Moreover, if researchers anticipate replications to come, they may be more cautious or reluctant to publish "statistical outliers" and biased results. In this way, stronger incentives for conducting and publishing replication studies may reduce biased research. However, it is well known that replications are to be
understood as a contribution to a collective good. Therefore, "selective incentives" [57] are needed so that the scientific community can benefit from replications. In particular, sensitizing junior scientists to publication bias and creating respective incentives, such as asking for replication studies already in Master's or Ph.D. programs, could further contribute to less bias in future published knowledge (see for example the recommendations by Diekmann [58] for economics programs and the suggestions by King [59,60] for political science programs at Harvard). In addition, it has been recommended to change journal data policies towards demanding open data sharing. If this policy became standard, the costs of performing replications would be dramatically reduced; researchers would anticipate replications and might be less inclined to publish biased results.
In addition to changing the process after data collection (statistics, reviews and replications), a change of policies before data collection may be even more effective. In particular, pre-registration could be a promising mechanism. In the meantime, a number of public data repositories have been established and a rise in the pre-registration of studies can be observed. For clinical trials in the US, pre-registration of studies is even required by law [61]. Pre-registration increases the incentives to publish null findings and decreases the possibilities to select research findings based on their significance. A recent study suggests that pre-registration may actually reduce publication bias and promote the publication of null findings [62]. The trend towards pre-registration is aligned with upcoming new journals specializing in publishing studies for which the data refuted prior hypotheses and expectations (e.g. the Journal of Negative Results). This certainly decreases the costs of publishing pre-registered (and other kinds of) null findings and increases the incentives for pre-registration.
All in all, reflecting on our evidence in tandem with our discussion of policy interventions in the scientific publication process, we conclude that the transition towards less biased research still has a long way to go. A number of measures have been suggested, tested and implemented. Nevertheless, their effectiveness and impact, as well as their possibly negative side effects and counter-intuitive macro-transitions, will have to be empirically evaluated in future studies.
Supporting information
S1 File.
(PDF)
Acknowledgments
For helpful comments, we are grateful to participants of the MAER-Net Colloquium at Deakin
University, Workshop on Analytical Sociology at Venice International University, the PRIN-
TEGER European Conference on Research Integrity at Bonn University, the ESRA conference
at ISEG Lisbon School of Economics Management and seminar participants at University of
Zurich, LMU Munich and ETH Zurich. We thank Alexander Ehlert and Andreas Diekmann
for outstanding research assistance. We particularly thank Alexander Ehlert for his support in
adapting the R-command for the McCrary test. Lastly, we want to thank the editor and the
anonymous reviewers for helpful and constructive feedback which has been highly valuable for
improving the paper.
Author Contributions
Conceptualization: Julia Jerke.
Data curation: Julia Jerke, Antonia Velicu.
Formal analysis: Julia Jerke, Heiko Rauhut.
Funding acquisition: Heiko Rauhut.
Investigation: Julia Jerke, Antonia Velicu, Heiko Rauhut.
Methodology: Julia Jerke.
Project administration: Heiko Rauhut.
Resources: Julia Jerke.
Software: Fabian Winter.
Supervision: Heiko Rauhut.
Validation: Antonia Velicu, Fabian Winter, Heiko Rauhut.
Visualization: Julia Jerke, Fabian Winter.
Writing – original draft: Julia Jerke, Heiko Rauhut.
Writing – review & editing: Julia Jerke, Antonia Velicu, Fabian Winter, Heiko Rauhut.
References
1. Anderson MS, Ronning EA, De Vries R, Martinson BC. The Perverse Effects of Competition on Scien-
tists’ Work and Relationships. Science and Engineering Ethics. 2007; 13(4):437–461. https://doi.org/10.
1007/s11948-007-9042-5 PMID: 18030595
2. Frey BS. Publishing as prostitution?—Choosing between one’s own ideas and academic success. Pub-
lic Choice. 2003; 116:205–223. https://doi.org/10.1023/A:1024208701874
3. Fanelli D. Do Pressures to Publish Increase Scientist’s Bias? Empirical Support from US States Data.
PLoS ONE. 2010; 5:e10271. https://doi.org/10.1371/journal.pone.0010271 PMID: 20422014
4. Christensen G, Miguel E. Transparency, reproducibility, and the credibility of economics research. Jour-
nal of Economic Literature. 2018; 56(3):920–80. https://doi.org/10.1257/jel.20171350
5. Dickersin K, Min YI, Meinert CL. Factors Influencing Publication of Research Results. Follow-up of
Applications Submitted to Two Institutional Review Boards. Journal of the American Medical Associa-
tion. 1992; 267(3):374–378. https://doi.org/10.1001/jama.1992.03480030052036 PMID: 1727960
6. Callaham ML, Wears RL, Weber EJ, Barton C, Young G. Positive-Outcome Bias and Other Limitations
in the Outcome of Research Abstracts Submitted to a scientific Meeting. Journal of the American Medi-
cal Association. 1998; 280(3):254–257. https://doi.org/10.1001/jama.280.3.254 PMID: 9676673
7. Young NS, Ioannidis JPA, Al-Ubaydli O. Why Current Publication Practices May Distort Science. PLOS
Medicine. 2008; 5(10):1–5. https://doi.org/10.1371/journal.pmed.0050201 PMID: 18844432
8. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;
349 (6251). PMID: 26315443
9. Gerber A, Malhotra N. Do Statistical Reporting Standards Affect What is Published? Publication Bias in
Two Leading Political Science Journals. Quarterly Journal of Political Science. 2008; 3:313–326.
https://doi.org/10.1561/100.00008024
10. De Long JB, Lang K. Are All Economic Hypotheses False? Journal of Political Economy. 1992; 100(6):1257–1272. https://doi.org/10.1086/261860
11. Doucouliagos C. Publication Bias in the Economic Freedom and Economic Growth Literature. Journal
of Economic Surveys. 2005; 19(3):367–387. https://doi.org/10.1111/j.0950-0804.2005.00252.x
12. Gerber AS, Malhotra N. Publication Bias in Empirical Sociological Research: Do Arbitrary Significance
Levels Distort Published Results? Sociological Methods & Research. 2008; 37(1):3–30. https://doi.org/
10.1177/0049124108318973
13. Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: Unlocking the file drawer.
Science. 2014; 345(6203):1502–1505. https://doi.org/10.1126/science.1255484 PMID: 25170047
14. Brodeur A, Lé M, Sangnier M, Zylberberg Y. Star Wars: The Empirics Strike Back. American Economic Journal: Applied Economics. 2016; 8(1):1–32.
15. Doucouliagos H, Stanley TD. Publication selection bias in minimum-wage research? A meta-regression
analysis. British Journal of Industrial Relations. 2009; 47(2):406–428. https://doi.org/10.1111/j.1467-
8543.2009.00723.x
16. Turner EH, Matthews AM, Linardatos E, Tell RA, Robert R. Selective Publication of Antidepressant Tri-
als and Its Influence on Apparent Efficacy. The New England Journal of Medicine. 2008; 358(3):252–
260. https://doi.org/10.1056/NEJMsa065779 PMID: 18199864
17. Sterling TD. Publication Decision and the Possible Effects on Inferences Drawn From Tests of Signifi-
cance—or Vice Versa. Journal of the American Statistical Association. 1959; 54:30–34. https://doi.org/
10.2307/2282137
18. Wilson FD, Smoke GL, Martin JD. The Replication Problem in Sociology: A Report and a Suggestion.
Sociological Inquiry. 1973; 43(2):141–149. https://doi.org/10.1111/j.1475-682X.1973.tb00711.x
19. Simes RJ. Publication Bias: The Case for an International Registry of Clinical Trials. Journal of Clinical
Oncology. 1986; 4:1529–1541. https://doi.org/10.1200/JCO.1986.4.10.1529 PMID: 3760920
20. Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;
90:891–904. https://doi.org/10.1007/s11192-011-0494-7
21. Leggett NC, Thomas NA, Loetscher T, Nicholls MER. The Life of p: “Just Significant” Results are on the
Rise. Quarterly Journal of Experimental Psychology. 2013; 66(12):2303–2309. https://doi.org/10.1080/
17470218.2013.863371 PMID: 24205936
22. Brodeur A, Cook N, Heyes A. Methods Matter: p-Hacking and Publication Bias in Causal Analysis in
Economics. American Economic Review. 2020; 110(11):3634–3660. https://doi.org/10.1257/aer.
20190687
23. Vivalt E. Specification Searching and Significance Inflation Across Time, Methods and Disciplines.
Oxford Bulletin of Economics and Statistics. 2019; 81(4):797–816. https://doi.org/10.1111/obes.12289
24. McCrary J. Manipulation of the running variable in the regression discontinuity design: A density test.
Journal of Econometrics. 2008; 142(2):698–714. https://doi.org/10.1016/j.jeconom.2007.05.005
25. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values,
confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology.
2016; 31(4):337–350. https://doi.org/10.1007/s10654-016-0149-3 PMID: 27209009
26. Haller H, Krauss S. Misinterpretations of significance: A problem students share with their teachers.
Methods of Psychological Research. 2002; 7(1):1–20.
27. Badenes-Ribera L, Frias-Navarro D, Iotti B, Bonilla-Campos A, Longobardi C. Misconceptions of the p-
value among Chilean and Italian academic psychologists. Frontiers in Psychology. 2016; 7:1247.
https://doi.org/10.3389/fpsyg.2016.01247 PMID: 27602007
28. Lyu XK, Xu Y, Zhao XF, Zuo XN, Hu CP. Beyond psychology: prevalence of p value and confidence
interval misinterpretation across different fields. Journal of Pacific Rim Psychology. 2020; 14. https://
doi.org/10.1017/prp.2019.28
29. Bernardi F, Chakhaia L, Leopold L. ‘Sing me a song with social significance’: the (mis) use of statistical
significance testing in European sociological research. European Sociological Review. 2017; 33(1):1–
15.
30. Ziliak ST, McCloskey DN. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. Economics, Cognition, And Society. University of Michigan Press; 2008. Available from: https://books.google.ch/books?id=JWLIRr_ROgAC.
31. Fisher RA. Statistical Tests. Nature. 1935; 136(3438):474. https://doi.org/10.1038/136474b0
32. Selvin HC. A critique of tests of significance in survey research. American Sociological Review. 1957;
22(5):519–527. https://doi.org/10.2307/2089475
33. Ioannidis JPA, Stanley TD, Doucouliagos H. The power of bias in economic research. The Economic
Journal. 2017; 127:F236–F265. https://doi.org/10.1111/ecoj.12461
34. Ioannidis JPA. Why Most Published Research Findings Are False. PLOS Medicine. 2005; 2(8). https://
doi.org/10.1371/journal.pmed.0020124 PMID: 16060722
35. Ioannidis JPA, Trikalinos TA. An exploratory test for an excess of significant findings. Clinical Trials.
2007; 4(3):245–253. https://doi.org/10.1177/1740774507079441 PMID: 17715249
36. Sahner H. Veröffentlichte empirische Sozialforschung: Eine Kumulation von Artefakten? Eine Analyse von Periodika. Zeitschrift für Soziologie. 1979; 8(3):267–278. https://doi.org/10.1515/zfsoz-1979-0305
37. Sterling TD, Rosenbaum WL, Weinkam JJ. Publication Bias Revisited: The Effect of the Outcome of
Statistical Tests on the Decision to Publish and Vice Versa. American Statistician. 1995; 49:108–112.
https://doi.org/10.1080/00031305.1995.10476125
38. Card D, DellaVigna S. Nine Facts about Top Journals in Economics. Journal of Economic Literature.
2013; 51(1):144–161. https://doi.org/10.1257/jel.51.1.144
39. Brankovic J, Ringel L, Werron T. How Rankings Produce Competition: The Case of Global University Rankings. Zeitschrift für Soziologie. 2018; 47(4):270–288. https://doi.org/10.1515/zfsoz-2018-0118
40. Paldam M. The strategies of economic research—An empirical study. Department of Economics and
Business Economics, Aarhus University; 2018. Available from: https://EconPapers.repec.org/RePEc:
aah:aarhec:2018-04.
41. Masicampo EJ, Lalande DR. A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology. 2012; 65(11):2271–2279. https://doi.org/10.1080/17470218.2012.711335 PMID: 22853650
42. Auspurg K, Hinz T, Schneck A. Ausmaß und Risikofaktoren des Publication Bias in der deutschen Soziologie. Kölner Zeitschrift für Soziologie und Sozialpsychologie. 2014; 66:549–573. https://doi.org/10.1007/s11577-014-0284-3
43. Berning CC, Weiß B. Publication bias in the German social sciences: an application of the caliper test to
three top-tier German social science journals. Quality & Quantity. 2016; 50:901–917. https://doi.org/10.
1007/s11135-015-0182-4
44. Bruns SB, Asanov I, Bode R, Dunger M, Funk C, Hassan SM, et al. Reporting errors and biases in pub-
lished empirical findings: Evidence from innovation research. Research Policy. 2019; 48(9):103796.
https://doi.org/10.1016/j.respol.2019.05.005
45. McCrary J, Christensen G, Fanelli D. Conservative Tests under Satisficing Models of Publication Bias.
PLOS ONE. 2016; 11(2):e0149590. https://doi.org/10.1371/journal.pone.0149590 PMID: 26901834
46. Hahn J, Todd P, Van der Klaauw W. Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design. Econometrica. 2001; 69(1):201–209. https://doi.org/10.1111/1468-0262.00183
47. Lee DS, Lemieux T. Regression Discontinuity Designs in Economics. Journal of Economic Literature.
2010; 48(2):281–355. https://doi.org/10.1257/jel.48.2.281
48. Fan J, Gijbels I. Local polynomial modelling and its applications: monographs on statistics and applied
probability 66. vol. 66. Chapman & Hall; 1996.
49. Blanchard OJ, Maskin ES, Summers LH. Manifesto. The Quarterly Journal of Economics. 1985; 100(1):
iii. https://doi.org/10.1093/qje/100.1.iii
50. Heckman JJ, Moktan S. Publishing and promotion in economics: The tyranny of the top five. Journal of
Economic Literature. 2020; 58(2):419–70. https://doi.org/10.1257/jel.20191574
51. Mustillo SA, Lizardo OA, McVeigh RM. Editors’ Comment: A Few Guidelines for Quantitative Submis-
sions. American Sociological Review. 2018; 83(6):1281–1283. https://doi.org/10.1177/
0003122418806282
52. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, Berk R, et al. Redefine statisti-
cal significance. Nature Human Behaviour. 2018; 2(1):6. https://doi.org/10.1038/s41562-017-0189-z
PMID: 30980045
53. Pek J, Flora D. Reporting Effect Sizes in Original Psychological Research: A Discussion and Tutorial.
Psychological Methods. 2016; 23.
54. Ansell B, Samuels D. Journal Editors and “Results-Free” Research: A Cautionary Note. Comparative
Political Studies. 2016; 49(13):1809–1815. https://doi.org/10.1177/0010414016669369
55. Button KS, Bal L, Clark A, Shipley T. Preventing the ends from justifying the means: withholding results
to address publication bias in peer-review. BMC Psychology. 2016; 4(1). https://doi.org/10.1186/
s40359-016-0167-7 PMID: 27903302
56. Findley MG, Jensen NM, Malesky EJ, Pepinsky TB. Can results-free review reduce publication bias?
The results and implications of a pilot study. Comparative Political Studies. 2016; 49(13):1667–1703.
https://doi.org/10.1177/0010414016655539
57. Olson M. The Logic of Collective Action. American studies collection. Cambridge: Harvard University
Press; 1965. Available from: https://books.google.ch/books?id=jzTeOLtf7_wC.
58. Diekmann A. Are Most Published Research Findings False? Journal of Economics and Statistics (Jahr-
buecher fuer Nationaloekonomie und Statistik). 2011; 231(5-6):628–635.
59. King G. Replication, replication. PS: Political Science & Politics. 1995; 28(3):444–452.
60. King G. Publication, publication. PS: Political Science & Politics. 2006; 39(1):119–125.
61. Warren M. First analysis of ‘pre-registered’ studies shows sharp rise in null findings. Nature. 2018.
https://doi.org/10.1038/d41586-018-07118-1
62. Allen C, Mehler DM. Open Science challenges, benefits and tips in early career and beyond. Preprint at
PsyArXiv. 2018.