Article

Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say "Usually Not"

Authors: Andrew C. Chang and Phillip Li

Abstract

We attempt to replicate 67 papers published in 13 well-regarded economics journals using author-provided replication files that include both data and code. Some journals in our sample require data and code replication files, and other journals do not require such files. Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files. We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable. We conclude with recommendations on improving replication of economics research.
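The headline figures in the abstract are simple ratios. The short Python sketch below recomputes them from the counts stated above; it uses no information beyond those numbers.

# Recompute the replication rates reported in the abstract from its stated counts.
counts = {
    "replication files obtained (journals requiring them)": (29, 35),
    "replication files obtained (journals not requiring them)": (11, 26),
    "replicated without contacting the authors": (22, 67),
    "replicated with author assistance (excl. confidential data / missing software)": (29, 59),
}

for label, (successes, attempts) in counts.items():
    print(f"{label}: {successes}/{attempts} = {100 * successes / attempts:.0f}%")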


... Reproducibility of scientific findings is the core of scientific research (Tetens, 2016). Scientists have proposed recommendations and suggested criteria (Chang & Li, 2015; Yale, 2010) for better scientific reproducibility, but the reproduction of actual scientific research may still be challenging even if the research article meets the expectations. In this project, we replicated a research article by Professor Skinner (2019a), which explores the association between broadband access and online course enrollment in the US. ...
... These guidelines may help us assess and define the strength of the original academic paper as well. Thus, we first compare the original project against some general guidelines, particularly the comprehensive ones by the Yale Law School Roundtable on Data and Code Sharing (Yale, 2010) and a group of researchers who reviewed 60 economics papers (Chang & Li, 2015). Then we determine its reproducibility according to a discipline-specific guideline by the National Science Foundation (NSF). Wicherts et al. (2006) provided a simple solution to scientists for better scientific reproducibility: report data and standardized code publicly in an appendix. ...
... Based on these recommendations, the reproducibility of this research project should be very high. Five years later, Chang & Li (2015) provided a more comprehensive list of general recommendations based on their review of 60 economics research papers. A close comparison with the previous papers shows that five extra recommendations to researchers were added in this more recent paper. ...
Preprint
Full-text available
In this paper, we replicated a Bayesian educational research project, which explores the association between broadband access and online course enrollment in the US. We summarized key findings from our replication and compared them with the original project. Based on our replication experience, we aim to demonstrate the challenges of research reproduction, even when code and data are shared openly and the quality of the materials on GitHub is high. Moreover, we investigate the implicit presumptions about the researchers' level of knowledge and discuss how such presumptions may add difficulty to the reproduction of scientific research. Finally, we hope this article sheds light on the design of reproducibility criteria and opens up a space to explore what should be taught in undergraduate statistics education.
... A so-called "reproducibility crisis" (Baker, 2016) is affecting the natural and social sciences (Chang & Li, 2015; Duvendack, Palmer-Jones, & Reed, 2017; Gertler, Galiani, & Romero, 2018; Hagger et al., 2016; Watts, Duncan, & Quan, 2018). The failure to replicate a significant proportion of results in psychology (OSC, 2015), along with the exposure of high-profile fraud cases (Atwater, Mumford, Schriesheim, & Yammarino, 2014; Stroebe, Postmes, & Spears, 2012), has increased the visibility of the replicability and reproducibility problem in management research. ...
... A transparent description of sampling, data collection, aggregation, and inference methods, along with a precise description of the study context, allows for replication (Aguinis, Ramani, & Alabduljader, 2018; Aguinis & Solarino, 2019). The availability of data and code allows for reproduction (Chang & Li, 2015). ...
... First, the data and code needed for reproduction may not be disclosed by the authors, making reproduction impossible (Dewald et al., 1986; McCullough, McGeary, & Harrison, 2006; McCullough, McGeary, & Harrison, 2008). Second, the data and code may be available, but they do not match, or the instructions of the original authors are so obscure that they prevent reproduction (Chang & Li, 2015). Third, when data and code can be run but do not deliver the expected findings, it still does not necessarily invalidate the results, as the authors could have submitted incorrect versions of the data or code files. ...
Article
Full-text available
The past decade has been marked by concerns regarding the replicability and reproducibility of published research in the social sciences. Publicized failures to replicate landmark studies, along with high-profile cases of research fraud, have led scholars to reconsider the trustworthiness of both findings and institutionalized research practices. This paper considers two questions: (1) Relative to psychology and economics, what is the state of replication and reproduction research in management? (2) Are the disciplines equally advanced in the use of methods applied to study the replication problem? A systematic literature review identified 67 studies pertinent to these questions. The results indicate that the replication prevalence rate in management studies lies almost exactly between those of psychology and economics, while a high level of variation between management and other business-related disciplines can be noted. Further, similarly to psychology, but unlike economics, the surveys of published replications tend to report high replication success rates for management and other business-related disciplines. However, a comparison with recently obtained results in preregistered multi-study replications in psychology and economics suggests that these rates are almost certainly inflated. Method and data transparency are medium to low, often rendering attempts to reproduce or replicate studies impossible. Finally, the understanding of the replicability problem in management is held back by the underutilization of methods developed in other disciplines. The review also reveals that management, psychology, and economics exhibit strikingly different practices and approaches to replication, despite facing similar incentive structures. Disciplines in which replication and reproduction attempts are rare and which frequently involve authors of the original study in replication attempts lack strong deterrents against questionable research practices; thus, they are less likely to deliver replicable results.
... Conditionally successful: if both data and programs are present, do results replicate? ...
... Earlier, I referred to what Fisher called a "demonstrable" experiment (one for which an experimenter is able to reliably conduct a procedure that produces a predictable result) as "reproducible," while I have followed recent years' efforts in the social sciences in referring to the successful checking of computer programs as "replication." ...
... The author was right: it was not. Chang and Li (2015) also noted that journal requirements mattered: a replication data set was twice as likely to be available when the journal required it, compared to when the journal did not. ...
... As Figure 3 shows, that viability number has risen considerably, though how high depends on which sample of journals is examined. Galiani, Gertler, and Romero (2018) found a roughly 39 percent viability rate, while Chang and Li (2015) found a 58 percent rate (38 viable data sets from 67 attempts). Thus, by this comparison, viability of replication has risen by as much as a factor of six since the 1980s. ...
... If journals require some combination of public data availability and data checks in the review process, much of the replicability "problem" might disappear. This sort of requirement is empirically important: Chang and Li (2015) report that they are able to obtain replication data more often for papers published in journals with a data availability requirement than for those published in journals without such a requirement. ...
... Dewald, Thursby, and Anderson (1986) describe a common situation: a regularly updated government dataset is used in analysis, but neither a copy of the relevant vintage of the public dataset nor the precise date when it was obtained is included in the replication files, thus preventing would-be replicators from knowing whether the dataset they obtain is the same as what the original study authors had used. Chang and Li (2015) mention, as the 1986 paper also did, the occasional problem of confidential datasets and unavailable software packages. Alongside uncommented computer programs and unintuitive variable names, McCullough, McGeary, and Harrison (2006) describe cases of forgotten subroutines, cases in which "the person responsible for archiving the data and code stopped doing this part of his job," a case of an ASCII data file in which "we are supposed to guess the names of the variables," and one case in which the original study author had included the pessimistic caveat that the program supplied to replicators was "not necessarily the one that produced the results reported in the paper." ...
Article
Full-text available
In 2004, a landmark study showed that an inexpensive medication to treat parasitic worms could improve health and school attendance for millions of children in many developing countries. Eleven years later, a headline in The Guardian reported that this treatment, deworming, had been “debunked.” The pronouncement followed an effort to replicate and re-analyze the original study, as well as an update to a systematic review of the effects of deworming. This story made waves amidst discussion of a reproducibility crisis in some of the social sciences. In this paper, I explore what it means to “replicate” and “reanalyze” a study, both in general and in the specific case of deworming. I review the broader replication efforts in economics, then examine the key findings of the original deworming paper in light of the “replication,” “reanalysis,” and “systematic review.” I also discuss the nature of the link between this single paper's findings, other papers’ findings, and any policy recommendations about deworming. Through this example, I provide a perspective on the ways replication and reanalysis work, the strengths and weaknesses of systematic reviews, and whether there is, in fact, a reproducibility crisis in economics.
... However, the reasons why reproducibility fails are reasonably similar. In many cases, it is impossible to access the data and code of other researchers, which significantly decreases the chance of reproducing scientific results [La22, St19a, SSM18, CL15]. If research data is accessible, missing directions on using the given data and code are another substantial issue [Ma20, Ra19, St19a]. ...
... With digital research data playing an essential role in most research fields and technological advances that facilitate access to data sharing platforms, scientific journals and other stakeholders across many scientific disciplines introduced policies on data handling to improve research data sharing. The investigations by [La22, SKL18, SSM18, CL15] show that the reproducibility of scientific results correlates with the availability of research data and that the introduction of policies enforcing obligatory data sharing by scientific journals significantly affects the availability of research data. Further recommendations by the research community to increase the reproducibility of scientific results include ...
Conference Paper
Full-text available
Reproducible research results are among the pillars of sustainable science, and considerable progress has been achieved in this direction recently. However, there is much room for improvement across the research communities. Here we analyze the reproducibility of 108 publications from an interdisciplinary Collaborative Research Center on applied mathematics in various scientific fields. Based on a previous reproducibility study in hydrology, we identify the rate of reproducible scientific results and why reproducibility fails. We identify the main problems that hinder reproducible results and relate them to previous interventions targeting the research culture of reproducible scientific findings. Thus, the success of our measures can be estimated, and specific recommendations for future work can be derived. In our study, the number of publications that allow for at least partly reproducible research results increased over time. However, we see an ongoing need for directives and support in research data management among research communities since issues concerning data accessibility and quality limit the reproducibility of scientific results. We argue that our results are representative of other interdisciplinary research areas.
... The issue is partially solved with the use of the basename function in the code cleaning stage. All things considered, we note that our automated study has a comparable success rate to the reported manual reproducibility studies [25, 26], which gives strength to the overall significance of our results. Though conducting a reproducibility study with human intervention would result in more sophisticated findings, it is labor-intensive on a large scale. ...
... Given that most of the datasets in our study belong to the social sciences, we reference a few reproducibility studies in this domain that emphasize its computational component (i.e., use the same data and code). Chang and Li attempt to reproduce results from 67 papers published in 13 well-regarded economic journals using the deposited supplementary material [26]. They successfully reproduced 33% of the results without contacting the authors and 43% with the authors' assistance. ...
Article
Full-text available
This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals’ collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
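The core procedure described in this abstract (running each deposited R file in a clean environment and recording whether it completes without error) can be illustrated roughly as follows. This is a minimal sketch, not the authors' pipeline; the folder layout, the timeout, and the use of Rscript via subprocess are assumptions.

import subprocess
from pathlib import Path

# Minimal sketch of an automated re-execution check for deposited R scripts.
# Assumes Rscript is on PATH and that a replication dataset is a folder of .r/.R files.
def reexecute_r_files(dataset_dir: str, timeout_s: int = 3600) -> dict:
    results = {}
    for script in sorted(Path(dataset_dir).rglob("*.[rR]")):
        try:
            proc = subprocess.run(
                ["Rscript", str(script)],
                cwd=script.parent,      # run where the script lives; relative paths often assume this
                capture_output=True,
                timeout=timeout_s,
            )
            results[str(script)] = "success" if proc.returncode == 0 else "error"
        except subprocess.TimeoutExpired:
            results[str(script)] = "timeout"
    return results

if __name__ == "__main__":
    outcomes = reexecute_r_files("replication_dataset")   # hypothetical folder name
    failed = sum(v != "success" for v in outcomes.values())
    print(f"{failed}/{len(outcomes)} R files failed to complete without error")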
... In some cases, the code would not crash if it was re-executed in RStudio or if it had an available directory for saving outputs. All things considered, we note that our automated study has a comparable success rate to the reported manual reproducibility studies [25,26], which gives strength to the overall significance of our results. Though conducting a reproducibility study with human intervention would result in more sophisticated findings, it is labor-intensive on a large scale. ...
... Given that most of the datasets in our study belong to the social sciences, we reference a few reproducibility studies in this domain that emphasize its computational component (i.e., use the same data and code). Chang and Li attempt to reproduce results from 67 papers published in 13 well-regarded economic journals using the deposited supplementary material [26]. They successfully reproduced 33% of the results without contacting the authors and 43% with the authors' assistance. ...
Preprint
Full-text available
This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files crashed in the initial execution, while 56% crashed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
... Since replications do not seem to occur naturally, systematic and coordinated replication and reproduction programmes have emerged (Brodeur et al. 2024a). Following several large-scale replicability and computational reproducibility projects conducted over the past decade (Chang and Li 2015; Open Science Collaboration 2015; Camerer et al. 2016, 2018), the boundary has recently been pushed to meta-robustness reproductions (Brodeur et al. 2024d; Campbell et al. 2024). Such projects have a higher explanatory power and visibility than stand-alone replications, and reduce adverse effects on the careers of replicators (Brodeur et al. 2024a). ...
Article
Full-text available
Robustness reproductions and replicability discussions are on the rise in response to concerns about a potential credibility crisis in economics. This paper proposes a protocol to structure reproducibility and replicability assessments, with a focus on robustness. Starting with a computational reproduction upon data availability, the protocol encourages replicators to prespecify robustness tests, prior to implementing them. The protocol contains three different reporting tools to streamline the presentation of results. Beyond reproductions, our protocol assesses adherence to the pre-analysis plans in the replicated papers as well as external and construct validity. Our ambition is to put often controversial debates between replicators and replicated authors on a solid basis and contribute to an improved replication culture in economics.
... Rebuttals, which are publications intended to point out important flaws in a published study, do not seem to affect future citations of the rebutted paper (Banobi et al., 2011). Although a significant number of studies in psychology (Hardwicke et al., 2018), economics (Chang and Li, 2015) and management (Bergh et al., 2017; Hofman et al., 2021) do not contain enough details to make a replication/reproduction possible, they are still taken as evidence for the existence of the effect, if their methodology is considered sound (Flickinger et al., 2014). ...
Article
Purpose: Replication is a primary self-correction device in science. In this paper, we have two aims: to examine how and when the results of replications are used in management and organization research and to use the results of this examination to offer guidelines for improving the self-correction process. Design/methodology/approach: Study 1 analyzes co-citation patterns for 135 original-replication pairs to assess the direct impact of replications, specifically examining how often and when a replication study is co-cited with its original. In Study 2, a similar design is employed to measure the indirect impact of replications by assessing how often and when a meta-analysis that includes a replication of the original study is co-cited with the original study. Findings: Study 1 reveals, among other things, that a huge majority (92%) of sources that cite the original study fail to co-cite a replication study, thus calling into question the impact of replications in our field. Study 2 shows that the indirect impact of replications through meta-analyses is likewise minimal. However, our analyses also show that replications published in the same journal that carried the original study and authored by teams including the authors of the original study are more likely to be co-cited, and that articles in higher-ranking journals are more likely to co-cite replications. Originality/value: We use our results to formulate recommendations that would streamline the self-correction process in management research at the author, reviewer, and journal levels. Our recommendations would create incentives to make replication attempts more common, while also increasing the likelihood that these attempts are targeted at the most relevant original studies.
... It is all the more surprising that empirical research across disciplines today faces a replicability crisis (e.g., economics or biology, see Chang & Li, 2015; Errington et al., 2014; for overviews across disciplines, see Hoffmann et al., 2021; Munafò, 2016; Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012), as does (sport) psychology (cf. this Special Issue and the preceding one in 2017 in the Zeitschrift für Sportpsychologie; Tamminen & Poucher, 2018), even though the reproducibility debate was recently termed an opportunity and not a crisis (Munafò et al., 2022). ...
Article
Full-text available
Open Science is an important development in science, not only to overcome the replication crisis or crisis of confidence but also to openly and transparently describe research processes to enable replication and reproduction. This study describes the current state of the art within German-speaking sport psychology regarding open science-related attitudes, behaviors, and intentions and identifies the reasons for a potential reluctance toward open science. The findings revealed a match between open science-related attitudes and intentions, although open science-related behaviors still fall behind those two. We discovered time constraints, time allocation issues, and anticipated competitive disadvantages if not all researchers adhere to open science practices as the reasons behind this behavioral reluctance. Our findings suggest that the development of open science has clearly reached the German-speaking sport-psychological community, but that there is considerable potential for improvement – especially regarding behaviors.
... However, internal replication studies provided a false sense of replicability because researchers used questionable research practices to produce successful internal replications (Francis, 2014; John et al., 2012; Schimmack, 2012). The pervasive presence of publication bias at least partially explains replication failures in social psychology (Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012; Schimmack, 2020; Schimmack, 2012), medicine (Begley & Ellis, 2012; Prinz et al., 2011), and economics (Camerer et al., 2016; Chang & Li, 2015). ...
Article
Full-text available
Selection for statistical significance is a well-known factor that distorts the published literature and challenges the cumulative progress in science. Recent replication failures have fueled concerns that many published results are false positives. Brunner and Schimmack (2020) developed z-curve, a method for estimating the expected replication rate (ERR) – the predicted success rate of exact replication studies based on the mean power after selection for significance. This article introduces an extension of this method, z-curve 2.0. The main extension is an estimate of the expected discovery rate (EDR) – an estimate of the proportion that the reported statistically significant results constitute among all conducted statistical tests. This information can be used to detect and quantify the amount of selection bias by comparing the EDR to the observed discovery rate (ODR; the observed proportion of statistically significant results). In addition, we examined the performance of bootstrapped confidence intervals in simulation studies. Based on these results, we created robust confidence intervals with good coverage across a wide range of scenarios to provide information about the uncertainty in EDR and ERR estimates. We implemented the method in the zcurve R package (Bartoš & Schimmack, 2020).
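As a small illustration of the ODR-versus-EDR comparison described in this abstract, the sketch below computes an observed discovery rate from a set of made-up p-values and contrasts it with a hypothetical EDR value. The actual EDR estimation is what the z-curve method itself provides and is not reimplemented here.

# Illustration of the ODR vs. EDR comparison described in the abstract.
# The p-values and the EDR value are hypothetical; in practice the EDR would come
# from a z-curve fit (e.g., the zcurve R package mentioned in the abstract).

p_values = [0.001, 0.02, 0.03, 0.04, 0.049, 0.06, 0.21, 0.34, 0.012, 0.47]
alpha = 0.05

odr = sum(p < alpha for p in p_values) / len(p_values)   # observed discovery rate
edr_estimate = 0.35                                       # hypothetical EDR estimate

print(f"ODR = {odr:.2f}, EDR = {edr_estimate:.2f}")
if odr > edr_estimate:
    print("ODR exceeds EDR: consistent with selection for statistical significance.")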
... Today, the researcher must contend with a wide range of practical issues, the through line of which is concern with long-term discovery, access, and usefulness of replication files. Though journals have implemented code and data availability policies in the field of economics for almost 40 years, Chang and Li (2015) concluded that "economics research is generally not replicable" (11), largely due to the "missing data or code" for the majority of the 60 papers whose research they attempted to replicate. ...
... On a broader level, the issue of replicability or transparency has become a focal topic across multiple social and natural science disciplines (Aguinis et al., 2017; Chang & Li, 2015; National Academies of Sciences, 2019; Pratt et al., 2020). Although replicability or transparency criteria are vital for research credibility in all disciplines, the issue has not captured the attention of HRD researchers. ...
Article
Full-text available
Problem: The NHRD conception claimed to be based on multiple country cases through a constructive/interpretive process. However, four of the cases focusing on HRD policy in China presented an incomplete history of China’s HRD policies, which may have misled the NHRD conception. Solution: We re-examine China’s history of HRD policy as an indigenous phenomenon in comparison to the four China cases. Adopting a similar historical method, we fail to identify the policy pattern reported by the previous cases, thus challenging the NHRD’s constructivist embeddedness. We question the credibility and trustworthiness of the country-based studies as well as the sense-making constructive base of the NHRD ideation. From China’s local phenomenon, we derive a set of HRD assumptions contrary to the existing western-centric assumptions to enrich global HRD knowledge. Stakeholders: Theory-minded HRD scholars intent on rigorous and relevant theory-development inquiries; practice-oriented HRD practitioners, especially those from a western context and working in a non-western HRD context.
... On a broader level, the issue of replicability or transparency has become a focal topic across multiple social and natural science disciplines (Aguinis et al., 2017; Chang & Li, 2015; Pratt et al., 2020). While replicability or transparency criteria are vital for research credibility in all disciplines, the issue has not captured the attention of HRD researchers. ...
Preprint
Full-text available
Problem: The NHRD conception claimed to be based on multiple country cases through a constructive/interpretive process. However, four of the cases focusing on HRD policy in China presented an incomplete history of China's HRD policies, which may have misled the NHRD conception. Solution: We re-examine China's history of HRD policy as an indigenous phenomenon in comparison to the four China cases. Adopting a similar historical method, we fail to identify the policy pattern reported by the previous cases, thus challenging the NHRD's constructivist embeddedness. We question the credibility and trustworthiness of the country-based studies as well as the sense-making constructive base of the NHRD ideation. From China's local phenomenon, we derive a set of HRD assumptions contrary to the existing western-centric ones to enrich global HRD knowledge. Stakeholders: Theory-minded HRD scholars intent on rigorous and relevant theory-development inquiries; practice-oriented HRD practitioners, especially those from a western context and working in a non-western socioeconomic context.
... For this reason, the need for replications of results from experimental philosophy is frequently brought up (for the literature on the replication crisis in sciences other than experimental philosophy see, e.g., Ioannidis 2005; Chang and Li 2015; Open Science Collaboration 2015; Baker 2016; Miłkowski et al. 2018; for attempts to corroborate some classic X-Phi results see, e.g., Rose et al. 2017; Machery et al. 2020; van Dongen et al. 2020; Ziółkowski 2021; Cova et al. 2021; for methodological considerations concerning replication studies see Machery, 2020). The first large-scale replication project in experimental philosophy carried out by Cova et al. (2021) reran 40 X-Phi studies and, interestingly, the researchers found that demographic (including cross-cultural) effects were less likely to be corroborated than content-based effects (i.e. ...
Article
Full-text available
The cross-cultural differences in epistemic intuitions reported by Weinberg, Nichols and Stich (2001; hereafter: WNS) laid the ground for the negative program of experimental philosophy. However, most of WNS’s findings were not corroborated in further studies. The exception here is the study concerning purported differences between Westerners and Indians in knowledge ascriptions concerning the Zebra Case, which was never properly replicated. Our study replicates the above-mentioned experiment on a considerably larger sample of Westerners (n = 211) and Indians (n = 204). The analysis found a significant difference between the ethnic groups in question in the predicted direction: Indians were more likely to attribute knowledge in the Zebra Case than Westerners. In this paper, we offer an explanation of our result that takes into account the fact that replications of WNS’s other experiments did not find any cross-cultural differences. We argue that the Zebra Case is unique among the vignettes tested by WNS since it should not be regarded as a Gettier case but rather as a scenario exhibiting skeptical pressure concerning the reliability of sense-perception. We argue that skepticism towards perception as a means of gaining knowledge is a trope that is deeply rooted in Western epistemology but is very much absent from Classical Indian philosophical inquiry. This line of reasoning is based on a thorough examination of the skeptical scenarios discussed by philosophers of the Indian Nyaya tradition and their adversaries.
... Some of the lack of replicability identified by recent studies (Camerer, 2016; Chang & Li, 2015, 2017b; Höffler, 2017b; Stodden et al., 2018) occurs despite the fact that journals have policies that encourage the provision of replication packages. Evaluating compliance with policies as well as quality and utility of replication packages is arduous, if not impossible, due to a lack of consistent, reliable metadata on the materials provided to journals. ...
Article
Full-text available
We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public) data, by treating it symmetrically from the point of view of the journal, therefore increasing the transparency of what up until now has been very opaque.
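The abstract names existence, accessibility, and permanence as the key characteristics such a record should document at publication time. The field names in the sketch below are hypothetical (the article defines the actual vocabulary elements); it only illustrates the shape an "at publication" record of this kind could take.

# Hypothetical illustration of an "at publication" record for supplementary materials.
# Field names are invented for this sketch; the article defines the actual vocabulary.
supplementary_record = {
    "article_doi": "10.xxxx/example",         # placeholder identifier
    "materials": [
        {
            "description": "replication data and code",
            "exists": True,                    # existence
            "access": "restricted",            # accessibility (e.g., public, restricted, non-public)
            "repository": "third-party archive",
            "persistent_id": "hdl:xxxx/yyyy",  # placeholder
            "permanence": "archived with preservation guarantee",  # permanence
        }
    ],
    "recorded_at": "time of publication",
}

print(supplementary_record["materials"][0]["access"])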
... Data, code, materials, and design and analysis transparency standards require that the respective elements are shared with the readers, thereby allowing for thorough checking and reproducibility of research results. Even though data and code availability do not always ensure perfect reproducibility (Hardwicke et al. 2021; Chang and Li 2015), the ability to rerun the code on data increases trust in the results. Furthermore, the ability to inspect all the materials and raw results allows for independent evaluation of threats to external validity (Shadish, Campbell, and Cook 2001), such as under- or over-representation of certain demographics or specifics of materials used for intervention. ...
Article
Full-text available
Growing concerns about the credibility of scientific findings have sparked a debate on new transparency and openness standards in research. Management and organization studies scholars generally support the new standards, while emphasizing the unique challenges associated with their implementation in this paradigmatically diverse discipline. In this study, I analyze the costs to authors and journals associated with the implementation of new transparency and openness standards, and provide a progress report on the implementation level thus far. Drawing on an analysis of the submission guidelines of 60 empirical management journals, I find that the call for greater transparency was received, but resulted in implementations that were limited in scope and depth. Even standards that could have been easily adopted were left unimplemented, producing a paradoxical situation in which research designs that need transparency standards the most are not exposed to any, likely because the standards are irrelevant to other research designs.
... Of those 58 journals, 34 had a mandatory policy (58.6%). Chang and Li (2015) and Vlaeminck and Podkrajac (2017) have examined how voluntary and mandatory data policies perform. The authors concluded that mandatory data policies perform better, because the probability of finding the data necessary to conduct reproductions is higher than for journals with voluntary data policies. ...
Article
Full-text available
In the field of social sciences and particularly in economics, studies have frequently reported a lack of reproducibility of published research. Most often, this is due to the unavailability of data reproducing the findings of a study. However, over the past years, debates on open science practices and reproducible research have become stronger and louder among research funders, learned societies, and research organisations. Many of these have started to implement data policies to overcome these shortcomings. Against this background, the article asks if there have been changes in the way economics journals handle data and other materials that are crucial to reproduce the findings of empirical articles. For this purpose, all journals listed in the Clarivate Analytics Journal Citation Reports edition for economics have been evaluated for policies on the disclosure of research data. The article describes the characteristics of these data policies and explicates their requirements. Moreover, it compares the current findings with the situation some years ago. The results show significant changes in the way journals handle data in the publication process. Research libraries can use the findings of this study for their advisory activities to best support researchers in submitting and providing data as required by journals.
... Yet even when data are shared, sharing does not guarantee that the results reported in the publications are reproducible. In several studies, only a small proportion (ranging from 17% to 37%) of findings can be reproduced without author assistance (Chang & Li, 2015; Eubank, 2016; Hardwicke et al., 2018). ...
Chapter
In this chapter, we bridge research on scientific and counterfactual reasoning. We review findings that children struggle with many aspects of scientific experimentation in the absence of formal instruction, but show sophistication in the ability to reason about counterfactual possibilities. We connect these two sets of findings by reviewing relevant theories on the relation between causal, scientific, and counterfactual reasoning before describing a growing body of work that indicates that prompting children to consider counterfactual alternatives can scaffold both the scientific inquiry process (hypothesis-testing and evidence evaluation) and science concept learning. This work suggests that counterfactual thought experiments are a promising pedagogical tool. We end by discussing several open questions for future research.
... Yet even when data are shared, sharing does not guarantee that the results reported in the publications are reproducible. In several studies, only a small proportion (ranging from 17% to 37%) of findings can be reproduced without author assistance (Chang & Li, 2015; Eubank, 2016; Hardwicke et al., 2018). ...
Chapter
Young children typically begin learning words during their first 2 years of life. On the other hand, they also vary substantially in their language learning. Similarities and differences in language learning call for a quantitative theory that can predict and explain which aspects of early language are consistent and which are variable. However, current developmental research practices limit our ability to build such quantitative theories because of small sample sizes and challenges related to reproducibility and replicability. In this chapter, we suggest that three approaches—meta-analysis, multi-site collaborations, and secondary data aggregation—can together address some of the limitations of current research in the developmental area. We review the strengths and limitations of each approach and end by discussing the potential impacts of combining these three approaches.
... Science is said to be in 'crisis' (Redman 2015); reports of prevalent research misbehaviour (Martinson et al. 2005, Necker 2014, Fanelli 2009, Marušić et al. 2011), high-profile cases of scientific misconduct, and poor replicability of research findings (Open Science Collaboration 2015, Begley and Ellis 2012, Chang and Li 2015, Ioannidis 2017, Munafo et al. 2017) threaten public trust in science. Poor research practices can produce misleading results and waste resources (Ioannidis et al. 2014). ...
Article
Full-text available
Background: Recognising the importance of addressing ethics and research integrity (ERI) in Europe, in 2017, the All European Academies (ALLEA) published a revised and updated European Code of Conduct for Research Integrity (ECoC). Consistent application of the ECoC by researchers across Europe will require its widespread dissemination, as well as an innovative training programme and novel tools to enable researchers to truly uphold and internalise the principles and practices listed in the Code. Aim: VIRT ² UE aims to develop a sustainable train-the-trainer blended learning programme enabling contextualised ERI teaching across Europe focusing on understanding and upholding the principles and practices of the ECoC. Vision: The VIRT ² UE project recognises that researchers not only need to have knowledge of the ECoC, but also to be able to truly uphold and internalise the principles underpinning the code. They need to learn how to integrate them into their everyday practice and understand how to act in concrete situations. VIRT ² UE addresses this challenge by providing ERI trainers and researchers with an innovative blended (i.e. combined online and off-line approaches) learning programme that draws on a toolbox of educational resources and incorporates an e-learning course (including a YouTube channel) and face-to-face sessions designed to foster moral virtues. ERI trainers and researchers from academia and industry will have open access to online teaching material. Moreover, ERI trainers will learn how to facilitate face-to-face sessions of researchers, which focus on learning how to apply the content of the teaching material to concrete situations in daily practice. Objectives: VIRT ² UE’s work packages (WP) will: conduct a conceptual mapping amongst stakeholders to identify and rank the virtues which are essential for good scientific practice and their relationship to the principles and practices of the ECoC (WP1); identify and consult ERI trainers and the wider scientific community to understand existing capacity and deficiencies in ERI educational resources (WP2); develop the face-to-face component of the train-the-trainer programme which provides trainers with tools to foster researchers’ virtues and promote the ECoC and iteratively develop the programme based on evaluations (WP3); produce educational materials for online learning by researchers and trainers (WP4); implement and disseminate the train-the-trainer programme across Europe, ensuring the training of sufficient trainers for each country and build capacity and consistency by focusing on underdeveloped regions and unifying fragmented efforts (WP5); and develop the online training platform and user interface, which will be instrumental in evaluation of trainers’ and researchers’ needs and project sustainability (WP6). Impact: The VIRT ² UE training programme will promote consistent application of the ECoC across Europe. The programme will affect behaviour on the individual level of trainers and researchers – simultaneously developing an understanding of the ECoC and other ERI issues, whilst also developing scientific virtues, enabling the application of the acquired knowledge to concrete situations and complex moral dilemmas. Through a dedicated embedding strategy, the programme will also have an impact on an institutional level. The train-the-trainer approach multiplies the impact of the programme by reaching current and future European ERI trainers and, subsequently, the researchers they train.
... This report was seminal in generating various meta-research analyses demonstrating that reproducibility of results and reassessments of computational working are the exception rather than the rule (e.g. Chang and Li 2015; Nuzzo 2015; Peng 2011; Resnik and Shamoo 2016; Bruner and Holman 2019). ...
Article
Full-text available
This article describes an attempt to reproduce the published analysis from three archaeological field-walking surveys by using datasets collected between 1990 and 2005 which are publicly available in digital format. The exact methodologies used to produce the analyses (diagrams, statistical analysis, maps, etc.) are often incomplete, leaving a gap between the dataset and the published report. By using the published descriptions to reconstruct how the outputs were manipulated, I expected to reproduce and corroborate the results. While these experiments highlight some successes, they also point to significant problems in reproducing an analysis at various stages, from reading the data to plotting the results. Consequently, this article proposes some guidance on how to increase the reproducibility of data in order to assist aspirations of refining results or methodology. Without a stronger emphasis on reproducibility, the published datasets may not be sufficient to confirm published results and the scientific process of self-correction is at risk.
... To overcome this crisis of confidence, replicating original findings is becoming increasingly common in the scientific community (e.g. [12][13][14][15][16][17][18]). ...
Article
Full-text available
To overcome the frequently debated crisis of confidence, replicating studies is becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.
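Two of the simpler frequentist criteria alluded to in this abstract can be written down directly: whether the replication estimate falls inside the original study's 95% confidence interval, and whether a fixed-effect meta-analytic combination of the two estimates is significant. The sketch below illustrates both with made-up effect sizes and standard errors; it is not the paper's simulation code.

import math

# Hypothetical original and replication results: effect estimates with standard errors.
orig_est, orig_se = 0.40, 0.15
rep_est, rep_se = 0.18, 0.10
z95 = 1.96

# Criterion 1: does the replication estimate fall in the original 95% CI?
ci_low, ci_high = orig_est - z95 * orig_se, orig_est + z95 * orig_se
in_original_ci = ci_low <= rep_est <= ci_high

# Criterion 2: fixed-effect (inverse-variance weighted) meta-analysis of both studies.
w_orig, w_rep = 1 / orig_se**2, 1 / rep_se**2
meta_est = (w_orig * orig_est + w_rep * rep_est) / (w_orig + w_rep)
meta_se = math.sqrt(1 / (w_orig + w_rep))
meta_significant = abs(meta_est / meta_se) > z95

print(f"Original 95% CI: [{ci_low:.2f}, {ci_high:.2f}], replication estimate {rep_est:.2f}")
print(f"Replication inside original CI: {in_original_ci}")
print(f"Meta-analytic estimate: {meta_est:.2f} (SE {meta_se:.2f}), significant: {meta_significant}")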
... The cumulative effect of lack of replication, bias in reporting, preference for particular outcomes, and the ability to preferentially obtain these outcomes is a body of empirical research findings that largely cannot be reproduced (verified through conducting the original analysis with the original data; Chang et al., 2015; Hardwicke et al., 2018; Ioannidis et al., 2009) or replicated (verified through conducting the original analysis with newly collected data; Camerer et al., 2016, 2018; Cova et al., 2021; Open Science Collaboration [OSC], 2015). Even bodies of research that have been supported through existing standards of evidence, broad use of meta-analyses, and many conceptual replications can sometimes not be replicated (e.g., Hagger et al., 2016). ...
Article
Improving research culture to value transparency and rigor is necessary to engage in a productive “Credibility Revolution.” The field of educational psychology is well positioned to act toward this goal. It will take specific actions by both grassroots groups and leadership to set standards that will ensure that getting published, funded, or hired is determined by universally supported ideals. These improved standards must ensure that transparency, rigor, and credibility are valued above novelty, impact, and incredibility. Grassroots groups advocate for change and share experience so that the next generation of researchers has the experience needed to sustain these early moves. Each community can take inspiration from others that have made shifts toward better practices. These instances provide opportunities for emulating trail-blazers, training for new practices such as preregistration, and constructively evaluating or criticizing practice in ways that advance the reputation of all involved.
... With recent efforts showing some high-profile works failing to reproduce [40-42], attempts have been made to determine why such works fail to reproduce [43, 44], what policies can be adopted to decrease reproduction failures [36, 45], and whether such policies are effective [11, 42, 46]. Despite these efforts, scientific results remain challenging to reproduce across many disciplines [11, 41, 47-58]. We return to these issues in the Discussion section, where we contrast our work with previous and related efforts. ...
Article
Full-text available
We carry out efforts to reproduce computational results for seven published articles and identify barriers to computational reproducibility. We then derive three principles to guide the practice and dissemination of reproducible computational research: (i) Provide transparency regarding how computational results are produced; (ii) When writing and releasing research software, aim for ease of (re-)executability; (iii) Make any code upon which the results rely as deterministic as possible. We then exemplify these three principles with 12 specific guidelines for their implementation in practice. We illustrate the three principles of reproducible research with a series of vignettes from our experimental reproducibility work. We define a novel Reproduction Package, a formalism that specifies a structured way to share computational research artifacts that implements the guidelines generated from our reproduction efforts to allow others to build, reproduce and extend computational science. We make our reproduction efforts in this paper publicly available as exemplar Reproduction Packages. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
... Duvendack et al. (2015) looked across 162 replication studies in economics journals and found that two out of three were unable to confirm the original findings. Chang and Li (2015), in a smaller study, were themselves able to replicate the main result in one-third of cases where they had access to the original data and code. Assistance from the authors of the studies increased this to about one half. ...
Article
In 2010, the world’s first Social Impact Bond (SIB) was launched at Peterborough Prison. It was used to fund an intervention – ‘The One Service’ – aimed at reducing the reoffending among prisoners discharged after serving a sentence of less than 12 months. Under the terms of the SIB, investors are paid according to how successful the One Service is in reducing reconvictions. If a minimum threshold of a 7.5% reduction in reconviction events is reached across the pilot, payment is triggered. Additionally, there is an option to trigger an early payment if a 10% reduction is noted in the number of reconviction events in individual cohorts. A propensity score matching (PSM) approach was used to estimate the impact. For cohort 1, the impact was estimated, by a previous team of independent assessors, to be a reduction in reconviction events of 8.4% (Jolliffe and Hedderman, 2014). This report reviews the PSM approach, prompted in part by the desire to understand the reasons behind the differences in reconviction rates between prisoners discharged from HMP Peterborough and prisoners discharged from other prisons. Its primary aim is to identify whether there is a need to revise the approach taken before it is applied to cohort 2 (and the final cohort - the weighted mean of cohort 1 and cohort 2). To do this, various amendments to the methodology were explored. It was not possible to replicate the results of Jolliffe and Hedderman (2014). This should perhaps be no surprise given the difficulties often encountered with replication attempts. However, the difference between the replication result and the Jolliffe and Hedderman (2014) result was not statistically significant. Since this review did not identify any clear improvement in the matching process, the conclusion is that the cohort 1 approach should be maintained. We also recommend an adjustment to the sample selection in cohort 2. It is important to note that this recommendation is based on the analysis of cohort 1 data and is not informed by cohort 2 reoffending data. In more detail, the review began with a replication of Jolliffe and Hedderman (2014) and then explored whether various changes might give better estimates: changing the set of variables included in the propensity score model; changing how propensity score matching was implemented; allowing for unobserved prison- and/or area-specific differences in outcomes. The analysis used data for cohort 1 plus a number of years prior to the introduction of the One Service. The results suggest: it may be possible to alter the set of variables in the propensity score model in order to achieve a somewhat better fit, but this did not lead to any overall improvement in the methodology; changing the implementation of matching did not achieve any overall improvements in the model; controlling for historic differences in outcomes between prisons is problematic since the mean number of reconviction events at HMP Peterborough has followed a different trend from other prisons over the 2006-2009 pre-treatment period. In view of these findings, the recommendations for cohort 2 are as follows: Use the same matching variables as for cohort 1. While some advantages to altering the variable set were found, these were not sufficient to justify a change; Use the same matching approach. The cohort 1 approach of (up to) 10:1 matching without replacement, within a 0.05 caliper, performed as well as alternative approaches; Estimate effects only for men aged 21 or above.
This reflects the fact that, in cohort 1, there were almost no under-21s at HMP Peterborough. If under-21s are similarly absent at HMP Peterborough in cohort 2, the recommendation is to exclude them from comparator prisons too. This will help make the composition of individuals from other prisons more similar to that of HMP Peterborough; Report standard errors of impact estimates. This is suggested as a means of providing some sense of the statistical significance of the estimated impacts. The protocol for cohort 2 and the final cohort is included as an appendix.
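The matching rule quoted in the recommendations above (up to 10:1 matching without replacement within a 0.05 caliper on the propensity score) can be read as a greedy nearest-neighbour procedure like the sketch below. The propensity scores are placeholders, and this is an illustrative reading of the rule, not the assessors' code.

# Greedy sketch of (up to) 10:1 propensity-score matching without replacement
# within a 0.05 caliper, as described in the recommendations above.
def match_within_caliper(treated, controls, ratio=10, caliper=0.05):
    """treated/controls: dicts mapping unit id -> propensity score."""
    available = dict(controls)          # controls are used at most once (no replacement)
    matches = {}
    for t_id, t_ps in treated.items():
        # candidate controls inside the caliper, closest first
        candidates = sorted(
            (c_id for c_id, c_ps in available.items() if abs(c_ps - t_ps) <= caliper),
            key=lambda c_id: abs(available[c_id] - t_ps),
        )
        chosen = candidates[:ratio]     # take up to `ratio` controls
        matches[t_id] = chosen
        for c_id in chosen:
            del available[c_id]
    return matches

# Placeholder propensity scores; real ones would come from a fitted model.
treated = {"T1": 0.31, "T2": 0.58}
controls = {f"C{i}": ps for i, ps in enumerate([0.28, 0.30, 0.33, 0.36, 0.55, 0.60, 0.71])}
print(match_within_caliper(treated, controls))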
... Difficulty establishing the analytic reproducibility of research reports has been encountered in several scientific domains, including economics, political science, behavioural ecology and psychology [3-6,8,9]; but see [10]. A preliminary obstacle for many such studies is that research data are typically unavailable [11-14]. ...
Article
Full-text available
For any scientific report, repeating the original analyses upon the original data should yield the original outcomes. We evaluated analytic reproducibility in 25 Psychological Science articles awarded open data badges between 2014 and 2015. Initially, 16 (64%, 95% confidence interval [43,81]) articles contained at least one ‘major numerical discrepancy' (>10% difference) prompting us to request input from original authors. Ultimately, target values were reproducible without author involvement for 9 (36% [20,59]) articles; reproducible with author involvement for 6 (24% [8,47]) articles; not fully reproducible with no substantive author response for 3 (12% [0,35]) articles; and not fully reproducible despite author involvement for 7 (28% [12,51]) articles. Overall, 37 major numerical discrepancies remained out of 789 checked values (5% [3,6]), but original conclusions did not appear affected. Non-reproducibility was primarily caused by unclear reporting of analytic procedures. These results highlight that open data alone is not sufficient to ensure analytic reproducibility.
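The interval arithmetic reported in abstracts like this one is easy to check. A minimal sketch, assuming an exact (Clopper-Pearson) binomial interval, approximately reproduces the reported [43, 81] for 16 of 25 articles; the badge study's own interval method is not stated in the abstract, so this choice is an assumption.

from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    # Exact binomial confidence interval for k successes out of n trials.
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

print(clopper_pearson(16, 25))  # roughly (0.43, 0.81), i.e. 64% [43, 81]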
Article
Full-text available
The escalating costs of research and development, coupled with the influx of researchers, have led to a surge in published articles across scientific disciplines. However, concerns have arisen regarding the accuracy, validity, and reproducibility of reported findings. Issues such as replication problems, fraudulent practices, and a lack of expertise in measurement theory and uncertainty analysis have raised doubts about the reliability and credibility of scientific research. Rigorous assessment practices in certain fields highlight the importance of identifying potential errors and understanding the relationship between technical parameters and research outcomes. To address these concerns, a universally applicable criterion called comparative certainty is urgently needed. This criterion, grounded in an analysis of the modeling process and information transmission, accumulation, and transformation in both theoretical and applied research, aims to evaluate the acceptable deviation between a model and the observed phenomenon. It provides a theoretically grounded framework applicable to all scientific disciplines adhering to the International System of Units (SI). Objective evaluations based on this criterion can enhance the reproducibility and reliability of scientific investigations, instilling greater confidence in published findings. Establishing this criterion would be a significant stride towards ensuring the robustness and credibility of scientific research across disciplines.
Article
Full-text available
Introduction: Reproducibility is a central tenet of research. We aimed to synthesize the literature on reproducibility and describe its epidemiological characteristics, including how reproducibility is defined and assessed. We also aimed to determine and compare estimates for reproducibility across different fields. Methods: We conducted a scoping review to identify English language replication studies published between 2018 and 2019 in economics, education, psychology, health sciences and biomedicine. We searched Medline, Embase, PsycINFO, Cumulative Index of Nursing and Allied Health Literature – CINAHL, Education Source via EBSCOHost, ERIC, EconPapers, International Bibliography of the Social Sciences (IBSS), and EconLit. Documents retrieved were screened in duplicate against our inclusion criteria. We extracted year of publication, number of authors, country of affiliation of the corresponding author, and whether the study was funded. For the individual replication studies, we recorded whether a registered protocol for the replication study was used, whether there was contact between the reproducing team and the original authors, what study design was used, and what the primary outcome was. Finally, we recorded how reproducibility was defined by the authors, and whether the assessed study(ies) successfully reproduced based on this definition. Extraction was done by a single reviewer and quality controlled by a second reviewer. Results: Our search identified 11,224 unique documents, of which 47 were included in this review. Most studies were related to either psychology (48.6%) or health sciences (23.7%). Among these 47 documents, 36 described a single reproducibility study while the remaining 11 reported at least two reproducibility studies in the same paper. Less than half of the studies referred to a registered protocol. There was variability in the definitions of reproducibility success. In total, across the 47 documents, 177 studies were reported. Based on the definition used by the author of each study, 95 of 177 (53.7%) studies reproduced. Conclusion: This study gives an overview of research across five disciplines that explicitly set out to reproduce previous research. Such reproducibility studies are extremely scarce, the definition of a successfully reproduced study is ambiguous, and the reproducibility rate is overall modest. Funding: No external funding was received for this work.
Chapter
Extreme market events cannot be understood without reference to the complex aspects of money and credit. National and international government and banking policies and politics and attitudes in response to economic conditions greatly affect business and personal balance sheets and income statements and incentives to save, spend, or invest. This chapter surveys ties between interest rates and market movements and focuses on the roles played by central banks and the effects of their policies.
Chapter
This chapter provides a necessary review and critique of the traditional definitions of Geopolitical Risk (GPR) and Transition Economies, which is relevant given the major political, social, economic and psychological global crises that occurred during the last fifteen years (COVID-19; Regulatory Convergence; Trade Wars; Protectionism; the rise of the Far-Right and Nationalist movements in various countries; the Global Financial Crisis of 2007–2014; constitutional amendments in various countries etc.). The chapter also discusses the literature on Cross-Border Spillovers, Social Capital and Social Networks, all of which are critical elements of Geopolitical Risk.
Book
We compiled this book to support a process of self-study, use, and exploration. However, much of the current information is not yet fully complete, and in many sections readers will need to consult the 'References'. This shortcoming stems from the limits of introducing many problems within a 200-page book. Scientific papers devote 5,000 to 10,000 words to solving a single problem in full. Therefore, combining the book with its references is necessary for a self-guided journey of discovery with bayesvl. Regarding the use of references, many are in English, which may pose a language barrier for some readers. We hope these limitations can be addressed more fully in the future.
Article
Recent interest in promoting replication efforts assumes that there is well-established methodological guidance for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses these challenges by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. The paper describes research designs for replication and demonstrates how multiple designs may be combined in systematic replication efforts, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
Article
Full-text available
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed called DataExplained to justify both preferred and rejected analytic paths in real time. Analyses lacking sufficient detail, reproducible code, or with statistical errors were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both analytic paths taken and not taken. Implications for organizations and leaders, whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists, are discussed. Keywords: crowdsourcing data analysis; scientific transparency; research reliability; scientific robustness; researcher degrees of freedom; analysis-contingent results
Article
A reproducible analysis is one in which an independent entity, using the same data and the same statistical code, would obtain the exact same result as the previous analyst. Reproducible analyses utilize script-based analyses and open data to aid in the reproduction of the analysis. A reproducible analysis does not ensure the same results are obtained if another sample of data is collected, which is often referred to as replicability. Reproduction and replication of studies are discussed, as well as the overwhelming benefits of creating a reproducible analysis workflow. A tool is proposed to aid in the evaluation of studies by describing which elements of a study have a strong reproducible workflow and which areas could be improved. This tool is meant to serve as a discussion tool, not to rank studies or devalue studies that are unable to share data or statistical code. Finally, reproducibility for qualitative studies is discussed, along with unique challenges in adopting a reproducible analysis framework.
Article
Full-text available
Credibility building activities in computational research include verification and validation, reproducibility and replication, and uncertainty quantification. Though orthogonal to each other, they are related. This paper presents validation and replication studies in electromagnetic excitations on nanoscale structures, where the quantity of interest is the wavelength at which resonance peaks occur. The study uses the open-source software PyGBe: a boundary element solver with treecode acceleration and GPU capability. We replicate a result by Rockstuhl et al. (2005, doi:10/dsxw9d) with a two-dimensional boundary element method on silicon carbide (SiC) particles, despite differences in our method. The second replication case from Ellis et al. (2016, doi:10/f83zcb) looks at aspect ratio effects on high-order modes of localized surface phonon-polariton nanostructures. The results partially replicate: the wavenumber position of some modes match, but for other modes they differ. With virtually no information about the original simulations, explaining the discrepancies is not possible. A comparison with experiments that measured polarized reflectance of SiC nano pillars provides a validation case. The wavenumber of the dominant mode and two more do match, but differences remain in other minor modes. Results in this paper were produced with strict reproducibility practices, and we share reproducibility packages for all, including input files, execution scripts, secondary data, post-processing code and plotting scripts, and the figures (deposited in Zenodo). In view of the many challenges faced, we propose that reproducible practices make replication and validation more feasible. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
Article
Full-text available
The purpose of this article is primarily to introduce the topic of scientific uncertainty to the wider context of economics and management. Scientific uncertainty is one of the manifestations of irreducible uncertainty and reflection on it should enable better decision making. An entity that bases its operation on current scientific research, which depreciates over time and ultimately leads to erroneous decisions, is referred to as the “loser”. The text indicates estimation of potential scale of this problem supplemented by an outline of sociological difficulties identified in the analysis of the process of building scientific statements. The article ends with a sketch of the answer to the question “how to act in the context of scientific uncertainty?”.
Article
Tests of the relative performance of multiple forecasting models are sensitive to how the set of alternatives is defined. Evaluating one model against a particular set may show that it has superior predictive ability. However, changing the number or type of alternatives in the set may demonstrate otherwise. This paper focuses on forecasting models based on technical analysis and analyzes how much data snooping bias can occur in tests from restricting the size of forecasting model “universes” or ignoring alternatives used by practitioners and other researchers. A Monte Carlo simulation shows that false discoveries have an average increase of 0.72-2.5 percentage points each time one removes half of the prediction models from the set of relevant alternatives. A complementary empirical investigation suggests that at least 50% of positive findings reported in the literature concerned with trading rule overperformance may be false. Our results motivate several recommendations for applied researchers that would alleviate data snooping bias in some of the more popular statistical tests used in the literature.
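The mechanism behind such data snooping is simple multiple testing, and it can be illustrated with a short, self-contained simulation (not the paper's own procedure): every candidate trading rule below is pure noise, yet the number of rules that look 'significant' at the 5% level grows mechanically with the size of the universe searched.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_days = 1000

for universe_size in (10, 100, 1000):
    # Daily excess returns of each candidate rule: i.i.d. noise with zero true mean.
    returns = rng.normal(0.0, 0.01, size=(universe_size, n_days))
    t_stats = returns.mean(axis=1) / (returns.std(axis=1, ddof=1) / np.sqrt(n_days))
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n_days - 1)
    print(universe_size, "rules searched ->", int((p_values < 0.05).sum()), "spuriously significant")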
Article
Full-text available
A combination of confirmation bias, hindsight bias, and pressure to publish may prompt the (unconscious) exploration of various methodological options and reporting only the ones that lead to a (statistically) significant outcome. This undisclosed analytic flexibility is particularly relevant in EEG research, where a myriad of preprocessing and analysis pipelines can be used to extract information from complex multidimensional data. One solution to limit confirmation and hindsight bias by disclosing analytic choices is preregistration: researchers write a time-stamped, publicly accessible research plan with hypotheses, data collection plan, and intended preprocessing and statistical analyses before the start of a research project. In this manuscript, we present an overview of the problems associated with undisclosed analytic flexibility, discuss why and how EEG researchers would benefit from adopting preregistration, provide guidelines and examples on how to preregister data preprocessing and analysis steps in typical ERP studies, and conclude by discussing possibilities and limitations of this open science practice.
Article
In this paper I explore an underdiscussed factor contributing to the replication crisis: Scientists, and following them policy makers, often neglect sources of errors in the production and interpretation of data and thus overestimate what can be learnt from them. This neglect leads scientists to conduct experiments that are insufficiently informative and science consumers, including other scientists, to put too much weight on experimental results. The former leads to fragile empirical literatures, the latter to surprise and disappointment when the fragility of the empirical basis of some disciplines is revealed.
Book
This is the first comprehensive overview of the 'science of science,' an emerging interdisciplinary field that relies on big data to unveil the reproducible patterns that govern individual scientific careers and the workings of science. It explores the roots of scientific impact, the role of productivity and creativity, when and what kind of collaborations are effective, the impact of failure and success in a scientific career, and what metrics can tell us about the fundamental workings of science. The book relies on data to draw actionable insights, which can be applied by individuals to further their career or decision makers to enhance the role of science in society. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists and graduate students, policymakers, and administrators with an interest in the wider scientific enterprise.
Article
Full-text available
Economists have recently adopted preanalysis plans in response to concerns about robustness and transparency in research. The increased use of registered preanalysis plans has raised competing concerns that detailed plans are costly to create, overly restrictive, and limit the type of inspiration that stems from exploratory analysis. We consider these competing views of preanalysis plans, and make a careful distinction between the roles of preanalysis plans and registries, which provide a record of all planned research. We propose a flexible “packraft” preanalysis plan approach that offers benefits for a wide variety of experimental and nonexperimental applications in applied economics. JEL CLASSIFICATION A14; B41; C12; C18; C90; O10; Q00
Article
Full-text available
Empirically analyzing empirical evidence. One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study. Science, this issue 10.1126/science.aac4716
Article
Full-text available
Over the past decade, researchers have expressed concerns over what seemed to be a paucity of replications. In line with this, editorial policies of some leading marketing journals have been modified to encourage more replications. We conducted an extension of a 1994 study to see whether these efforts have had an effect. In fact, the replication rate has fallen to 1.2 percent, a 50% decrease in the rate. As things now stand, practitioners should be skeptical about using the results published in marketing journals as hardly any of them have been successfully replicated, teachers are advised to ignore the findings until they have been replicated, and researchers should put little stock in the outcomes of one-shot studies.
Article
Full-text available
I thank John Siegfried and Robert Moffitt for helpful comments on previous drafts of this report, and Noel Whitehurst of Vanderbilt University for outstanding research assistance. The findings of this report were presented at the 2009 Joint Statistical Meetings in Washington, D.C. Replicating empirical results is an underemphasized, yet essential element of scientific progress. The rewards for replicating are often low (relative to original contributions) while the costs can be substantial. A researcher who sets off to do a serious replication study is likely to find that the task is tedious, more difficult than anticipated, and prone to souring relationships with colleagues. Add these deterrents to the fact that faculty don’t get promoted by replicating others’ results, and it is no surprise that published replication studies are rare. Replication and robustness studies have been difficult to conduct because they usually require cooperation from the author(s). Researchers frequently fail to keep documented, well organized, and complete records of data and data processing programs underlying published articles, and are less than enthusiastic when asked to help replicate their work. A highly visible instance of this type of behavior became headline news in November 2009 when the so-called “climategate” controversy broke. Stolen email records from the University of East Anglia revealed that researchers had resisted transparency and gone out of their way to prevent skeptics from analyzing their data. While no evidence has indicated that these researchers were covering up errors, their deliberate efforts to prevent others from independently verifying their results have reflected poorly on an important body of research, and science in general.
Article
Full-text available
This article explores a new approach to identifying government spending shocks which avoids many of the shortcomings of existing approaches. The new approach is to identify government spending shocks with statistical innovations to the accumulated excess returns of large US military contractors. This strategy is used to estimate the dynamic responses of output, hours, consumption and real wages to a government spending shock. We find that positive government spending shocks are associated with increases in output, hours and consumption. Real wages initially decline after a government spending shock and then rise after a year. We estimate the government spending multiplier associated with increases in military spending to be about 1.5 over a horizon of 5 years.
Article
Full-text available
This paper argues in favor of empirical models built by including in fiscal VAR models structural shocks identified via the narrative method. We first show that "narrative" shocks are orthogonal to the relevant information set of a fiscal VAR. We then derive impulse responses to these shocks. The use of narrative shocks does not require the inversion of the moving-average representation of a VAR for the identification of the relevant shocks. Therefore, within this framework, fiscal multipliers can be identified and estimated even when, in the presence of "fiscal foresight," the MA representation of the VAR is not invertible. (JEL C32, E62, H20, H62, H63)
Article
Full-text available
The "business cycle" is a fundamental, yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the 'output-gap' view that the business cycle corresponds to transitory deviations in economic activity from a permanent or "trend" level. Then, we investigate the extent to which a general model-based approach to estimating trend and cycle for the United States in the postwar era produces measures of the business cycle that depend on models versus the data. We find strong empirical support for a nonlinear time series model that implies a highly asymmetric business cycle that is large and negative in recessions, but small and close to zero in expansions. Based on the principle of forecast combination, we use Bayesian model averaging to construct a model-free measure of the business cycle that also turns out to be highly asymmetric. This model-free measure of the business cycle is closely related to other measures of economic slack and implies a convex short-run aggregate supply curve. The asymmetric business cycle also potentially reconciles two long-standing, but competing theories about the main source of macroeconomic fluctuations.
Article
Full-text available
This paper examines the evidence on the relationship between credit spreads and economic activity. Using an extensive data set of prices of outstanding corporate bonds trading in the secondary market, we construct a credit spread index that is—compared with the standard default-risk indicators—a considerably more powerful predictor of economic activity. Using an empirical framework, we decompose our index into a predictable component that captures the available firm-specific information on expected defaults and a residual component—the excess bond premium. Our results indicate that the predictive content of credit spreads is due primarily to movements in the excess bond premium. Innovations in the excess bond premium that are orthogonal to the current state of the economy are shown to lead to significant declines in economic activity and equity prices. We also show that during the 2007–09 financial crisis, a deterioration in the creditworthiness of broker-dealers—key financial intermediaries in the corporate cash market—led to an increase in the excess bond premium. These findings support the notion that a rise in the excess bond premium represents a reduction in the effective risk-bearing capacity of the financial sector and, as a result, a contraction in the supply of credit with significant adverse consequences for the macroeconomy.
Article
Full-text available
This paper estimates the dynamic effects of changes in taxes in the United States. We distinguish between the effects of changes in personal and corporate income taxes using a new narrative account of federal tax liability changes in these two tax components. We develop an estimator in which narratively identified tax changes are used as proxies for structural tax shocks and apply it to quarterly post WWII US data. We find that short run output effects of tax shocks are large and that it is important to distinguish between different types of taxes when considering their impact on the labor market and the major expenditure components.
Article
Full-text available
We study economic growth and inflation at different levels of government and external debt. Our analysis is based on new data on forty-four countries spanning about two hundred years. The dataset incorporates over 3,700 annual observations covering a wide range of political systems, institutions, exchange rate arrangements, and historic circumstances. Our main findings are: First, the relationship between government debt and real GDP growth is weak for debt/GDP ratios below a threshold of 90 percent of GDP. Above 90 percent, median growth rates fall by one percent, and average growth falls considerably more. We find that the threshold for public debt is similar in advanced and emerging economies. Second, emerging markets face lower thresholds for external debt (public and private)—which is usually denominated in a foreign currency. When external debt reaches 60 percent of GDP, annual growth declines by about two percent; for higher levels, growth rates are roughly cut in half. Third, there is no apparent contemporaneous link between inflation and public debt levels for the advanced countries as a group (some countries, such as the United States, have experienced higher inflation when debt/GDP is high.) The story is entirely different for emerging markets, where inflation rises sharply as debt increases.
Article
Full-text available
This paper uses a sequence of government budget constraints to motivate estimates of returns on the US Federal government debt. Our estimates differ conceptually and quantitatively from the interest payments reported by the US government. We use our estimates to account for contributions to the evolution of the debt-GDP ratio made by inflation, growth, and nominal returns paid on debts of different maturities. (JEL E23, E31, E43, G12, H63)
Article
Full-text available
The impact of fiscal stimulus depends not only on short-term tax and spending policies, but also on expectations about offsetting measures in the future. This paper analyzes the effects of an increase in government spending under a plausible debt-stabilizing policy that systematically reduces spending below trend over time, in response to rising public liabilities. Accounting for such spending reversals brings an otherwise standard new Keynesian model in line with the stylized facts of fiscal transmission, including the crowding-in of consumption and the 'puzzle' of real exchange rate depreciation. Time series evidence for the U.S. supports the empirical relevance of endogenous spending reversals.
Article
Full-text available
Using a Time-Varying Parameters Bayesian Vector Autoregression model, we investigate how the dynamic effects of oil supply shocks on the US economy have changed over time. In contrast to previous studies, we identify oil supply shocks with sign restrictions which are derived from a simple supply and demand model of the global oil market. First, we find a remarkable structural change in the oil market itself, i.e. a typical oil supply shock is characterized by a much smaller impact on world oil production and a greater effect on the real price of crude oil over time. A steepening of the oil demand curve is the only possible explanation for this stylized fact. Accordingly, similar physical disturbances in oil production now have a significantly higher leverage effect on oil prices resulting in a stronger impact on real GDP and consumer prices. Second, we document that the contribution of oil supply shocks to fluctuations in the real price of oil has decreased considerably over time, implying that current oil price fluctuations are more demand driven. Third, oil supply disturbances seem to have played a significant but non-exclusive role in the 1974/75 and early 1990s recessions but were of minor importance in the 1980/81 and millennium slowdowns. Finally, while oil supply shocks explain little of the "Great Inflation", their relative importance for CPI inflation variability has somewhat increased over time.
Article
Full-text available
For decades economists have searched for the sources of business cycle fluctuations. Early business cycle research focused on leading and lagging indicators and, while many of these are still employed today, they fail to provide insight into the sources of the fluctuations. Despite recent advances in economic modeling, there is still much debate as to the cause of recessions and expansions. In standard real business cycle models, a large component of the fluctuations is attributable to technology shocks. Unfortunately, technology and technology shocks are notoriously difficult to measure. To identify the responses of the economy to a technology shock, I use new data from Bowker’s Publications that documents the change in the number of new titles in technology that were available for purchase in the American economy from major publishers. My findings indicate that, in response to a positive technology shock, employment, total factor productivity and capital all significantly increase. Although my findings are different from those in other recent studies, they are consistent with the predictions of standard real business cycle models.
Article
Full-text available
On April 1, 1992, New Jersey's minimum wage rose from $4.25 to $5.05 per hour. To evaluate the impact of the law, the authors surveyed 410 fast-food restaurants in New Jersey and eastern Pennsylvania before and after the rise. Comparisons of employment growth at stores in New Jersey and Pennsylvania (where the minimum wage was constant) provide simple estimates of the effect of the higher minimum wage. The authors also compare employment changes at stores in New Jersey that were initially paying high wages (above $5.00) to the changes at lower-wage stores. They find no indication that the rise in the minimum wage reduced employment. Copyright 1994 by American Economic Association.
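The comparison in this abstract is the textbook difference-in-differences contrast, which can be sketched in a few lines. The store-level numbers below are hypothetical, not the authors' survey data; the coefficient on the interaction term is the difference-in-differences estimate.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical full-time-equivalent employment per store; nj = 1 for New Jersey,
# post = 1 for observations after the April 1992 minimum wage increase.
df = pd.DataFrame({
    "employment": [20.4, 21.0, 23.3, 21.2, 19.8, 21.5, 22.9, 20.1],
    "nj":         [1, 1, 0, 0, 1, 1, 0, 0],
    "post":       [0, 1, 0, 1, 0, 1, 0, 1],
})

model = smf.ols("employment ~ nj + post + nj:post", data=df).fit()
print(model.params["nj:post"])  # difference-in-differences estimate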
Article
Financial innovation is widely believed to be at least partly responsible for the recent financial crisis. At the same time, there are empirical and theoretical arguments that support the view that changes in financial markets, in particular, innovations in consumer credit and home mortgages, played a role in the 'great moderation'. This article questions empirical evidence supporting this view. Especially the behaviour of aggregate home mortgages changed less during the great moderation than is typically believed. A remarkable change we do find is that monetary tightenings became episodes during which financial institutions other than banks increased their mortgages holdings.
Article
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Article
We analyze the effect of measurement error in macroeconomic data on economics research using two features of the estimates of latent US output produced by the Bureau of Economic Analysis (BEA). First, we use the fact that the BEA publishes two theoretically identical estimates of latent US output that only differ due to measurement error: the more well-known gross domestic product (GDP), which the BEA constructs using expenditure data, and gross domestic income (GDI), which the BEA constructs using income data. Second, we use BEA revisions to previously published releases of GDP and GDI. Using a sample of 23 published economics papers from top economics journals that utilize GDP as a key component of an estimated model, we assess whether using either revised GDP or GDI instead of GDP in the published paper would change reported results. We find that estimating models using revised GDP generates the same qualitative result as the original paper in all 23 cases. Estimating models using GDI, both with the GDI data originally available to the authors and with revised GDI, instead of GDP generates larger differences in results than those obtained with revised GDP. For 3 of 23 papers (13%), the results we obtain with GDI are qualitatively different than the original published results.
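The robustness exercise described here can be mimicked in outline: re-estimate the same specification with each output measure and check whether the key coefficient keeps its sign and significance. The sketch below is illustrative only; the simulated series and the simple AR(1)-in-growth-rates specification are placeholders, not any of the 23 papers' models or data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
latent = 100 * np.exp(np.cumsum(rng.normal(0.005, 0.01, 200)))  # latent output level
df = pd.DataFrame({
    "gdp": latent * np.exp(rng.normal(0, 0.004, 200)),  # expenditure-side measure with error
    "gdi": latent * np.exp(rng.normal(0, 0.004, 200)),  # income-side measure with error
})

def key_result(data, col):
    d = pd.DataFrame({"g": data[col].pct_change()})
    d["g_lag"] = d["g"].shift(1)
    fit = smf.ols("g ~ g_lag", data=d.dropna()).fit()
    return fit.params["g_lag"], fit.pvalues["g_lag"]

for col in ("gdp", "gdi"):
    coef, p = key_result(df, col)
    print(col, round(coef, 3), "significant at 5%" if p < 0.05 else "not significant at 5%")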
Article
General Nature of the Editorial Process
Article
We replicate Reinhart and Rogoff (2010A and 2010B) and find that selective exclusion of available data, coding errors and inappropriate weighting of summary statistics lead to serious miscalculations that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies. Over 1946–2009, countries with public debt/GDP ratios above 90% averaged 2.2% real annual GDP growth, not −0.1% as published. The published results for (i) median GDP growth rates for the 1946–2009 period and (ii) mean and median GDP growth figures over 1790–2009 are all distorted by similar methodological errors, although the magnitudes of the distortions are somewhat smaller than with the mean figures for 1946–2009. Contrary to Reinhart and Rogoff’s broader contentions, both mean and median GDP growth when public debt levels exceed 90% of GDP are not dramatically different from when the public debt/GDP ratios are lower. The relationship between public debt and GDP growth varies significantly by period and country. Our overall evidence refutes RR’s claim that public debt/GDP ratios above 90% consistently reduce a country’s GDP growth.
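One of the weighting questions at stake is easy to make concrete: averaging growth over all country-years in a debt bucket generally differs from first averaging within each country and then across countries. The toy data frame below is hypothetical and the calculation illustrates the general weighting issue only; it is not a reconstruction of either paper's spreadsheet.

import pandas as pd

df = pd.DataFrame({
    "country":  ["A"] * 19 + ["B"],
    "growth":   [2.5] * 19 + [-7.6],   # country B contributes a single bad year
    "debt_gdp": [95] * 20,             # every observation is in the >90% bucket
})

high = df[df["debt_gdp"] > 90]
pooled = high["growth"].mean()                                 # each country-year weighted equally
by_country = high.groupby("country")["growth"].mean().mean()   # each country weighted equally
print(round(pooled, 2), round(by_country, 2))                  # roughly 2.0 versus -2.55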
Article
Can a single conversation change minds on divisive social issues, such as same-sex marriage? A randomized placebo-controlled trial assessed whether gay (n = 22) or straight (n = 19) messengers were effective at encouraging voters (n = 972) to support same-sex marriage and whether attitude change persisted and spread to others in voters' social networks. The results, measured by an unrelated panel survey, show that both gay and straight canvassers produced large effects initially, but only gay canvassers' effects persisted in 3-week, 6-week, and 9-month follow-ups. We also find strong evidence of within-household transmission of opinion change, but only in the wake of conversations with gay canvassers. Contact with gay canvassers further caused substantial change in the ratings of gay men and lesbians more generally. These large, persistent, and contagious effects were confirmed by a follow-up experiment. Contact with minorities coupled with discussion of issues pertinent to them is capable of producing a cascade of opinion change. Copyright © 2014, American Association for the Advancement of Science.
Article
We analyze dynamics of the permanent and transitory components of the US economic activity and the stock market obtained by multivariate dynamic factor modeling. We capture asymmetries over the phases of economic and stock market trends and cycles using independent Markov-switching processes. We show that both output and stock prices contain significant transitory components, while consumption and dividends are useful to identify their respective permanent components. The extracted economic trend perfectly predicts all post-war recessions. Our results shed light on the nature of the bilateral predictability of the economy and the stock market. The transitory stock market component signals recessions with an average lead of one quarter, whereas the market trend is correlated with the economic trend with varying lead/lag times.
Article
Counter-cyclical variation in individuals' idiosyncratic labor income risk could generate substantial welfare costs. Following past research, we infer income volatility - the variance of permanent income shocks, a standard proxy for income risk - from the rate at which cross-sectional variances of income rise over the life-cycle for a given cohort. Our novelty lies in exploiting cross-state variation in state economic conditions or state sensitivity to national economic conditions. We find that income volatility is higher in good state times than bad; during good national times, we find volatility is higher in states that are more sensitive to national conditions.JEL Codes: D31 (Personal Income, Wealth, and Their Distributions); E32 (Business Fluctuations, Cycles)
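The inference step described here can be sketched under the simplest permanent-shock model: if log income follows a random walk, the cross-sectional variance of log income within a cohort rises by the permanent-shock variance each period, so the slope of that variance against time estimates the volatility parameter. The panel below is simulated, not the authors' data, and ignores transitory shocks and measurement error.

import numpy as np

rng = np.random.default_rng(2)
n_people, n_years, sigma2_true = 5000, 20, 0.02

# Log income as a pure random walk: permanent shocks accumulate over time.
shocks = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_people, n_years))
log_income = shocks.cumsum(axis=1)

cross_sec_var = log_income.var(axis=0)            # cross-sectional variance, year by year
slope = np.polyfit(np.arange(n_years), cross_sec_var, 1)[0]
print(round(slope, 4))                             # close to the true 0.02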
Article
holders also have downward sloping demand curves. Groups for whom the liquidity of Treasuries is likely to be more important have steeper demand curves. The results have bearing for important questions in finance and macroeconomics. We discuss implications for the behavior of corporate bond spreads, interest rate swap spreads, the riskless interest rate, and the value of aggregate liquidity. We also discuss the implications of our results for the financing of the US deficit, Ricardian equivalence, and the effects of foreign central bank demand on Treasury yields.
Article
During recessions, the focus on male job losses may overshadow other important outcome variables. We examine the effects of economic downturns on occupational segregation by gender, using staffing data from over 6 million private-sector US establishments from 1966-2010. Consistent with the literature, we find a downward trend in occupational segregation that is diminishing over time. Drawing upon Rubery's (1988) work on women and recessions, we find support for both the buffer and the segmentation hypotheses. On net, however, the buffer hypothesis appears to dominate providing evidence that in periods of economic decline the trend of decreasing economic dissimilarity is interrupted.
Article
We assess the extent to which the greater US macroeconomic stability since the mid-1980s can be accounted for by changes in oil shocks and the oil elasticity of gross output. We estimate a DSGE model and perform counterfactual simulations. We nest two popular explanations for the Great Moderation: smaller non-oil real shocks and better monetary policy. We find that oil played an important role in the stabilisation. Around half of the reduced volatility of inflation is explained by better monetary policy alone, and 57 percent of the reduced volatility of GDP growth is attributed to smaller TFP shocks. Oil related effects explain around a third.
Article
We simulate the Federal Reserve second Large-Scale Asset Purchase programme in a DSGE model with bond market segmentation estimated on US data. GDP growth increases by less than a third of a percentage point and inflation barely changes relative to the absence of intervention. The key reasons behind our findings are small estimates for both the elasticity of the risk premium to the quantity of long-term debt and the degree of financial market segmentation. Without the commitment to keep the nominal interest rate at its lower bound for an extended period, the effects of asset purchase programmes would be even smaller.
Article
We investigate inflation predictability in the United States across the monetary regimes of the 20th century. The forecasts based on money growth and output growth were significantly more accurate than the forecasts based on past inflation only during the regimes associated with neither a clear nominal anchor nor a credible commitment to fight inflation. These include the years from the outbreak of World War II in 1939 to the implementation of the Bretton Woods Agreements in 1951, and from Nixon's closure of the gold window in 1971 to the end of Volcker's disinflation in 1983.
Article
How should environmental policy respond to economic fluctuations caused by persistent productivity shocks? This paper answers that question using a dynamic stochastic general equilibrium real business cycle model that includes a pollution externality. I first estimate the relationship between the cyclical components of carbon dioxide emissions and US GDP and find it to be inelastic. Using this result to calibrate the model, I find that optimal policy allows carbon emissions to be procyclical: increasing during expansions and decreasing during recessions. However, optimal policy dampens the procyclicality of emissions compared to the unregulated case. A price effect from costlier abatement during booms outweighs an income effect of greater demand for clean air. I also model a decentralized economy, where government chooses an emissions tax or quantity restriction and firms and consumers respond. The optimal emissions tax rate and the optimal emissions quota are both procyclical: during recessions, the tax rate and the emissions quota both decrease. (Copyright: Elsevier)
Article
We document the cyclical properties of US firms' financial flows and show that equity payout is procyclical and debt payout is countercyclical. We then develop a model with debt and equity financing to explore how the dynamics of real and financial variables are affected by "financial shocks." We find that financial shocks contributed significantly to the observed dynamics of real and financial variables. The recent events in the financial sector show up as a tightening of firms' financing conditions which contributed to the 2008-2009 recession. The downturns in 1990-1991 and 2001 were also influenced by changes in credit conditions. (JEL E23, E32, E44, G01, G32)
Article
Standard vector autoregression (VAR) identification methods find that government spending raises consumption and real wages; the Ramey-Shapiro narrative approach finds the opposite. I show that a key difference in the approaches is the timing. Both professional forecasts and the narrative-approach shocks Granger-cause the VAR shocks, implying that these shocks are missing the timing of the news. Motivated by the importance of measuring anticipations, I use a narrative method to construct richer government spending news variables from 1939 to 2008. The implied government spending multipliers range from 0.6 to 1.2. Copyright 2011, Oxford University Press.
Article
This paper identifies a new source of business-cycle fluctuations. Namely, a common stochastic trend in neutral and investment-specific productivity. We document that in U.S. postwar quarterly data total factor productivity (TFP) and the relative price of investment are cointegrated. We show theoretically that TFP and the relative price of investment are cointegrated if and only if neutral and investment-specific productivity share a common stochastic trend. We econometrically estimate an RBC model augmented with a number of real rigidities and driven by a multitude of shocks. We find that in the context of our estimated model, innovations in the common stochastic trend explain. (Copyright: Elsevier)
Article
This paper evaluates the extent to which a DSGE model can account for the impact of tax policy shocks. We estimate the response of macroeconomic aggregates to anticipated and unanticipated tax shocks in the U.S. and find that unanticipated tax cuts have persistent expansionary effects on output, consumption, investment and hours worked. Anticipated tax cuts give rise to contractions in output, investment and hours worked prior to their implementation, while stimulating the economy when implemented. We show that a DSGE model can account quite successfully for these findings. The main features of the model are adjustment costs, consumption durables, variable capacity utilization and habit formation. (Copyright: Elsevier)
Article
This paper investigates the sources of the substantial decrease in output growth volatility in the mid-1980s by identifying which of the structural parameters in a representative New Keynesian and structural VAR models changed. Overturning conventional wisdom, we show that the Great Moderation was due not only to changes in shock volatilities but also to changes in monetary policy parameters, as well as in the private sector's parameters. The Great Moderation was previously attributed to good luck because the alternative sources of instabilities appear to have offsetting effects on output volatility and therefore were impossible to detect using existing techniques. © 2011 The President and Fellows of Harvard College and the Massachusetts Institute of Technology.
Article
This paper introduces the model confidence set (MCS) and applies it to the selection of models. An MCS is a set of models that is constructed so that it will contain the best model with a given level of confidence. The MCS is in this sense analogous to a confidence interval for a parameter. The MCS acknowledges the limitations of the data; uninformative data yield an MCS with many models whereas informative data yield an MCS with only a few models. The MCS procedure does not assume that a particular model is the true model; in fact, the MCS procedure can be used to compare more general objects, beyond the comparison of models. We apply the MCS procedure to two empirical problems. First, we revisit the inflation forecasting problem posed by Stock and Watson (1999) and compute the MCS for their set of inflation forecasts. Second, we compare a number of Taylor rule regressions and determine the MCS of the best in terms of in-sample likelihood criteria.
Article
Researchers express concern over a paucity of replications. In line with this, editorial policies of some leading marketing journals now encourage more replications. This article reports on an extension of a 1994 study to see whether these efforts have had an effect on the number of replication studies published in leading marketing journals. Results show that the replication rate has fallen to 1.2%, a decrease in the rate by half. As things now stand, practitioners should be skeptical about using the results published in marketing journals as hardly any of them have been successfully replicated, teachers should ignore the findings until they receive support via replications and researchers should put little stock in the outcomes of one-shot studies.
Article
We examine the role of expectations in the Great Moderation episode. We derive theoretical restrictions in a New-Keynesian model and test them using measures of expectations obtained from survey data, the Greenbook and bond markets. Expectations explain the dynamics of inflation and interest rates but their importance is roughly unchanged over time. Systems with and without expectations display similar reduced form characteristics. Results are robust to changes in the structure of the empirical model. (JEL E23, E24, E31, E32)
Article
This paper uses a dynamic factor model for the quarterly changes in consumption goods' prices in the United States since 1959 to separate them into three independent components: idiosyncratic relative-price changes, a low-dimensional index of aggregate relative-price changes, and an index of equiproportional changes in all inflation rates that we label "pure" inflation. We use the estimates to answer two questions. First, what share of the variability of inflation is associated with each component, and how are they related to conventional measures of monetary policy and relative-price shocks? Second, what drives the Phillips correlation between inflation and measures of real activity? (JEL E21, E23, E31, E52)
Article
This paper investigates the impact of tax changes on economic activity. We use the narrative record, such as presidential speeches and Congressional reports, to identify the size, timing, and principal motivation for all major postwar tax policy actions. This analysis allows us to separate legislated changes into those taken for reasons related to prospective economic conditions and those taken for more exogenous reasons. The behavior of output following these more exogenous changes indicates that tax increases are highly contractionary. The effects are strongly significant, highly robust, and much larger than those obtained using broader measures of tax changes. (JEL E32, E62, H20, N12)
Article
I revisit the question of indeterminacy in US monetary policy using limited-information identification-robust methods. I find that the conclusions of Clarida, Gali, and Gertler (2000) that policy was inactive before 1979 are robust, but the evidence over the Volcker-Greenspan periods is inconclusive. I show that this is in fact consistent with policy being active over that period. Problems of identification also arise because policy reaction has been more gradual recently. At a methodological level, the paper demonstrates that identification issues should be taken seriously, and that identification-robust methods can be informative even when they produce wide confidence sets. (JEL E31, E32, E52, E65)
Article
We present a two-sided search model where agents differ by their human capital endowment and where workers of different skill are imperfect substitutes. The labor market then endogenously divides into disjoint segments, and wage inequality depends on the degree of labor market segmentation. The most important results are: 1) overall wage inequality as well as within-group and between-group inequalities increase with relative human capital inequality; 2) within-group wage inequality decreases while between-group and overall wage inequalities increase with the efficiency of the search process; 3) within-group, between-group and overall wage inequalities increase with technological changes. Immigration and social justice. This article is dedicated to the memory of Yves Younes, who passed away in May 1996 and whose final reflections on the importance of migration in the United States of the 1980s and 1990s greatly influenced me. Can opening the borders between North and South work against the world's most disadvantaged, that is, the unskilled of the South? With two factors of production, South-North migration always benefits the least skilled of the South, since they are the most abundant factor there. But with three factors of production (three skill levels, or two skill levels and an imperfectly mobile capital factor), opening borders can lower the wages of the least skilled in the South if their complementarity with the North's highly skilled labour or capital is sufficiently weak compared with that of more skilled Southerners. Several recent studies indeed suggest that elasticities of complementarity fall sharply beyond a certain skill gap. However, nothing proves that these effects are strong enough for the optimal opening of borders from the standpoint of social justice ...
Article
A key issue in current research and policy is the size of fiscal multipliers when the economy is in recession. Using a variety of methods and data sources, we provide three insights. First, using regime-switching models, we estimate effects of tax and spending policies that can vary over the business cycle; we find large differences in the size of fiscal multipliers in recessions and expansions with fiscal policy being considerably more effective in recessions than in expansions. Second, we estimate multipliers for more disaggregate spending variables which behave differently in relation to aggregate fiscal policy shocks, with military spending having the largest multiplier. Third, we show that controlling for predictable components of fiscal shocks tends to increase the size of the multipliers.
Article
Recent evaluations of the fiscal stimulus packages enacted in 2009 in the United States and Europe such as Cogan et al. (2009) and Cwik and Wieland (2009) suggest that the GDP effects will be modest due to crowding out of private consumption and investment. Corsetti, Meier, and Müller (2009, 2010) argue that spending shocks are typically followed by consolidations with substantive spending cuts, which enhance the short-run stimulus effect. This note investigates the implications of this argument for the estimated impact of recent stimulus packages and the case for discretionary fiscal policy.
Article
We develop a method for measuring the amount of insurance the portfolio of government liabilities provides against fiscal shocks, and apply it to postwar US data. We define fiscal shocks as surprises in defense spending. Our results indicate that the US federal government is partially hedged against wars and other surprise increases in defense expenditures. Seven percent of the total cost of defense spending shocks in the postwar era was absorbed by lower real returns on the federal government's outstanding liabilities. More than half of this is due to reductions in expected future, rather than contemporaneous, holding returns on government debt. This implies that changes in the US government's fiscal position help predict future bond returns. Our results also have implications for active management of government debt.
Article
This paper proposes that idiosyncratic firm-level fluctuations can explain an important part of aggregate shocks, and provide a microfoundation for aggregate productivity shocks. Existing research has focused on using aggregate shocks to explain business cycles, arguing that individual firm shocks average out in aggregate. I show that this argument breaks down if the distribution of firm sizes is fat-tailed, as documented empirically. The idiosyncratic movements of the largest 100 firms in the US appear to explain about one third of variations in output and the Solow residual. This "granular" hypothesis suggests new directions for macroeconomic research, in particular that macroeconomic questions can be clarified by looking at the behavior of large firms. This paper's ideas and analytical results may also be useful to think about the fluctuations of other economic aggregates, such as exports or the trade balance.
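The averaging argument at the heart of this abstract can be checked with a short, illustrative simulation: with equal-sized firms, aggregate volatility generated by purely idiosyncratic shocks decays like one over the square root of the number of firms, but with a fat-tailed (Pareto) size distribution it decays far more slowly. The volatility value and tail index below are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
sigma_firm = 0.12  # assumed volatility of an individual firm's growth rate

for n_firms in (1_000, 100_000):
    sizes = {"equal": np.full(n_firms, 1.0),
             "pareto": rng.pareto(1.1, n_firms) + 1.0}  # fat-tailed firm sizes
    for label, s in sizes.items():
        shares = s / s.sum()
        # Aggregate growth is a share-weighted sum of independent firm shocks,
        # so its standard deviation is sigma_firm * sqrt(sum of squared shares).
        agg_sd = sigma_firm * np.sqrt((shares ** 2).sum())
        print(n_firms, label, round(float(agg_sd), 4))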