Article

Improving transparency in observational social science research: A pre-analysis plan approach

Authors:
Fiona Burlig

Abstract

Social science research has undergone a credibility revolution, but these gains are at risk due to problematic research practices. Existing research on transparency has centered around randomized controlled trials, which constitute only a small fraction of research in economics. In this paper, I highlight three scenarios in which study preregistration can be credibly applied in non-experimental settings: cases where researchers collect their own data; prospective studies; and research using restricted-access data.

... Many scientific fields have well-established norms around disclosing and discussing limitations, which are often recognized as necessary for improving scientific rigor and research integrity. These norms rest on a shared belief that recognizing, exploring, and articulating limitations can foster greater precision in the descriptions of research (making it easier to reproduce), help ensure appropriate interpretation of research findings, make research claims more credible, and highlight issues that would benefit from further research [5,6,18]. Although practices necessarily vary by field and by publication venue [12], the ML research community is notable for not having particularly well-developed norms around disclosing and discussing limitations. ...
... Or better disseminated throughout the entirety of a paper (e.g., when a limitation arises as a new idea is introduced in the paper)? (5) In what ways did this tool confuse you or seem counterintuitive? (6) In what ways could this tool be modified to be more useful to you as you conduct your own ML research? ...
Preprint
Full-text available
Transparency around limitations can improve the scientific rigor of research, help ensure appropriate interpretation of research findings, and make research claims more credible. Despite these benefits, the machine learning (ML) research community lacks well-developed norms around disclosing and discussing limitations. To address this gap, we conduct an iterative design process with 30 ML and ML-adjacent researchers to develop and test REAL ML, a set of guided activities to help ML researchers recognize, explore, and articulate the limitations of their research. Using a three-stage interview and survey study, we identify ML researchers' perceptions of limitations, as well as the challenges they face when recognizing, exploring, and articulating limitations. We develop REAL ML to address some of these practical challenges, and highlight additional cultural challenges that will require broader shifts in community norms to address. We hope our study and REAL ML help move the ML research community toward more active and appropriate engagement with limitations.
... Pre-registration allows for adapting analytical decisions after observing the data, but deviations from the pre-analysis plan are reported transparently, thereby establishing a clear distinction between predictive confirmatory analyses and postdictive exploratory analyses. Political scientists may learn from experiences in other disciplines and build on existing advice in order to apply pre-registration even to observational (Burlig 2018), secondary (Weston et al. 2018a), and qualitative data. ...
... Pre-registration and result-blind review are not a panacea for all problems regarding analytical robustness. For instance, political scientists often analyze pre-existing large-scale datasets for which pre-registration is sometimes possible (Weston/Bakker 2018; Weston et al. 2018b; Burlig 2018; Nosek et al. 2018) but not always suitable. Since such datasets offer an incredible number of reasonable model specifications, in these cases it is particularly important to know that significant effects provide more informational value than "merely [demonstrating] that it is possible to find a specification that fits the author's favorite hypothesis" (Ho et al. 2007: 199). ...
Preprint
Full-text available
Witnessing the ongoing "credibility revolutions" in other disciplines, political science, too, should engage in meta-scientific introspection. Theoretically, this commentary describes why scientists in academia's current incentive system work against their self-interest if they prioritize research credibility. Empirically, a comprehensive review of meta-scientific research with a focus on quantitative political science demonstrates that threats to the credibility of political science findings are systematic and real. Yet the review also shows the discipline's recent progress toward more credible research. The commentary proposes specific institutional changes to better align individual researcher rationality with the collective good of verifiable, robust, and valid scientific results. Research as a social dilemma: a meta-scientific stocktaking of the credibility of political science findings and an appeal to change academic incentive structures (German-language abstract, translated): Given the "credibility revolution" in other social sciences, questions about the reliability of institutional knowledge production also arise in political science. This commentary describes why researchers act against their self-interest when they prioritize research validity. A comprehensive review of the meta-scientific literature, focused on quantitative political science, points on the one hand to recently initiated reforms for safeguarding reliable research; on the other hand, it reveals systematic problems with the credibility of published research findings. The commentary proposes concrete measures to bring individual researcher incentives into line with the collective goal of reliable research.
... Development of practices appropriate for existing data, whether historical or contemporary, quantitative or qualitative, is a priority" [5]. There have been a few proposed solutions, but there is no institutional mechanism for implementation [6,7]. This problem could be corrected by the growing number of researchers using confidential data within the Census Bureau's Federal Statistical Research Data Centers (FSRDCs). ...
Article
Full-text available
A split sample/dual method research protocol is demonstrated to increase transparency while reducing the probability of false discovery. We apply the protocol to examine whether diversity in ownership teams increases or decreases the likelihood of a firm reporting a novel innovation using data from the 2018 United States Census Bureau’s Annual Business Survey. Transparency is increased in three ways: 1) all specification testing and identifying potentially productive models is done in an exploratory subsample that 2) preserves the validity of hypothesis test statistics from de novo estimation in the holdout confirmatory sample with 3) all findings publicly documented in an earlier registered report and in this journal publication. Bayesian estimation procedures that leverage information from the exploratory stage included in the confirmatory stage estimation replace traditional frequentist null hypothesis significance testing. In addition to increasing statistical power by using information from the full sample, Bayesian methods directly estimate a probability distribution for the magnitude of an effect, allowing much richer inference. Estimated magnitudes of diversity along academic discipline, race, ethnicity, and foreign-born status dimensions are positively associated with innovation. A maximally diverse ownership team on these dimensions would be roughly six times more likely to report new-to-market innovation than a homophilic team.
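The mechanics of the split-sample protocol are simple enough to sketch. The code below is a hypothetical, frequentist stand-in (the paper's confirmatory stage actually uses Bayesian estimation); the data frame and variable names (df, innovation, team_diversity, firm_size) are invented, and the point is only the workflow of searching freely in one subsample and testing de novo in the locked-away holdout.

```python
# Minimal sketch of a split-sample (exploratory/confirmatory) workflow.
# All data and variable names are hypothetical stand-ins.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "innovation": rng.integers(0, 2, n),       # 0/1 reported-innovation flag
    "team_diversity": rng.random(n),           # diversity index
    "firm_size": rng.lognormal(2.0, 1.0, n),
})

# 1) Carve off an exploratory subsample; the rest is the locked holdout.
explore = df.sample(frac=0.35, random_state=1)
confirm = df.drop(explore.index)

# 2) Unrestricted specification search is allowed ONLY in the exploratory half.
candidate = smf.logit("innovation ~ team_diversity + np.log(firm_size)",
                      data=explore).fit(disp=0)

# 3) The chosen model is registered, then estimated de novo on the holdout,
#    so its hypothesis tests keep their nominal size.
confirmatory = smf.logit("innovation ~ team_diversity + np.log(firm_size)",
                         data=confirm).fit(disp=0)
print(confirmatory.summary())
```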
... There remain a number of important unresolved issues regarding the spread of open science in the social sciences that are beyond the scope of this article. A frequent topic of discussion is the extent to which the adoption of pre-registration and pre-analysis plans will spread beyond experimental studies into observational research 23,24. Some of the ambivalence toward pre-registration we document in this study may be related to concerns that these approaches may not be well-suited to certain branches of non-experimental analysis. ...
Article
Full-text available
Open science practices such as posting data or code and pre-registering analyses are increasingly prescribed and debated in the applied sciences, but the actual popularity and lifetime usage of these practices remain unknown. This study provides an assessment of attitudes toward, use of, and perceived norms regarding open science practices from a sample of authors published in top-10 (most-cited) journals and PhD students in top-20 ranked North American departments from four major social science disciplines: economics, political science, psychology, and sociology. We observe largely favorable private attitudes toward widespread lifetime usage (meaning that a researcher has used a particular practice at least once) of open science practices. As of 2020, nearly 90% of scholars had used at least one such practice. Support for posting data or code online is higher (88% overall support and nearly at the ceiling in some fields) than support for pre-registration (58% overall). With respect to norms, there is evidence that the scholars in our sample appear to underestimate the use of open science practices in their field. We also document that the reported lifetime prevalence of open science practices increased from 49% in 2010 to 87% a decade later.
... In other words, axiologically, the research has to remain value-aware during the data inquiry and interpretation stages, in order to prevent researchers from unconsciously filtering data in a biased manner. To improve the reliability and trustworthiness of data interpretation, transcripts are prepared for cross-examination, a practice rooted in the drive toward transparency in observational social science research (Burlig, 2018). ...
Conference Paper
Full-text available
This article presents how an instrumental, case-research-oriented approach can be used in a teacher field trip. The purpose is to help teachers on a field trip identify and explore research topics and propositions that have currency in today's digital and AI economy. The authors of this article were among the 55-member team on the 2018 educational field trip to China. The trip was motivated by the strategic significance of China as an emerging world leader in many contemporary phenomena in new business models and strategies, i.e., new retail (新零售) and AI (人工智能). The visits to Hangzhou's Artificial Intelligence (AI) town (中国-杭州人工智能小镇) and the Alibaba Group headquarters led to some important propositions highlighted in this article. Clearly, there is active progress and achievement in the digitally enabled business ecosystem that has emerged as a new and active trend of business model development in China. Among the identified propositions is a model describing the architecture of the business ecosystem, consisting of the production ecosystems, the consumption ecosystems, the broader socio-cultural and public domain level, and the digital envelope and information-in-use, which also shares concepts and knowledge already available in Complex Adaptive Systems (CAS). In addition, the visit to the Banyan Tree Hotel group in Hangzhou helped the researchers better understand the role of brand storytelling. The Banyan Tree brand started with the story of the founders spending their honeymoon among the banyan trees of Phuket. Since then, everything the company does, i.e., its quality assurance system, infrastructure, new product development, and philanthropy activities, has been related to brand storytelling that shares the same roots, themes, and brand personality. As such, Banyan Tree is an exemplar case with significant instrumental utility. A narrative discussion is provided in the article.
... Second, we encourage the use of pre-specified analysis plans (PSAPs) when possible to limit data mining (Christensen & Miguel, 2018;Burlig, 2018). Although PSAPs can be limiting in certain studies (Miguel et al., 2014), the costs are likely much lower in this setting. ...
... Subscribing to the open science movement, researchers are discussing how to improve the robustness of empirical research findings by increasing the transparency of research procedures. Present in many fields (Fraser et al. 2018), the movement has also reached the social sciences and, more recently, political science (Burlig 2018;Freese and Peterson 2017;Monogan 2013;Wuttke 2019). ...
Article
Full-text available
The GLES Open Science Challenge 2021 was a pioneering initiative in quantitative political science. Aimed at increasing the adoption of replicable and transparent research practices, it led to this special issue. The project combined the rigor of registered reports—a new publication format in which studies are evaluated prior to data collection/access and analysis—with quantitative political science research in the context of the 2021 German federal election. This special issue, which features the registered reports that resulted from the project, shows that transparent research following open science principles benefits our discipline and substantially contributes to quantitative political science. In this introduction to the special issue, we first elaborate on why more transparent research practices are necessary to guarantee the cumulative progress of scientific knowledge. We then show how registered reports can contribute to increasing the transparency of scientific practices. Next, we discuss the application of open science practices in quantitative political science to date. And finally, we present the process and schedule of the GLES Open Science Challenge and give an overview of the contributions included in this special issue.
... First, there is no evidence that exploratory results are disappearing from leading journals. The vast majority (over 80%) of published empirical work in economics, for instance, is not experimental; some of this work is exploratory and not appropriate for PAPs (28, 29). Second, in some fields, what has been called "fishing" has been practiced for a long time and has yielded powerful descriptive results that were later subjected to confirmatory analyses. ...
Article
Full-text available
While the social sciences have made impressive progress in adopting transparent research practices that facilitate verification, replication, and reuse of materials, the problem of publication bias persists. Bias on the part of peer reviewers and journal editors, as well as the use of outdated research practices by authors, continues to skew literature toward statistically significant effects, many of which may be false positives. To mitigate this bias, we propose a framework to enable authors to report all results efficiently (RARE), with an initial focus on experimental and other prospective empirical social science research that utilizes public study registries. This framework depicts an integrated system that leverages the capacities of existing infrastructure in the form of public registries, institutional review boards, journals, and granting agencies, as well as investigators themselves, to efficiently incentivize full reporting and thereby, improve confidence in social science findings. In addition to increasing access to the results of scientific endeavors, a well-coordinated research ecosystem can prevent scholars from wasting time investigating the same questions in ways that have not worked in the past and reduce wasted funds on the part of granting agencies.
... Whereas pre-registration has clear benefits for providing verifiable documentation of the timing of study decisions in relation to data collection, its benefits become more debated when conducting secondary analyses of existing data sets. In particular, because some portion of the data has already been presented, and because it cannot be confirmed that the study team has not already explored the data set, pre-registration no longer serves to verify to readers that certain analyses were pre-planned rather than data-driven (Burlig, 2018). Nonetheless, even with secondary analyses of existing data, registering an analysis plan prior to conducting the analysis can help keep a researcher focused and honest with themselves, even if it cannot offer the same level of confidence to others. ...
Article
Full-text available
Objective: We aimed to document the use of transparent reporting of hypotheses and analyses in behavioral medicine journals in 2018 and 2008. Design: We examined a randomly selected portion of articles published in 2018 and 2008 by behavioral medicine journals with the highest impact factor, excluding manuscripts that were reviews or purely descriptive. Main Outcome Measures: We coded whether articles explicitly stated if the hypotheses/outcomes/analyses were primary or secondary; if the study was registered/pre-registered; if 'exploratory' or a related term was used to describe analyses/aims; and if power analyses were reported. Results: We coded 162 manuscripts published in 2018 (87% observational and 12% experimental). Sixteen percent were explicit in describing hypotheses/outcomes/analyses as primary or secondary, 51% appeared to report secondary hypotheses/outcomes/analyses but did not use the term 'secondary,' and 33% were unclear. Registration occurred in 14% of studies, but 91% did not report which analyses were registered. 'Exploratory' or a related term was used in 31% of studies. Power analyses were reported in 8% of studies. Compared to 2008 (n = 120), studies published in 2018 were more likely to be registered and less likely to be unclear about whether outcomes were primary or secondary. Conclusions: Behavioral medicine stakeholders should consider strategies to increase the clarity of reporting, particularly details that will inform readers whether analyses were pre-planned or post hoc. Study registration: https://osf.io/39ztn
... A first, crucial set of norms speaks to what, exactly, a complete PAP should contain and how PAPs should be adapted for observational studies, which comprise the majority of research projects undertaken in political science and economics (Burlig 2018; Jacobs 2020). Figure 3 shows the number and share of PAPs that satisfy the four key requirements of a complete PAP: 1) specifying a clear hypothesis; 2) specifying the primary dependent variable(s) sufficiently clearly so as to prevent post-hoc adjustments; 3) specifying the treatment or main explanatory variable sufficiently clearly so as to prevent post-hoc adjustments; and 4) spelling out the precise statistical model to be tested, including functional forms and estimator. ...
Article
Full-text available
Pre-analysis plans (PAPs) have been championed as a solution to the problem of research credibility, but without any evidence that PAPs actually bolster the credibility of research. We analyze a representative sample of 195 PAPs registered on the Evidence in Governance and Politics (EGAP) and American Economic Association (AEA) registration platforms to assess whether PAPs registered in the early days of pre-registration (2011–2016) were sufficiently clear, precise, and comprehensive to achieve their objective of preventing “fishing” and reducing the scope for post-hoc adjustment of research hypotheses. We also analyze a subset of ninety-three PAPs from projects that resulted in publicly available papers to ascertain how faithfully they adhere to their pre-registered specifications and hypotheses. We find significant variation in the extent to which PAPs registered during this period accomplished the goals they were designed to achieve. We discuss these findings in light of both the costs and benefits of pre-registration, showing how our results speak to the various arguments that have been made in support of and against PAPs. We also highlight the norms and institutions that will need to be strengthened to augment the power of PAPs to improve research credibility and to create incentives for researchers to invest in both producing and policing them.
... Fifth, although we did not correct for multiple comparisons, we did preregister all hypotheses in alignment with previous theoretical and empirical work. Given that this was the first paper to examine how affect and EEG sleep are associated in daily life, we felt it was important to examine affect in relation to all facets of EEG sleep (and to replicate previous findings with sleep diaries and actigraphy). Future studies should seek to replicate our results. ...
Article
Objective/Background Disrupted sleep can be a cause and a consequence of affective experiences. However, daily longitudinal studies show sleep assessed via sleep diaries is more consistently associated with positive and negative affect than sleep assessed via actigraphy. The objective of the study was to test whether sleep parameters derived from ambulatory electroencephalography (EEG) in a naturalistic setting were associated with day-to-day changes in affect. Participants/Method Eighty adults (mean age = 32.65 years, 63% female) completed 7 days of affect and sleep assessments. We examined bidirectional associations between morning positive affect and negative affect with sleep assessed via diary, actigraphy, and ambulatory EEG. Results Mornings with lower positive affect than average were associated with higher diary- and actigraphy-determined sleep efficiency that night. Mornings with higher negative affect than average were associated with longer actigraphy-determined total sleep time that night. Nights with longer diary-determined total sleep time, greater sleep efficiency, and shorter sleep onset latency than average were associated with higher next-morning positive affect, and nights with lower diary-determined wake-after-sleep-onset were associated with lower next-morning negative affect. EEG-determined sleep and affect results were generally null in both directions: only higher morning negative affect was associated with longer rapid eye movement (REM) sleep that night. Conclusions Self-reported sleep and affect may occur in a bidirectional fashion for some sleep parameters. EEG-determined sleep and affect associations were inconsistent but may still be important to assess in future studies to holistically capture sleep. Single-channel EEG represents a novel, ecologically valid tool that may provide information beyond diaries and actigraphy.
... Pre-analysis plans and a set of hypotheses formulated before starting the experiment are good tools to alleviate the problems discussed in our paper and should be used more often (Christensen and Miguel, 2018; Burlig, 2018). Finally, researchers should be encouraged to share and combine their data to obtain more powerful experiments (Button et al., 2013). ...
Preprint
Full-text available
The replicability and credibility crisis in psychology and economics sparked the debate on underpowered experiments, publication biases, and p-hacking. Analyzing the number of independent observations of experiments published in Experimental Economics, Games and Economic Behavior, and the Journal of Economic Behavior and Organization, we observe that we did not learn much from this debate. The median experiment in our sample has too few independent observations and, thus, is underpowered. Moreover, we find indications for biases in reporting highly significant results. We investigate for which papers and experiments it is more likely to find reporting biases, and we suggest remedies that could help to overcome the replicability crisis.
... see [60]). Theories of action 'provide a very simple, but powerful, tool for getting beneath the surface of individual, group, and organizational behavior'; they 'systematically analyze and document behavioral patterns and the reasoning behind them' ([61]). ...
Article
Full-text available
Pacific regional organisations focusing on climate change have overlapping adaptation-related mandates. With the growing importance of regional organisations in supplying financial and technical resources for climate adaptation in small island developing states, it is important to understand how well these supranational organisations work together on these issues. In this paper, theories of regionalism and neofunctionalism, complex systems, and superordinate group identity are used to design an action research project that tests the level of coordination between Pacific regional organisations. It presents and discusses a pre-analysis plan for the project, the goal of which is to determine the ways in which virtual team structure can be used to enhance inter-organisational coordination of adaptation interventions across small, dispersed, resource-constrained country jurisdictions. The proposed study represents an important intermediary step in developing more robust climate-related organisational policies at the regional scale in the Pacific and beyond.
... As Brodeur et al. (2016) and Vivalt (2017) note, lab and field experiments suffer from the least inflation bias in the set of papers they studied. Thus, existing registries may not be set up to accommodate pre-analysis plans for the observational studies that may need them the most (Burlig, 2018). A second limitation is that restricting analyses to pre-specified hypotheses can have large costs. ...
Article
Full-text available
Ongoing changes to research practices and recent media attention to agricultural and applied economics have raised new ethical problems, but also created opportunities for new solutions. In this paper, we discuss ethical issues facing the profession and propose potential ways in which the field can address these issues. We divide our discussion into two topics. First are ethical issues that arise during the collection, management and analysis of data. Second are ethical issues faced by researchers as they formulate, fund, and disseminate their research. We pay special attention to issues of data dredging or p-hacking and potential ethical issues arising from interaction with the media.
Article
The recognition that researcher discretion coupled with unconscious biases and motivated reasoning sometimes leads to false findings (“p-hacking”) led to the broad embrace of study preregistration and other open-science practices in experimental research. Paradoxically, the preregistration of quasi-experimental studies remains uncommon although such studies involve far more discretionary decisions and are the most prevalent approach to making causal claims in the social sciences. I discuss several forms of recent empirical evidence indicating that questionable research practices contribute to the comparative unreliability of quasi-experimental research and advocate for adopting the preregistration of such studies. The implementation of this recommendation would benefit from further consideration of key design details (e.g., how to balance data cleaning with credible preregistration) and a shift in research norms to allow for appropriately nuanced sensemaking across prespecified, confirmatory results and other exploratory findings.
Chapter
I argue we can be more systematic about approaching and collecting historical data with the intention of using it for quantitative causal inference (CI). I discuss common challenges to be aware of when working with historical data, how to better structure visits to libraries or archives, and how innovations in research transparency and research design—namely, preanalysis plans for observational data—can be used as tools to help improve our collection of historical data. I emphasize that scholars should spend more effort at the research design stage in order to understand when and why data are available, and what biases might be present. Recognizing what historical data is needed for causal inference, and acknowledging what is not available, can help the research in the long term.
Article
One theoretical perspective expects that imposing term limits on elected officials increases turnout through enhanced competition, while another predicts depressed turnout as a result of clientelist turnout buying. These puzzling, contradictory predictions are examined through a quasi-experiment (using a difference-in-differences approach) based on a 2022 reform that introduced term limits for Costa Rican mayors, applied for the first time in the 2024 municipal election. Over half of mayors suddenly faced retroactive term limits, while the remaining ones were eligible for reelection. This analysis was pre-registered after the 2022 reform but before the 2024 election, that is, at a time when treatment assignment had already occurred but the post-treatment outcomes were not known and the analysis could not yet be performed. The analysis could only be completed after the February 2024 election. The results suggest that the adoption of term limits reinvigorated electoral competition but that its participatory gains were only modest, fostering turnout only in the largest cities. The analysis contributes by advancing the still uncommon practice of pre-registering observational research after treatment assignment but prior to the release of the data (and even prior to the processes that produce those data).
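A difference-in-differences design of this kind reduces to a two-way fixed effects regression. The sketch below is a hedged illustration only: the panel, municipality counts, years, and effect size are all invented, not the paper's actual data or specification.

```python
# Minimal two-way fixed effects difference-in-differences sketch on fake data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
munis, years = 80, [2016, 2020, 2024]
df = pd.DataFrame([(m, y) for m in range(munis) for y in years],
                  columns=["muni", "year"])
df["term_limited"] = (df["muni"] < 40).astype(int)   # treated municipalities
df["post"] = (df["year"] == 2024).astype(int)        # first election under limits
df["turnout"] = (50 + 2 * df["term_limited"] * df["post"]
                 + rng.normal(0, 5, len(df)))        # planted +2pp effect

# Interaction coefficient is the DiD estimate; municipality and year dummies
# absorb the group and period main effects; SEs clustered by municipality.
m = smf.ols("turnout ~ term_limited:post + C(muni) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["muni"]})
print(m.params["term_limited:post"])
```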
Article
Full-text available
South Africa has one of the highest crime rates in the world. This paper examines the effect of weather shocks on various types of crime. Using a 12-year panel data set at a monthly resolution on the police ward level, we observe a short-term effect of temperatures on violent crime. Furthermore, we find evidence for the medium-term effect of weather on crime via droughts. Yet, effect sizes are subtle in both cases and we also emphasize often neglected but well-documented limitations to the interpretability of weather data and weather-induced mechanisms. Recognizing these limitations, we conclude with a cautious interpretation of our findings to inform police deployment strategies.
Article
A robust body of evidence shows that air pollution exposure is detrimental to health outcomes, often measured as deaths and hospitalizations. This literature has focused less on subclinical channels that nonetheless impact behavior, performance, and skills. This article reviews the economic research investigating the causal effects of pollution on nonhealth end points, including labor productivity, cognitive performance, and multiple forms of decision-making. Subclinical effects of pollution can be more challenging to observe than formal health care encounters but may be more pervasive if they affect otherwise healthy people. The wide variety of possible impacts of pollution should be informed by plausible mechanisms and require appropriate hypothesis testing to limit false discovery. Finally, any detected effects of pollution, in both the short and long run, may be dampened by costly efforts to avoid exposure ex ante and remediate its impacts ex post; these costs must be considered for a full welfare analysis.
Article
How does the improved convenience of electronic payments affect consumer payment choice and cash demand? We study the staggered, quasi-random introduction of contactless debit cards by a retail bank. We use account-level data and compare transactions which are eligible for contactless authentication to transactions which are not. We identify a significant convenience effect on debit card use at the intensive margin. The convenience elasticity is strongest among younger clients. Treatment effects increase over time, coinciding with increasing merchant acceptance. The effect on cash demand is economically small and statistically insignificant. We also find no effect on consumer spending.
Article
The elasticity of taxable income (ETI) is a key parameter in tax policy analysis. To examine the large variation in taxable and broad income elasticities found in the literature, I conduct a comprehensive meta-regression analysis using information from 61 studies containing 1,720 estimates. My findings reveal that estimated elasticities are not immutable parameters. They are correlated with contextual factors, and the choice of empirical specification influences the estimated elasticities. Finally, selective reporting bias is prevalent, and the direction of the bias depends on whether deductions are included in the tax base.
Article
Full-text available
Economists have recently adopted preanalysis plans in response to concerns about robustness and transparency in research. The increased use of registered preanalysis plans has raised competing concerns that detailed plans are costly to create, overly restrictive, and limit the type of inspiration that stems from exploratory analysis. We consider these competing views of preanalysis plans, and make a careful distinction between the roles of preanalysis plans and registries, which provide a record of all planned research. We propose a flexible “packraft” preanalysis plan approach that offers benefits for a wide variety of experimental and nonexperimental applications in applied economics. JEL CLASSIFICATION A14; B41; C12; C18; C90; O10; Q00
Article
Credible economic research demands discipline and defensible modeling assumptions—both theoretical and empirical—but incentives to strategically shape findings (e.g., p‐hack) can be strong. We examine recent waves of empiricism in economics and the ethical concerns and responses they prompted. Statistical abuses that opportunistically search for significance are often inseparable from conceptual abuses of opportunistic model identification (i.e., p‐hacking writ large). We compare neoclassical with positivist hacking proclivities and explore associated implications for empirical analysis and peer review. Drawing on our experiences, 25 years apart, as AJAE editors we reflect on efforts to evaluate research quality and enhance research transparency.
Article
There exists a large body of literature examining the association between built environment factors and dietary intake, physical activity, and weight status; however, synthesis of this literature has been limited. To address this gap, we conducted a scoping review of reviews and identified 74 reviews and meta-analyses that investigated the association between built environment factors and dietary intake, physical activity, and/or weight status. Results across reviews were mixed, with heterogeneous effects demonstrated in terms of strength and statistical significance; however, preliminary support was identified for several built environment factors. For example, quality of dietary intake appeared to be associated with the availability of grocery stores, higher levels of physical activity appeared to be most consistently associated with greater walkability, and lower weight status was associated with greater diversity in land-use mix. Overall, reviews reported substantial concern regarding methodological limitations and poor quality of existing studies. Future research should focus on improving study quality (e.g., using longitudinal methods, including natural experiments, and newer mobile sensing technologies), and consensus should be reached regarding how to define and measure both built environment factors and weight-related outcomes.
Article
Full-text available
This study investigated whether basic details of the research process are mentioned in social science research articles. Five hundred empirical articles were sampled from the Social Sciences Citation Index. The frequency of omitted details was mixed. For central essential details, omission rates were: 0% for research purpose, 2% for data collection method, 8% for sample size, 20% for data analysis method, and 48% for sampling strategy. The analysis found that 56% of articles were missing one or more of these five details. For more peripheral details, omission rates were: 36% for limitations, 55% for ethical considerations, and 94% for foundational philosophy. Prevalence rates varied across disciplines and between qualitative, quantitative, and mixed-methods approaches. Possible causes for these findings include the use of secondary data, the view that certain details are not essential, and traditions in ethnographic writing. Those involved in academic publishing are invited to reflect on the basic details that are essential for their specialism and how reporting standards are enforced.
Article
Null hypothesis significance testing has plagued our disciplines for longer than is appreciated, and despite repeated discussion, the inherent problems of this approach have not been taken into consideration. Thus, a more radical change in our approach to uncertainty is necessary, which also requires changes in publication decisions.
Article
Full-text available
Empirically analyzing empirical evidence One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study. Science , this issue 10.1126/science.aac4716
Article
Full-text available
Transparency, openness, and reproducibility are readily recognized as vital features of science (1, 2). When asked, most scientists embrace these features as disciplinary norms and values (3). Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case (4–6).
Article
Full-text available
The PLOS Medicine Editors endorse four measures to ensure transparency in the analysis and reporting of observational studies. Please see later in the article for the Editors' Summary.
Article
Full-text available
The vast majority of health-related observational studies are not prospectively registered and the advantages of registration have not been fully appreciated. Nonetheless, international standards require approval of study protocols by an independent ethics committee before the study can begin. We suggest that there is an ethical and scientific imperative to publicly preregister key information from newly approved protocols, which should be required by funders. Ultimately, more complete information may be publicly available by disclosing protocols, analysis plans, data sets, and raw data.
Article
Full-text available
Social scientists generally enjoy substantial latitude in selecting measures and models for hypothesis testing. Coupled with publication and related biases, this latitude raises the concern that researchers may intentionally or unintentionally select models that yield positive findings, leading to an unreliable body of published research. To combat this "fishing" problem in medical studies, leading journals now require pre-registration of designs that emphasize the prior identification of dependent and independent variables. However, we demonstrate here that even with this level of advanced specification, the scope for fishing is considerable when there is latitude over selection of covariates, subgroups, and other elements of an analysis plan. These concerns could be addressed through the use of a form of comprehensive registration. We experiment with such an approach in the context of an ongoing field experiment for which we drafted a complete "mock report" of findings using fake data on treatment assignment. We describe the advantages and disadvantages of this form of registration and propose that a comprehensive but nonbinding approach be adopted as a first step to combat fishing by social scientists. Likely effects of comprehensive but nonbinding registration are discussed, the principal advantage being communication rather than commitment, in particular that it generates a clear distinction between exploratory analyses and genuine tests.
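The "mock report" idea is mechanical enough to sketch: write the full analysis pipeline against fake treatment assignments and placeholder outcomes, so that every covariate, subgroup, and model choice is fixed before real data arrive. Everything below (names, sample size, model) is hypothetical, not the authors' actual field experiment.

```python
# Sketch of drafting a "mock report" analysis on fake treatment data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "village": np.arange(n),
    "baseline_income": rng.normal(100, 15, n),
})

# Fake (placebo) treatment assignment and placeholder outcome: the point is
# to lock in the specification, not to learn anything from these numbers.
df["treat_fake"] = rng.integers(0, 2, n)
df["outcome_placeholder"] = rng.normal(0, 1, n)

prespecified = smf.ols(
    "outcome_placeholder ~ treat_fake + baseline_income", data=df).fit()
# Every table of the mock report is produced by this code; once real data
# arrive, only the input file changes, leaving no room for post-hoc fishing.
print(prespecified.summary())
```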
Article
Full-text available
There is growing appreciation for the advantages of experimentation in the social sciences. Policy-relevant claims that in the past were backed by theoretical arguments and inconclusive correlations are now being investigated using more credible methods. Changes have been particularly pronounced in development economics, where hundreds of randomized trials have been carried out over the last decade. When experimentation is difficult or impossible, researchers are using quasi-experimental designs. Governments and advocacy groups display a growing appetite for evidence-based policy-making. In 2005, Mexico established an independent government agency to rigorously evaluate social programs, and in 2012, the U.S. Office of Management and Budget advised federal agencies to present evidence from randomized program evaluations in budget requests (1, 2).
Article
Full-text available
In 2008, a group of uninsured low-income adults in Oregon was selected by lottery to be given the chance to apply for Medicaid. This lottery provides an opportunity to gauge the effects of expanding access to public health insurance on the health care use, financial strain, and health of low-income adults using a randomized controlled design. In the year after random assignment, the treatment group selected by the lottery was about 25 percentage points more likely to have insurance than the control group that was not selected. We find that in this first year, the treatment group had substantively and statistically significantly higher health care utilization (including primary and preventive care as well as hospitalizations), lower out-of-pocket medical expenditures and medical debt (including fewer bills sent to collection), and better self-reported physical and mental health than the control group. JEL Codes: H51, H75, I1.
Article
Full-text available
For any given research area, one cannot tell how many studies have been conducted but never reported. The extreme view of the "file drawer problem" is that journals are filled with the 5% of the studies that show Type I errors, while the file drawers are filled with the 95% of the studies that show nonsignificant results. Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.
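The tolerance computation referred to here has a standard closed form based on Stouffer's combined z-test. The following reconstructs that fail-safe N formula from the combined test itself (it is not quoted from the abstract above): X is the number of unreported null studies needed to pull k significant studies back above p = .05 (one-tailed).

```latex
% Combined Stouffer z for k observed studies plus X null studies (mean z = 0):
%   z_c = (sum_i z_i) / sqrt(k + X).
% Setting z_c = 1.645 and solving for X gives the fail-safe N:
\[
  X \;=\; \frac{\left(\sum_{i=1}^{k} z_i\right)^{2}}{1.645^{2}} - k
    \;=\; \frac{\left(\sum_{i=1}^{k} z_i\right)^{2}}{2.706} - k .
\]
```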
Article
Full-text available
We classify all published field experiments in five top economics journals from 1975 to 2010 according to how closely the experimental design and analysis are linked to economic theory. We find that the vast majority of field experiments (68 percent) are Descriptive studies that lack any explicit model; 18 percent are Single Model studies that test a single model-based hypothesis; 6 percent are Competing Models studies that test competing model-based hypotheses; and 8 percent are Parameter Estimation studies that estimate structural parameters in a completely specified model. We also classify laboratory experiments published in these journals over the same period and find that economic theory has played a more central role in the laboratory than in the field. Finally, we discuss in detail three sets of field experiments—on gift exchange, on charitable giving, and on negative income tax—that illustrate both the benefits and the potential costs of a tighter link between experimental design and theoretical underpinnings.
Article
Full-text available
Just over a quarter century ago, Edward Leamer (1983) reflected on the state of empirical work in economics. He urged empirical researchers to “take the con out of econometrics” and memorably observed (p. 37): “Hardly anyone takes data analysis seriously. Or perhaps more accurately, hardly anyone takes anyone else’s data analysis seriously.” Leamer was not alone; Hendry (1980), Sims (1980), and others writing at about the same time were similarly disparaging of empirical practice. Reading these commentaries, we wondered as late-1980s Ph.D. students about the prospects for a satisfying career doing applied work. Perhaps credible empirical work in economics is a pipe dream. Here we address the questions of whether the quality and the credibility of empirical work have increased since Leamer’s pessimistic assessment. Our views are necessarily colored by the areas of applied microeconomics in which we are active, but we look over the fence at other areas as well.
Article
Full-text available
The next step towards research transparency
Article
Full-text available
The primary aim of the paper is to place current methodological discussions in macroeconometric modeling contrasting the ‘theory first’ versus the ‘data first’ perspectives in the context of a broader methodological framework with a view to constructively appraise them. In particular, the paper focuses on Colander’s argument in his paper “Economists, Incentives, Judgement, and the European CVAR Approach to Macroeconometrics” contrasting two different perspectives in Europe and the US that are currently dominating empirical macroeconometric modeling and delves deeper into their methodological/philosophical underpinnings. It is argued that the key to establishing a constructive dialogue between them is provided by a better understanding of the role of data in modern statistical inference, and how that relates to the centuries old issue of the realisticness of economic theories.
Article
Full-text available
[eng] Transportation costs and monopoly location in presence of regional disparities. This article analyses the impact of the level of transportation costs on the location choice of a monopolist. We consider two asymmetric regions. The heterogeneity of space lies in both regional incomes and population sizes: the first region is endowed with wide income spreads allocated among few consumers, whereas the second is highly populated yet not as wealthy. Among the results, we show that low transportation costs induce the firm to exploit size effects by locating in the most populated region. Moreover, a small transport cost decrease may induce a net welfare loss, thus allowing for regional development policies which do not rely on inter-regional transportation infrastructures. [fre, translated] This article develops a comparative statics analysis of the impact of different investment scenarios (an infrastructure project leading to a moderate or a strong decrease in the inter-regional transport cost) on the location choice of a monopolist within an integrated space composed of two regions with heterogeneous populations and incomes. The first region, sparsely populated, exhibits strong income disparities, while the second, more homogeneous in income terms, represents a larger potential market. We show that income heterogeneity is the dominant force in the model when the investment scenario favoured by public policy leads to substantial reductions in the transport cost between the two regions. The wealth effect, when combined with strong income disparity, does not induce the firm to exploit its market power to the detriment of the region …
Article
There is growing interest in enhancing research transparency and reproducibility in economics and other scientific fields. We survey existing work on these topics within economics and discuss the evidence suggesting that publication bias, inability to replicate, and specification searching remain widespread in the discipline. We next discuss recent progress in this area, including through improved research design, study registration and pre-analysis plans, disclosure standards, and open sharing of data and materials, drawing on experiences in both economics and other social sciences. We discuss areas where consensus is emerging on new practices, as well as approaches that remain controversial, and speculate about the most effective ways to make economics research more credible in the future.
Article
We investigate two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 159 empirical economics literatures that draw upon 64,076 estimates of economic parameters reported in more than 6,700 empirical studies. Half of the research areas have nearly 90% of their results under-powered. The median statistical power is 18%, or less. A simple weighted average of those reported results that are adequately powered (power ≥ 80%) reveals that nearly 80% of the reported effects in these empirical economics literatures are exaggerated; typically, by a factor of two and with one-third inflated by a factor of four or more.
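The mechanics behind the exaggeration claim can be illustrated with a few lines of simulation: condition a noisy estimator on clearing the 5% significance threshold, and the surviving estimates overstate the truth. All numbers below are invented for illustration, not taken from the survey.

```python
# "Winner's curse" simulation: under-powered designs yield inflated
# estimates among the results that happen to reach significance.
import numpy as np

rng = np.random.default_rng(3)
true_effect, se = 0.10, 0.08                   # power well below the 80% benchmark
draws = rng.normal(true_effect, se, 100_000)   # sampling distribution of estimates

sig = np.abs(draws / se) > 1.96                # two-sided 5% test
power = sig.mean()                             # ≈ 0.24 with these values
exaggeration = draws[sig & (draws > 0)].mean() / true_effect

print(f"power ≈ {power:.2f}, exaggeration ≈ {exaggeration:.1f}x")  # ≈ 2x here
```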
Article
Using 50,000 tests published in the AER, JPE, and QJE, we identify a residual in the distribution of tests that cannot be explained solely by journals favoring rejection of the null hypothesis. We observe a two-humped camel shape with missing p-values between 0.25 and 0.10 that can be retrieved just after the 0.05 threshold and represent 10-20 percent of marginally rejected tests. Our interpretation is that researchers inflate the value of just-rejected tests by choosing "significant" specifications. We propose a method to measure this residual and describe how it varies by article and author characteristics.
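The flavor of this diagnostic can be sketched simply: compare the mass of reported |z|-statistics just below versus just above the 1.96 cutoff. This is a simplified, hypothetical version of the idea (the paper's actual method estimates a full latent distribution of tests); the function name, bin width, and simulated inputs are all my own choices.

```python
# Toy bunching diagnostic around the 5% critical value.
import numpy as np

def marginal_ratio(z_stats, cutoff=1.96, width=0.31):
    """Mass of |z| in [cutoff, cutoff+width) over mass in [cutoff-width, cutoff)."""
    z = np.abs(np.asarray(z_stats))
    just_above = ((z >= cutoff) & (z < cutoff + width)).mean()
    just_below = ((z >= cutoff - width) & (z < cutoff)).mean()
    return just_above / just_below

rng = np.random.default_rng(1)
honest = rng.normal(1.0, 1.5, 50_000)                      # smooth, no bunching
print(marginal_ratio(honest))                              # < 1: density falling
hacked = np.concatenate([honest, rng.uniform(1.96, 2.3, 5_000)])
print(marginal_ratio(hacked))                              # > 1: the "camel hump"
```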
Article
Another social science looks at itself Experimental economists have joined the reproducibility discussion by replicating selected published experiments from two top-tier journals in economics. Camerer et al. found that two-thirds of the 18 studies examined yielded replicable estimates of effect size and direction. This proportion is somewhat lower than unaffiliated experts were willing to bet in an associated prediction market, but roughly in line with expectations from sample sizes and P values. Science , this issue p. 1433
Article
Imagine a nefarious researcher in economics who is only interested in finding a statistically significant result of an experiment. The researcher has 100 different variables he could examine, and the truth is that the experiment has no impact. By construction, the researcher should find an average of five of these variables statistically significantly different between the treatment group and the control group at the 5 percent level—after all, the exact definition of 5 percent significance implies that there will be a 5 percent false rejection rate of the null hypothesis that there is no difference between the groups. The nefarious researcher, who is interested only in showing that this experiment has an effect, chooses to report only the results on the five variables that pass the statistically significant threshold. If the researcher is interested in a particular sign of the result—that is, showing that this program “works” or “doesn’t work”— on average half of these results will go in the direction the researcher wants. Thus, if a researcher can discard or not report all the variables that do not agree with his desired outcome, the researcher is virtually guaranteed a few positive and statistically significant results, even if in fact the experiment has no effect.
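This arithmetic is easy to verify directly; the short simulation below draws 100 outcome variables with zero true effect and counts the spurious rejections, which average about five per run.

```python
# Direct simulation of the passage above: 100 outcomes, no true effects,
# 5% two-sided t-tests comparing treatment and control groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
n_per_arm, n_outcomes = 200, 100
treatment = rng.normal(0.0, 1.0, (n_per_arm, n_outcomes))
control = rng.normal(0.0, 1.0, (n_per_arm, n_outcomes))

_, pvals = stats.ttest_ind(treatment, control, axis=0)
print(f"{(pvals < 0.05).sum()} of {n_outcomes} outcomes falsely significant")
```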
Article
The social sciences—including economics—have long called for transparency in research to counter threats to producing robust and replicable results. In this paper, we discuss the pros and cons of three of the more prominent proposed approaches: pre-analysis plans, hypothesis registries, and replications. They have been primarily discussed for experimental research, both in the field including randomized control trials and the laboratory, so we focus on these areas. A pre-analysis plan is a credibly fixed plan of how a researcher will collect and analyze data, which is submitted before a project begins. Though pre-analysis plans have been lauded in the popular press and across the social sciences, we will argue that enthusiasm for pre-analysis plans should be tempered for several reasons. Hypothesis registries are a database of all projects attempted; the goal of this promising mechanism is to alleviate the "file drawer problem," which is that statistically significant results are more likely to be published, while other results are consigned to the researcher's "file drawer." Finally, we evaluate the efficacy of replications. We argue that even with modest amounts of researcher bias—either replication attempts bent on proving or disproving the published work, or poor replication attempts—replications correct even the most inaccurate beliefs within three to five replications. We offer practical proposals for how to increase the incentives for researchers to carry out replications.
Article
We studied publication bias in the social sciences by analyzing a known population of conducted studies—221 in total—in which there is a full accounting of what is published and unpublished. We leveraged Time-sharing Experiments in the Social Sciences (TESS), a National Science Foundation–sponsored program in which researchers propose survey-based experiments to be run on representative samples of American adults. Because TESS proposals undergo rigorous peer review, the studies in the sample all exceed a substantial quality threshold. Strong results are 40 percentage points more likely to be published than are null results and 60 percentage points more likely to be written up. We provide direct evidence of publication bias and identify the stage of research production at which publication bias occurs: Authors do not write up and submit null findings.
Article
I argue that requiring authors to post the raw data supporting their published results has the benefit, among many others, of making fraud much less likely to go undetected. I illustrate this point by describing two cases of suspected fraud I identified exclusively through statistical analysis of reported means and standard deviations. Analyses of the raw data behind these published results provided invaluable confirmation of the initial suspicions, ruling out benign explanations (e.g., reporting errors, unusual distributions), identifying additional signs of fabrication, and also ruling out one of the suspected fraudster's explanations for his anomalous results. If journals, granting agencies, universities, or other entities overseeing research promoted or required data posting, it seems inevitable that fraud would be reduced.
Article
We review the statistical models applied to test for heterogeneous treatment effects in the recent empirical literature, with a particular focus on data from randomized field experiments. We show that testing for heterogeneous treatment effects is highly common, and likely to result in a large number of false discoveries when conventional standard errors are applied. We demonstrate that applying correction procedures developed in the statistics literature can fully address this issue, and discuss the implications of multiple testing adjustments for power calculations and experimental design.
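A minimal sketch of what such an adjustment looks like in practice follows. The paper develops bootstrap stepdown corrections; Holm's simpler stepdown procedure stands in here, and the p-values for the subgroup (heterogeneous-effect) tests are hypothetical.

```python
# Family-wise error correction for a set of subgroup interaction tests.
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from heterogeneity tests (gender, age, income, ...).
pvals = np.array([0.003, 0.012, 0.034, 0.049, 0.21, 0.44])

naive = pvals < 0.05                        # 4 rejections at face value
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(naive.sum(), "naive rejections;", reject.sum(), "after Holm")  # 4 vs. 1
```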
Article
Although institutions are believed to be key determinants of economic performance, there is limited evidence on how they can be successfully reformed. Evaluating the effects of specific reforms is complicated by the lack of exogenous variation in the presence of institutions; the difficulty of empirically measuring institutional performance; and the temptation to “cherry pick” a few novel treatment effect estimates from amongst the large number of indicators required to capture the complex and multi-faceted subject. We evaluate one attempt to make local institutions more egalitarian by imposing minority participation requirements in Sierra Leone and test for longer term learning-by-doing effects. In so doing, we address these three pervasive challenges by: exploiting the random assignment of a participatory local governance intervention, developing innovative real-world outcomes measures, and using a pre-analysis plan to bind our hands against data mining. The specific program under study is a “community driven development” (CDD) project, which has become a popular strategy amongst donors to improve local institutions in developing countries. We find positive short-run effects on local public goods provision and economic outcomes, but no sustained impacts on collective action, decision-making processes, or the involvement of marginalized groups (like women) in local affairs, indicating that the intervention was ineffective at durably reshaping local institutions. We further show that in the absence of a pre-analysis plan, we could have instead generated two highly divergent, equally erroneous interpretations of the impacts—one positive, one negative—of external aid on institutions.
Article
This article presents evidence on the employment effects of recent minimum wage increases from a prespecified research design that entailed committing to a detailed set of statistical analyses prior to “going to” the data. The limited data to which the prespecified research design can be applied may preclude finding many significant effects. Nonetheless, the evidence is most consistent with disemployment effects of minimum wages for younger, less-skilled workers.
Article
The view that the returns to public educational investments are highest for early childhood interventions stems primarily from several influential randomized trials - Abecedarian, Perry, and the Early Training Project - that point to super-normal returns to preschool interventions. This paper implements a unified statistical framework to present a de novo analysis of these experiments, focusing on two core issues that have received little attention in previous analyses: treatment effect heterogeneity by gender and over-rejection of the null hypothesis due to multiple inference. The primary finding of this reanalysis is that girls garnered substantial short- and long-term benefits from the interventions. However, there were no significant long-term benefits for boys. These conclusions would not be apparent when using "naive" estimators that do not adjust for multiple inference.
Article
This paper presents evidence on the employment effects of recent minimum wage increases from a pre-specified research design that entailed committing to a detailed set of statistical analyses prior to 'going to' the data. Despite the limited data to which the pre-specified research design can be applied, evidence of disemployment effects of minimum wages is often found where we would most expect it--for younger, less-skilled workers.