Article

Augmenting Pre-Analysis Plans with Machine Learning

Authors: Jens Ludwig, Sendhil Mullainathan, and Jann Spiess

Abstract

Concerns about the dissemination of spurious results have led to calls for pre-analysis plans (PAPs) to avoid ex-post “p-hacking.” But often the conceptual hypotheses being tested do not imply the level of specificity required for a PAP. In this paper we suggest a framework for PAPs that capitalize on the availability of causal machine-learning (ML) techniques, in which researchers combine specific aspects of the analysis with ML for the flexible estimation of unspecific remainders. A “cheap-lunch” result shows that the inclusion of ML produces limited worst-case costs in power, while offering a substantial upside from systematic specification searches.
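The idea of combining a pre-specified estimating equation with an ML step for the unspecified remainder can be made concrete in a few lines. The sketch below is an illustration rather than the authors' exact procedure: it pre-commits to an OLS regression of the outcome on a randomized treatment and lets a cross-validated lasso (one possible pre-committed ML choice) select which of many candidate controls enter; all data are simulated.

```python
# Hedged sketch: the PAP pre-specifies the treatment regression, while an ML step
# (here LassoCV, one possible choice) flexibly selects controls from a large,
# unspecified candidate set. Illustration only, not the authors' exact procedure.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))            # many candidate controls, unspecified ex ante
T = rng.binomial(1, 0.5, size=n)       # randomized treatment (pre-specified)
y = 0.5 * T + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# ML step: let the lasso pick which controls matter for the outcome.
selected = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)

# Pre-specified step: OLS of the outcome on treatment plus the ML-selected controls.
design = sm.add_constant(np.column_stack([T, X[:, selected]]))
fit = sm.OLS(y, design).fit(cov_type="HC1")
print("estimated treatment effect:", fit.params[1])
```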


... They illustrate their method by using simulated data for performance comparisons across different econometric estimators. Another study uses deep neural networks to generate artificial paintings to study gender discrimination in art prices. Finally, Ludwig, Mullainathan, and Spiess (2019) introduce ML-augmented pre-analysis plans to avoid p-hacking. They augment standard linear regression with new regressors from ML. ...
Article
Full-text available
We study how researchers can apply machine learning (ML) methods in finance. We first establish that the two major categories of ML (supervised and unsupervised learning) address fundamentally different problems than traditional econometric approaches. Then, we review the current state of research on ML in finance and identify three archetypes of applications: i) the construction of superior and novel measures, ii) the reduction of prediction error, and iii) the extension of the standard econometric toolset. With this taxonomy, we give an outlook on potential future directions for both researchers and practitioners. Our results suggest large benefits of ML methods compared to traditional approaches and indicate that ML holds great potential for future research in finance.
... Second, given the known biases in publication processes towards positive and statistically significant results (Dwan et al. 2008; Munafò et al. 2017), multiverse analyses are a powerful tool to ensure that researchers have not intentionally or unintentionally made apparently reasonable decisions that may bias analyses toward finding a specific result. Incorporating pre-analysis plans can help adjudicate which of these specifications are most appropriate (Ludwig et al. 2019). ...
Article
Full-text available
Overcoming vaccine hesitancy is critical to containing the COVID-19 pandemic in the United States. To increase vaccination rates, the State of Ohio launched a million dollar lottery in May 2021. Following a pre-registered analysis, we estimate the effects of Ohio’s lottery program Vax-a-Million on COVID-19 vaccination rates by comparing it to a “synthetic control” composed of eight other states. We find a statistically insignificant 1.3% decrease in the full vaccination rate in Ohio at the end of the lottery period. We investigate the robustness of our conclusion to model specifications through a multiverse analysis of 216 possible models, including longer time periods and alternative vaccination measures. The majority (88%) find small negative effects in line with the results of our pre-registered model. While our results are most consistent with a decrease in vaccination rate, they do not allow a firm conclusion on whether the lottery increased or decreased vaccine uptake.
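A multiverse analysis of this kind simply enumerates defensible specification choices and re-estimates the effect under each one. The sketch below is a stripped-down illustration with made-up data, hypothetical choice sets, and a simple pre/post difference in place of the paper's synthetic-control estimator.

```python
# Minimal multiverse sketch: enumerate specification choices and re-estimate the
# effect under each. Data, choice sets, and the simple pre/post difference
# estimator are all hypothetical stand-ins for the paper's synthetic-control setup.
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
weeks = np.arange(20)
vax_ohio = 40 + 1.5 * weeks + rng.normal(0, 1, size=20)   # made-up vaccination series
vax_synth = 41 + 1.5 * weeks + rng.normal(0, 1, size=20)  # made-up comparison series

lottery_starts = [8, 9, 10]             # hypothetical choice: when treatment begins
end_weeks = [14, 16, 18]                # hypothetical choice: how long to follow up
effects = []
for start, end in product(lottery_starts, end_weeks):
    gap = vax_ohio - vax_synth          # Ohio minus comparison, week by week
    effects.append(gap[start:end].mean() - gap[:start].mean())

share_negative = np.mean(np.array(effects) < 0)
print(f"{len(effects)} specifications; {share_negative:.0%} estimate a negative effect")
```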
... preregistration for a substantial share of observational, non-prospective studies is uncertain, but it remains a critical direction for future debate and innovation. A cluster of other work is actively enriching preregistration in various ways, including studies that compare effects of treatments with expert forecasts (Della Vigna and Pope 2018; Della Vigna, Pope, and Vivalt 2019), preregister plans for split-sample analysis (Fafchamps and Labonne 2016; Anderson and Magruder 2017), or use a pre-analysis plan to guide the application of machine learning tools (Ludwig, Mullainathan, and Spiess 2019). ...
Article
A decade ago, the term “research transparency” was not on economists' radar screen, but in a few short years a scholarly movement has emerged to bring new open science practices, tools and norms into the mainstream of our discipline. The goal of this article is to lay out the evidence on the adoption of these approaches – in three specific areas: open data, pre-registration and pre-analysis plans, and journal policies – and, more tentatively, begin to assess their impacts on the quality and credibility of economics research. The evidence to date indicates that economics (and related quantitative social science fields) are in a period of rapid transition toward new transparency-enhancing norms. While solid data on the benefits of these practices in economics is still limited, in part due to their relatively recent adoption, there is growing reason to believe that critics' worst fears regarding onerous adoption costs have not been realized. Finally, the article presents a set of frontier questions and potential innovations.
Article
The world of fluid mechanics is increasingly generating a large amount of data, thanks to the use of numerical simulation techniques. This offers interesting opportunities for incorporating machine learning methods to solve data-related problems such as model calibration. One of the applications that machine learning can offer to engineering, and fluid mechanics in particular, is the calibration of models that approximate a phenomenon. Indeed, the computational cost generated by some models of fluid mechanics pushes scientists to use other models that are close to the original models but less computationally intensive, in order to facilitate their handling. Among the different approaches used is machine learning coupled with optimization methods and algorithms in order to reduce the induced computational cost. In this paper, we propose OPTI-ENS, a new flexible, optimized, and improved framework to calibrate a physical model, called the wake oscillator (WO), which simulates the vibratory behavior of overhead line conductors as an approximation of a heavy and complex model called the strip theory (ST) model. OPTI-ENS is composed of an ensemble machine learning algorithm (ENS) and an optimization algorithm for the WO model, so that the WO model can generate adequate training data as input to the ENS model. The ENS model therefore takes as input the data from the WO model and outputs the data from the ST model. As a benchmark, a series of machine learning models were implemented and tested. The OPTI-ENS algorithm was retained, with a best coefficient of determination (R2 score) of almost 0.7 and a root mean square error (RMSE) of 7.57e−09. In addition, this model is approximately 170 times faster (in terms of computation time) than an ENS model without optimization of the training-data generation by the WO model. This type of approach therefore makes it possible to calibrate the WO model so that simulations of the behavior of overhead line conductors are carried out with the WO model alone.
Article
Full-text available
In this paper, we present a new approach that uses machine learning (ML) for the calibration of a physical model allowing the reproduction of the vibratory behavior of an overhead line conductor. This physical model, known as strip theory (ST), has the advantage of being very precise but is complicated and cumbersome in its software operations and manipulations. A second model, known as the wake oscillator (WO), has been implemented in order to address the limitations of the ST model. To be able to use the WO model instead of the ST model, very heavy manual adjustments are required, which makes its use complicated. Specifically, the WO model must be able to generate a time series similar to one generated by the ST model. In order to address this limitation, a machine learning model known as ENS has been proposed. The machine learning model therefore takes as input the data from the WO model and outputs the data from the ST model. A series of machine learning models were implemented and tested. The ENS algorithm was retained, with a best coefficient of determination (R2 score) of almost 0.7 and a root mean square error (RMSE) of 7.57e-09. This type of approach therefore makes it possible to calibrate the WO model so that simulations of the behavior of overhead line conductors are carried out with the WO model alone.
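The core of the calibration approach described in the two abstracts above is a learned mapping from the cheap wake-oscillator (WO) output to the expensive strip-theory (ST) output. The sketch below illustrates that mapping with an off-the-shelf ensemble regressor on synthetic stand-in data; the two "physical models" are placeholders, not the actual solvers, and the optimization of the WO training data is omitted.

```python
# Hedged sketch of the ensemble-calibration idea: learn a mapping from the cheap
# wake-oscillator (WO) output to the expensive strip-theory (ST) output. The two
# "models" below are synthetic stand-ins, not the actual physical solvers.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
params = rng.uniform(0.1, 1.0, size=(400, 3))              # WO model parameters
wo_out = params @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=400)
st_out = np.sin(wo_out) + 0.05 * rng.normal(size=400)      # pretend ST response

X_train, X_test, y_train, y_test = train_test_split(
    wo_out.reshape(-1, 1), st_out, random_state=0)
ens = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = ens.predict(X_test)
print("R2:", r2_score(y_test, pred),
      "RMSE:", mean_squared_error(y_test, pred) ** 0.5)
```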
Article
The 2019 Economics Nobel Laureates have shed light on how several disciplines can learn from each other to achieve a greater goal. Thanks to their work, economics has begun to follow the methodological and institutional path laid out, amongst others, in the medical sciences. The prize creates momentum in economics to work on areas in which the field still falls short of achievable, higher standards, and to bring more rigor to research transparency, cooperation, and accountability. Yet we also argue that the benefits from the linkage between disciplines are not one-sided. The application and recognition of field experiments as a method in economics have also advanced and enlarged the methodological toolkit on topics such as quasi-experimental methods, non-compliance, and mediation analysis; methods urgently needed to address topics of global concern.
Article
What was once broadly viewed as an impossibility—learning from experimental data in economics—has now become commonplace. Governmental bodies, think tanks, and corporations around the world employ teams of experimental researchers to answer their most pressing questions. For their part, in the past two decades academics have begun to more actively partner with organizations to generate data via field experimentation. Although this revolution in evidence‐based approaches has served to deepen the economic science, recently a credibility crisis has caused even the most ardent experimental proponents to pause. This study takes a step back from the burgeoning experimental literature and introduces 12 actions that might help to alleviate this credibility crisis and raise experimental economics to an even higher level. In this way, we view our “12 action wish list” as discussion points to enrich the field.
Article
Full-text available
In randomized experiments, linear regression is often used to adjust for imbalances in covariates between treatment groups, yielding an estimate of the average treatment effect with lower asymptotic variance than the unadjusted estimator. If there are a large number of covariates, many of which are irrelevant to the potential outcomes, the Lasso can be used to both select relevant covariates and perform the adjustment. We study the resulting estimator under the Neyman-Rubin model for randomization. We present conditions on the covariates and potential outcomes which guarantee that the Lasso is more efficient than the unadjusted estimator and provide a conservative estimate of the asymptotic variance. Simulation and data examples show that Lasso-based adjustment can be advantageous even when p < n and that a variant of Lasso, cv(Lasso+OLS), is similar to cv(Lasso) in terms of confidence length and coverage, but outperforms cv(Lasso) with far fewer covariates in the adjustment.
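A minimal simulated sketch of Lasso-based adjustment: select covariates with a cross-validated Lasso, center them, and regress the outcome on treatment, the selected covariates, and their interactions, so that the coefficient on treatment is the adjusted average-treatment-effect estimate. This is a simplified illustration rather than the paper's exact estimator or variance formula.

```python
# Hedged sketch of Lasso-based covariate adjustment in a randomized experiment:
# the Lasso picks relevant covariates out of many, then an interacted OLS
# regression on the selected, centered covariates delivers the adjusted ATE.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 300, 100                          # many covariates, most irrelevant
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)
y = 1.0 * T + 2.0 * X[:, 0] + rng.normal(size=n)

keep = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)   # Lasso-selected covariates
Xs = X[:, keep] - X[:, keep].mean(axis=0)              # center before interacting
design = sm.add_constant(np.column_stack([T, Xs, T[:, None] * Xs]))
fit = sm.OLS(y, design).fit(cov_type="HC1")
print("Lasso-adjusted ATE estimate:", fit.params[1])
```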
Article
Many scientific and engineering challenges, ranging from personalized medicine to customized marketing recommendations, require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. Given a potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms that, to our knowledge, is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially as the number of covariates increases.
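In practice, causal forests are available in packages such as grf (R) and econml (Python). The sketch below uses econml's CausalForestDML on simulated data; treat the exact class name and call signature as an assumption about recent econml versions rather than a definitive reference.

```python
# Hedged sketch: heterogeneous treatment effects with a causal forest via the
# econml package (API assumed from recent econml releases). Simulated data with
# an effect that varies in the first covariate.
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 10))
T = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]                      # true heterogeneous effect
y = tau * T + X[:, 1] + rng.normal(size=n)

cf = CausalForestDML(discrete_treatment=True, random_state=0)
cf.fit(y, T, X=X)
tau_hat = cf.effect(X)                   # pointwise effect estimates
lo, hi = cf.effect_interval(X, alpha=0.05)   # pointwise confidence intervals
print("mean estimated effect:", tau_hat.mean())
```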
Article
There is growing interest in enhancing research transparency and reproducibility in economics and other scientific fields. We survey existing work on these topics within economics and discuss the evidence suggesting that publication bias, inability to replicate, and specification searching remain widespread in the discipline. We next discuss recent progress in this area, including through improved research design, study registration and pre-analysis plans, disclosure standards, and open sharing of data and materials, drawing on experiences in both economics and other social sciences. We discuss areas where consensus is emerging on new practices, as well as approaches that remain controversial, and speculate about the most effective ways to make economics research more credible in the future.
Article
Background: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be used to analyze the data. Typically, these analyses will involve adjusting for small imbalances in baseline covariates. However, this poses a dilemma, as adjusting for too many covariates can hurt precision more than it helps, and it is often unclear which covariates are predictive of outcome prior to conducting the experiment. Objectives: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data. Results: In this article, we propose the "leave-one-out potential outcomes" estimator. We leave out each observation and then impute that observation's treatment and control potential outcomes using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman-Rubin model, generally performs at least as well as the unadjusted estimator, and the experimental randomization largely justifies the statistical assumptions made.
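A simplified version of the leave-one-out idea on simulated data: for each unit, fit a random forest on all other units, impute both of that unit's potential outcomes, and plug the imputations into an unbiased known-propensity estimator. This single-model variant is an illustration, not necessarily the authors' exact construction.

```python
# Hedged sketch of the "leave-one-out potential outcomes" idea: impute each unit's
# treated and control outcomes from a model fit on the other units, then combine
# them with a known-propensity correction. Simplified single-model variant.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n, p_treat = 120, 0.5
X = rng.normal(size=(n, 5))
T = rng.binomial(1, p_treat, size=n)
y = 1.0 * T + X[:, 0] + rng.normal(size=n)

contrib = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                       # leave unit i out
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    rf.fit(np.column_stack([T[keep], X[keep]]), y[keep])
    t_hat = rf.predict(np.column_stack([[1], X[i:i + 1]]))[0]   # imputed Y_i(1)
    c_hat = rf.predict(np.column_stack([[0], X[i:i + 1]]))[0]   # imputed Y_i(0)
    contrib[i] = (t_hat - c_hat
                  + T[i] * (y[i] - t_hat) / p_treat
                  - (1 - T[i]) * (y[i] - c_hat) / (1 - p_treat))
print("LOOP-style ATE estimate:", contrib.mean())
```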
Article
We revisit the classic semiparametric problem of inference on a low dimensional parameter θ0 in the presence of high-dimensional nuisance parameters η0. We depart from the classical setting by allowing for η0 to be so high-dimensional that the traditional assumptions, such as Donsker properties, that limit complexity of the parameter space for this object break down. To estimate η0, we consider the use of statistical or machine learning (ML) methods which are particularly well-suited to estimation in modern, very high-dimensional cases. ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice. However, both regularization bias and overfitting in estimating η0 cause a heavy bias in estimators of θ0 that are obtained by naively plugging ML estimators of η0 into estimating equations for θ0. This bias results in the naive estimator failing to be N^{-1/2}-consistent, where N is the sample size. We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ0 can be removed by using two simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores that have reduced sensitivity with respect to nuisance parameters to estimate θ0, and (2) making use of cross-fitting which provides an efficient form of data-splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in an N^{-1/2}-neighborhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements. The generic statistical theory of DML is elementary and simultaneously relies on only weak theoretical requirements which will admit the use of a broad array of modern ML methods for estimating the nuisance parameters such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods. We illustrate the general theory by applying it to provide theoretical properties of DML applied to learn the main regression parameter in a partially linear regression model, DML applied to learn the coefficient on an endogenous variable in a partially linear instrumental variables model, DML applied to learn the average treatment effect and the average treatment effect on the treated under unconfoundedness, and DML applied to learn the local average treatment effect in an instrumental variables setting. In addition to these theoretical applications, we also illustrate the use of DML in three empirical examples.
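The partially linear case of DML is short enough to sketch in full: cross-fit ML predictions of the outcome and the treatment given covariates, then regress the outcome residuals on the treatment residuals (a Neyman-orthogonal moment). Simulated data; random forests stand in for whatever learners one would actually pre-specify.

```python
# Hedged sketch of double/debiased ML (DML) for the partially linear model
# Y = theta*T + g(X) + u: cross-fit ML predictions of Y and T given X, then
# regress the Y-residuals on the T-residuals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(6)
n = 1000
X = rng.normal(size=(n, 10))
T = X[:, 0] + rng.normal(size=n)                   # treatment depends on X
y = 0.5 * T + np.sin(X[:, 0]) + X[:, 1] + rng.normal(size=n)

y_res, t_res = np.empty(n), np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    my = RandomForestRegressor(random_state=0).fit(X[train], y[train])
    mt = RandomForestRegressor(random_state=0).fit(X[train], T[train])
    y_res[test] = y[test] - my.predict(X[test])    # partial out g(X) from Y
    t_res[test] = T[test] - mt.predict(X[test])    # partial out m(X) from T

theta_hat = (t_res @ y_res) / (t_res @ t_res)      # residual-on-residual regression
print("DML estimate of theta:", theta_hat)
```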
Article
Abduction is the process of generating and choosing models, hypotheses, and data analyzed in response to surprising findings. All good empirical economists abduct. Explanations usually evolve as studies evolve. The abductive approach challenges economists to step outside the framework of received notions about the "identification problem" that rigidly separates the act of model and hypothesis creation from the act of inference from data. It asks the analyst to engage models and data in an iterative dynamic process, using multiple models and sources of data in a back and forth where both models and data are augmented as learning evolves.
Article
Machines are increasingly doing "intelligent" things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.
Article
Significance As datasets get larger and more complex, there is a growing interest in using machine-learning methods to enhance scientific analysis. In many settings, considerable work is required to make standard machine-learning methods useful for specific scientific applications. We find, however, that in the case of treatment effect estimation with randomized experiments, regression adjustments via machine-learning methods designed to minimize test set error directly induce efficient estimates of the average treatment effect. Thus, machine-learning methods can be used out of the box for this task, without any special-case adjustments.
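A stripped-down version of this idea: cross-fit any out-of-the-box ML regression of the outcome on covariates, subtract the predictions, and compare mean residuals across arms. This is a simplified residualized difference in means on simulated data, not the paper's exact estimator or variance calculation.

```python
# Hedged sketch of ML regression adjustment "out of the box": cross-fit a
# predictive model for the outcome given covariates, then take the difference in
# mean residuals between treatment arms of a randomized experiment.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(9)
n = 1000
X = rng.normal(size=(n, 20))
T = rng.binomial(1, 0.5, size=n)
y = 1.0 * T + X[:, 0] ** 2 + rng.normal(size=n)

m_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=5)
resid = y - m_hat                               # covariate-predicted part removed
ate = resid[T == 1].mean() - resid[T == 0].mean()
print("ML-adjusted ATE estimate:", ate)
```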
Article
In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without "sparsity" assumptions. We propose an "honest" approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the "ground truth" for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges between 7% and 22%.
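Honest estimation can be sketched with two ingredients: one sample builds the partition, and a disjoint sample estimates the treatment effect within each leaf. Below, a regression tree on a transformed outcome (whose conditional mean equals the treatment effect under random assignment) is a simplified stand-in for the paper's modified splitting criterion; the data are simulated.

```python
# Hedged sketch of "honest" estimation: one sample builds the partition, a
# separate sample estimates leaf-level treatment effects. The transformed-outcome
# tree is a simplified stand-in for the paper's modified splitting criterion.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n, p_treat = 4000, 0.5
X = rng.normal(size=(n, 5))
T = rng.binomial(1, p_treat, size=n)
y = (1.0 + (X[:, 0] > 0)) * T + X[:, 1] + rng.normal(size=n)

half = n // 2
A, B = np.arange(half), np.arange(half, n)          # split: build vs estimate

# Build sample: grow a shallow tree on the transformed outcome, whose mean in
# any region equals the average treatment effect in that region.
y_star = y * (T - p_treat) / (p_treat * (1 - p_treat))
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X[A], y_star[A])

# Estimate sample: difference in mean outcomes by treatment arm within each leaf.
leaf_B = tree.apply(X[B])
for leaf in np.unique(leaf_B):
    idx = B[leaf_B == leaf]
    effect = y[idx][T[idx] == 1].mean() - y[idx][T[idx] == 0].mean()
    print(f"leaf {leaf}: estimated effect {effect:.2f}")
```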
Article
Imagine a nefarious researcher in economics who is only interested in finding a statistically significant result of an experiment. The researcher has 100 different variables he could examine, and the truth is that the experiment has no impact. By construction, the researcher should find an average of five of these variables statistically significantly different between the treatment group and the control group at the 5 percent level—after all, the exact definition of 5 percent significance implies that there will be a 5 percent false rejection rate of the null hypothesis that there is no difference between the groups. The nefarious researcher, who is interested only in showing that this experiment has an effect, chooses to report only the results on the five variables that pass the statistically significant threshold. If the researcher is interested in a particular sign of the result—that is, showing that this program “works” or “doesn’t work”— on average half of these results will go in the direction the researcher wants. Thus, if a researcher can discard or not report all the variables that do not agree with his desired outcome, the researcher is virtually guaranteed a few positive and statistically significant results, even if in fact the experiment has no effect.
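The arithmetic in this example is easy to verify with a small simulation: draw 100 outcomes that are unaffected by a randomized treatment, test each at the 5 percent level, and count how many spuriously "significant" differences turn up, and how many go in the preferred direction.

```python
# Hedged sketch of the arithmetic above: with 100 truly null outcomes tested at
# the 5% level, roughly five "significant" differences appear by chance, about
# half of them in whichever direction the researcher prefers.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k = 200, 100                                # 200 subjects, 100 null outcomes
treat = rng.binomial(1, 0.5, size=n).astype(bool)
outcomes = rng.normal(size=(n, k))             # the experiment truly has no effect

t_stats, p_vals = stats.ttest_ind(outcomes[treat], outcomes[~treat])
sig = p_vals < 0.05
print("significant at 5%:", sig.sum())
print("significant and positive:", (sig & (t_stats > 0)).sum())
```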
Article
The social sciences—including economics—have long called for transparency in research to counter threats to producing robust and replicable results. In this paper, we discuss the pros and cons of three of the more prominent proposed approaches: pre-analysis plans, hypothesis registries, and replications. They have been primarily discussed for experimental research, both in the field including randomized control trials and the laboratory, so we focus on these areas. A pre-analysis plan is a credibly fixed plan of how a researcher will collect and analyze data, which is submitted before a project begins. Though pre-analysis plans have been lauded in the popular press and across the social sciences, we will argue that enthusiasm for pre-analysis plans should be tempered for several reasons. Hypothesis registries are a database of all projects attempted; the goal of this promising mechanism is to alleviate the "file drawer problem," which is that statistically significant results are more likely to be published, while other results are consigned to the researcher's "file drawer." Finally, we evaluate the efficacy of replications. We argue that even with modest amounts of researcher bias—either replication attempts bent on proving or disproving the published work, or poor replication attempts—replications correct even the most inaccurate beliefs within three to five replications. We offer practical proposals for how to increase the incentives for researchers to carry out replications.
Article
The results of an unusual test with 1,300 families indicate that payments would not reduce their incentive to work.