Chapter
PDF available

Mostly Harmless Econometrics: An Empiricist's Companion

Authors: Joshua Angrist and Jörn-Steffen Pischke

Abstract

The core methods in today's econometric toolkit are linear regression for statistical control, instrumental variables methods for the analysis of natural experiments, and differences-in-differences methods that exploit policy changes. In the modern experimentalist paradigm, these techniques address clear causal questions such as: Do smaller classes increase learning? Should wife batterers be arrested? How much does education raise wages? Mostly Harmless Econometrics shows how the basic tools of applied econometrics allow the data to speak. In addition to econometric essentials, Mostly Harmless Econometrics covers important new extensions -- regression-discontinuity designs and quantile regression -- as well as how to get standard errors right. Joshua Angrist and Jörn-Steffen Pischke explain why fancier econometric techniques are typically unnecessary and even dangerous. The applied econometric methods emphasized in this book are easy to use and relevant for many areas of contemporary social science.
- An irreverent review of econometric essentials
- A focus on the tools that applied researchers use most
- Chapters on regression-discontinuity designs, quantile regression, and standard errors
- Many empirical examples
- A clear and concise resource with wide applications
... The whole framework is designed in a classical two-stage IV study pipeline [1,16]. Generally, in this pipeline, the first stage predicts the treatment with IVs, and the second stage estimates the potential outcomes based on the treatment predicted by the first stage. ...
... Different from most causal inference methods, which assume that all confounders are observed, instrumental variable (IV) based methods provide an alternative approach to identifying causal effects even in the presence of hidden confounders. One of the most well-known lines of IV studies is two-stage methods [1,8,16,30]. The two-stage least squares method (2SLS) [1] is the most representative work in this line: it first fits a linear model to predict the treatment with features and IVs, and then fits another linear model to predict the outcome with the features and the predicted treatment. 2SLS rests on two strong assumptions: homogeneity (the treatment effect is the same for different units) and linearity (the linear models are correctly specified). ...
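The excerpts above all describe the same two-stage pipeline. Below is a minimal sketch of it in Python on simulated data with a single instrument z, covariates x, treatment d, and outcome y; all names and numbers are illustrative, not taken from any of the cited papers.

```python
# Minimal manual 2SLS sketch: stage 1 predicts the treatment from the
# instrument and covariates; stage 2 regresses the outcome on the
# *predicted* treatment and the same covariates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                      # observed covariates
u = rng.normal(size=n)                           # hidden confounder
z = rng.normal(size=n)                           # instrument: moves d, not y directly
d = 0.8 * z + x @ np.array([0.5, -0.3]) + u + rng.normal(size=n)        # treatment
y = 2.0 * d + x @ np.array([1.0, 0.5]) - 2.0 * u + rng.normal(size=n)   # outcome

# Stage 1: fit the treatment on instrument + covariates, keep fitted values.
stage1 = sm.OLS(d, sm.add_constant(np.column_stack([z, x]))).fit()
d_hat = stage1.fittedvalues

# Stage 2: fit the outcome on the predicted treatment + covariates.
stage2 = sm.OLS(y, sm.add_constant(np.column_stack([d_hat, x]))).fit()
print(stage2.params[1])   # near the true effect 2.0; plain OLS of y on d would be biased
```

Note that the second-stage standard errors from this manual procedure are not valid as reported; dedicated IV routines apply the appropriate correction.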
Preprint
Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, which makes MRSA infections difficult to prevent. Over decades of efforts to control infectious diseases caused by MRSA, many studies have sought to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role, as it determines the patterns of missing counterfactuals -- the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, treatments are assigned pairwise to units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting entangled treatments can impede causal effect estimation. In this paper, we study the problem of causal effect estimation with treatments entangled in a graph. Despite a few explorations of entangled treatments, the problem remains difficult for three reasons: (1) the entanglement makes it hard to model and leverage the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method, NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method to a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method and provide insights for future applications.
... TeBlunthuis et al. [63] replicated the analysis of Halfaker et al. using a sample of 740 wikis from Wikia. Their study found similar patterns and explanations for them, suggesting that this is a general trend in peer production communities and a challenge for their sustainability. ...
... Estimating this conditional effect is done using two-stage least squares regression (2SLS) [1], and as our experiment design is similar to that of the Wikipedia Adventure [54], we adopt their approach. In the first stage shown in Equation 2, we estimate the likelihood of making a suggested edit if invited to the homepage. ...
Preprint
For peer production communities to be sustainable, they must attract and retain new contributors. Studies have identified social and technical barriers to entry and discovered some potential solutions, but these solutions have typically focused on a single highly successful community, the English Wikipedia, been tested in isolation, and rarely evaluated through controlled experiments. We propose the Newcomer Homepage, a central place where newcomers can learn how peer production works and find opportunities to contribute, as a solution for attracting and retaining newcomers. The homepage was built upon existing research and designed in collaboration with partner communities. Through a large-scale controlled experiment spanning 27 non-English Wikipedia wikis, we evaluate the homepage and find modest gains, and that having a positive effect on the newcomer experience depends on the newcomer's context. We discuss how this impacts interventions that aim to improve the newcomer experience in peer production communities.
... Our objective is to identify the primary factors that influence the performance of EFCIL algorithms. To interpret causal effects, we employ multiple linear regressions using the Ordinary Least Squares (OLS) method, following established statistical and econometric practices [4,18]. In a linear regression, we aim to explain a target variable Y using explanatory variables X_i. ...
Preprint
Full-text available
Class-Incremental Learning (CIL) aims to build classification models from data streams. At each step of the CIL process, new classes must be integrated into the model. Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored, the case on which we focus here. To date, most approaches are based exclusively on the target dataset of the CIL process. However, the use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum. The initial model of the CIL process may only use the first batch of the target dataset, or also use pre-trained weights obtained on an auxiliary dataset. The choice between these two initial learning strategies can significantly influence the performance of the incremental learning model, but has not yet been studied in depth. Performance is also influenced by the choice of the CIL algorithm, the neural architecture, the nature of the target task, the distribution of classes in the stream and the number of examples available for learning. We conduct a comprehensive experimental study to assess the roles of these factors. We present a statistical analysis framework that quantifies the relative contribution of each factor to incremental performance. Our main finding is that the initial training strategy is the dominant factor influencing the average incremental accuracy, but that the choice of CIL algorithm is more important in preventing forgetting. Based on this analysis, we propose practical recommendations for choosing the right initial training strategy for a given incremental learning use case. These recommendations are intended to facilitate the practical deployment of incremental learning.
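As a hedged illustration of the OLS-based factor attribution described in the excerpt above, the sketch below regresses a performance metric on categorical experimental factors; the data frame, column names, and numbers are hypothetical, not the paper's actual experiments.

```python
# Hypothetical sketch: attribute a performance metric to experimental factors
# with OLS on categorical factor dummies (all names and numbers are illustrative).
import pandas as pd
import statsmodels.formula.api as smf

runs = pd.DataFrame({
    "accuracy":  [62.1, 58.4, 70.3, 66.9, 55.2, 61.0, 71.5, 64.8],
    "init":      ["scratch", "scratch", "pretrained", "pretrained"] * 2,
    "algorithm": ["A", "B", "A", "B"] * 2,
})

# accuracy ~ C(init) + C(algorithm): each coefficient is the estimated shift in
# average accuracy from switching that factor, holding the other factor fixed.
fit = smf.ols("accuracy ~ C(init) + C(algorithm)", data=runs).fit()
print(fit.params)
```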
... Parallel pre-treatment trends can bolster a DD design's credibility by showing the treatment and control groups moved together before treatment (Angrist and Pischke 2010), but parallel pre-treatment trends are not necessarily sufficient for the PTA to hold (Kahn-Lang and Lang 2020). Additionally, the PTA is violated if there are exogenous post-treatment shocks that differentially affect the treatment and control groups, since pre-post differences across groups can no longer be fully attributed to the treatment (Angrist and Pischke 2009). DD also assumes stable unit treatment values (SUTVA) and no anticipatory treatment effects (Malani and Reif 2015). ...
Preprint
A survey’s mode can influence both who takes the survey (selection) and how they respond to its questionnaire (measurement). To distinguish selection and measurement effects, most studies of mode effects use cross-sectional designs. However, cross-sectional designs risk omitted variable bias when the selection process is not fully modeled, but post-treatment bias if the selection process is modeled with variables measured in different survey modes. To address these shortcomings, I propose using difference-in-differences with mixed-mode panel surveys to identify measurement effects. Difference-in-differences compares changes in survey responses over time among panelists who switch modes to panelists who do not switch modes. Difference-in-differences can help reduce omitted variable bias without introducing post-treatment bias. I demonstrate the difference-in-differences approach by estimating the effects of completing live interviews vs. online surveys on the measurement of racial attitudes and political knowledge in the 2016-2020 ANES and cognitive functioning in the 1992-2020 Health and Retirement Study.
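The DD logic discussed in the excerpt and abstract above reduces, in the simplest two-group, two-period case, to the coefficient on a group-by-period interaction. Here is a minimal sketch on simulated data; the variable names and effect sizes are illustrative, not the ANES or HRS specifications.

```python
# Canonical 2x2 difference-in-differences: the coefficient on the
# treated-by-post interaction is the DD estimate.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
treated = rng.integers(0, 2, n)        # e.g. panelists who switch survey mode
post = rng.integers(0, 2, n)           # before/after indicator
y = (1.0 + 0.5 * treated + 0.3 * post  # group and period effects
     + 0.4 * treated * post            # true effect of the mode switch
     + rng.normal(size=n))

panel = pd.DataFrame({"y": y, "treated": treated, "post": post})
dd = smf.ols("y ~ treated * post", data=panel).fit(cov_type="HC1")
print(dd.params["treated:post"])       # recovers roughly 0.4
```

With panel data, unit and period fixed effects plus clustered standard errors would replace the simple group and post dummies.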
... Through simulations, we will also use the same MR estimator when the first-step models are neural network models (which are slower to converge than parametric models, at a rate of around $n^{1/4}$ [8]). We will use simulations to study the performance of MR in terms of bias, variance, and root mean square error in the presence of instrumental variables (as defined in Angrist and Pischke [21]) and confounder variables in the data. ...
Preprint
Full-text available
Estimation of the Average Treatment Effect (ATE) is often carried out in two steps: in the first step, the treatment and outcome are modeled, and in the second step the predictions are inserted into the ATE estimator. In the first step, numerous models can be fit to the treatment and outcome, including machine learning algorithms. However, it is a difficult task to choose the hyperparameter sets that will result in the best causal effect estimation and inference. The Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator. We show that the MR estimator is $n^r$ consistent if one of the first-step treatment or outcome models is $n^r$ consistent. We also show that MR is the solution to a broad class of estimating equations, and is asymptotically normal if one of the treatment models is $\sqrt{n}$-consistent. We also derive the standard error of MR, which does not require knowledge of the true models in the first step. Our simulation study supports the theoretical findings.
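To fix ideas about the generic two-step pipeline described above, here is a hedged sketch that fits treatment and outcome models in step one and plugs their predictions into an AIPW-style ATE estimator in step two. This is a standard doubly robust estimator used only for illustration, not the MR estimator of the paper, and all data are simulated.

```python
# Generic two-step ATE sketch: step 1 fits treatment (propensity) and outcome
# models; step 2 plugs their predictions into an AIPW-style estimator.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 10000
x = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-x[:, 0]))                 # true propensity score
a = rng.binomial(1, p)                             # treatment
y = 1.5 * a + x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

# Step 1: nuisance models (any learner could be swapped in here).
e_hat = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)
m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)

# Step 2: augmented inverse-probability-weighted (AIPW) estimate of the ATE.
ate = np.mean(m1 - m0
              + a * (y - m1) / e_hat
              - (1 - a) * (y - m0) / (1 - e_hat))
print(ate)                                          # close to the true effect 1.5
```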
Chapter
Using high-frequency national-level panel data from the Center for Monitoring Indian Economy (CMIE) on paid work (employment) and unpaid work (time spent on domestic work), this paper examines the effects of the first wave of the Covid-19 pandemic on the gender gaps in paid and unpaid work until December 2020, using difference-in-differences (DID) to estimate before-and-after effects of the pandemic, and event study estimates around the strict national lockdown in April 2020. The DID estimates reveal a narrowing of the gender gap in employment probabilities, driven by a lower probability of male employment rather than an increase in female employment. The first month of the national lockdown, April 2020, saw a large contraction in employment for both men and women, with more men losing jobs in absolute terms. Between April and August 2020, male employment recovered steadily as the economy unlocked. The event study estimates show that in August 2020, the likelihood of being employed was 9 percentage points lower for women than for men, compared to April 2019, conditional on previous employment. However, by December 2020, gender gaps in employment were back at their December 2019 levels. The burden of domestic chores worsened for women during the pandemic. Men spent more time on housework in April 2020 relative to December 2019, but by December 2020 the average male hours had declined to below pre-pandemic levels, whereas women's average hours increased sharply. Time spent with friends fell sharply between December 2019 and April 2020, with a larger decline for women. Hours spent with friends recovered in August 2020, only to decline again by December 2020 to roughly one-third of pre-pandemic levels. The paper adopts an intersectional lens to examine how these trends vary by social group identity.
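The event study estimates quoted above compare the gender gap month by month against a reference period. A generic specification of this kind, sketched in my own notation (not necessarily the chapter's exact model), is

\[
y_{it} = \alpha_i + \lambda_t + \sum_{k \neq k_0} \beta_k \left(\text{female}_i \times \mathbf{1}\{t = k\}\right) + \varepsilon_{it},
\]

where $\alpha_i$ and $\lambda_t$ are individual and month effects, the omitted month $k_0$ is the reference period (April 2019 in the estimates quoted above), and each $\beta_k$ traces the gender gap in employment in month $k$ relative to that reference month.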
Article
Adverse life events are often understood as having negative consequences for mental health via objective hardships, which are worse for persons with less income. But adversity can also affect mental health via more subjective mechanisms, and here, it is possible that persons with higher income will exhibit greater psychological sensitivity to negative events, for various reasons. Drawing on multiple sociological literatures, this article theorizes potential mechanisms of increasing sensitivity with income. The proposition of differential sensitivity is tested using the strategic case of spousal and parental bereavement among older US adults. The analyses find consistent evidence of increasing sensitivity of depressive symptoms with income. A series of robustness checks indicate that findings are not due to endogenous or antecedent selection. Further, exploratory analyses of mechanisms suggest that higher sensitivity among the affluent was driven by greater expectations and better relationship quality with the deceased. These findings problematize the conceptualization and assessment of human suffering in economically stratified societies.
Preprint
Full-text available
The log transformation of the dependent variable is not innocuous when using a difference-in-differences (DD) model. With a dependent variable in logs, the DD term captures an approximation of the proportional difference in growth rates across groups. As I show with both simulations and two empirical examples, if the baseline outcome distributions are sufficiently different across groups, the DD parameter for a log specification can be different in sign from that of a levels specification. I provide a condition, based on (i) the aggregate time effect and (ii) the difference in relative baseline outcome means, for when the sign switch will occur.
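To make the sign switch concrete, here is a minimal 2x2 sketch in my own notation (ignoring the gap between the mean of logs and the log of means): with group-period means $\bar{y}_{g,t}$ for treated (T) and control (C) groups before ($t=0$) and after ($t=1$),

\[
\hat{\delta}_{\text{levels}} = (\bar{y}_{T,1}-\bar{y}_{T,0}) - (\bar{y}_{C,1}-\bar{y}_{C,0}),
\qquad
\hat{\delta}_{\text{logs}} \approx \ln\frac{\bar{y}_{T,1}}{\bar{y}_{T,0}} - \ln\frac{\bar{y}_{C,1}}{\bar{y}_{C,0}}.
\]

If the treated group grows from 10 to 13 while the control group grows from 100 to 110, the levels DD is $3 - 10 = -7$, but the log DD is roughly $0.26 - 0.10 = 0.16$: opposite signs, because the baseline means differ by an order of magnitude.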
Article
Full-text available
We evaluate Angrist and Krueger (1991) and Bound, Jaeger, and Baker (1995) by constructing reliable confidence regions around the 2SLS and LIML estimators for returns-to-schooling regardless of the quality of the instruments. The results indicate that the returns-to-schooling were between 8 and 25 percent in 1970 and between 4 and 14 percent in 1980. Although the estimates are less accurate than previously thought, most specifications by Angrist and Krueger (1991) are informative for returns-to-schooling. In particular, concern about the reliability of the model with 178 instruments is unfounded despite the low first-stage F-statistic. Finally, we briefly discuss bias-adjustment of estimators and pretesting procedures as solutions to the weak-instrument problem.
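The weak-instrument concern turns on the first-stage F-statistic mentioned above. Here is a hedged sketch of that diagnostic on simulated data; the instrument strength, controls, and sample size are illustrative, not Angrist and Krueger's data.

```python
# First-stage F-statistic for excluded instruments: regress schooling on the
# instruments plus controls and jointly test the instrument coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 20000
controls = rng.normal(size=(n, 2))
z = rng.normal(size=(n, 3))                  # excluded instruments (e.g. quarter-of-birth dummies)
schooling = (z @ np.array([0.05, 0.03, 0.02])
             + controls @ np.array([0.4, 0.1])
             + rng.normal(size=n))

exog = sm.add_constant(np.column_stack([z, controls]))
first_stage = sm.OLS(schooling, exog).fit()

# Joint test that the three excluded-instrument coefficients (columns 1-3
# after the constant) are all zero; a small F signals weak instruments.
R = np.zeros((3, exog.shape[1]))
R[0, 1] = R[1, 2] = R[2, 3] = 1.0
print(first_stage.f_test(R).fvalue)
```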
Article
For a quarter century, American education researchers have tended to favour qualitative and descriptive analyses over quantitative studies using random assignment or featuring credible quasi-experimental research designs. This has now changed. In 2002 and 2003, the US Department of Education funded a dozen randomized trials to evaluate the efficacy of pre-school programmes, up from one in 2000. In this essay, I explore the intellectual and legislative roots of this change, beginning with the story of how contemporary education research fell out of step with other social sciences. I then use a study in which low-achieving high-school students were randomly offered incentives to learn to show how recent developments in research methods answer ethical and practical objections to the use of random assignment for research on schools. Finally, I offer a few cautionary notes based on results from the recent effort to cut class size in California.
Article
Summary: Bounds for matrix-weighted averages of pairs of vectors are presented. The weight matrices are constrained to certain classes suggested by the Bayesian analysis of the linear regression model and the multivariate normal model. The bounds identify the region within which the posterior location vector must lie if the prior comes from a certain class of priors.
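For context, the canonical matrix-weighted average arises in the normal linear model: with prior mean $b_0$ and prior precision $H_0$, and OLS estimate $\hat{b}$ with data precision $H_1 = \sigma^{-2} X'X$, the posterior location is

\[
b^{*} = (H_0 + H_1)^{-1}\left(H_0 b_0 + H_1 \hat{b}\right).
\]

This is the standard result sketched in my notation, not necessarily the paper's; the bounds in question describe the region traced out by $b^{*}$ as the weight matrices range over a restricted class.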
Article
The paper examines two approaches to the omitted variable problem. Both try to correct for omitted variable bias by specifying several equations in which the unobservable appears. The first approach assumes that the common left-out variable is the only thing connecting the residuals from these equations, making it possible to extract this common factor and control for it. The second approach relies on building a model of the unobservable by specifying observable variables that are causally related to it. A combination of these two methods is applied to the 1964 CPS-NORC veterans sample in order to evaluate the bias in income-schooling regressions caused by the omission of an unobservable initial 'ability' variable.
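For reference, the textbook omitted-variable-bias formula that both approaches aim to undo, sketched in my notation for the income-schooling setting: if log income follows $y = \alpha + \rho s + \gamma A + \varepsilon$ with schooling $s$ and omitted ability $A$, the short regression of $y$ on $s$ gives

\[
\operatorname{plim}\hat{\rho}_{OLS} = \rho + \gamma\,\frac{\operatorname{Cov}(s, A)}{\operatorname{Var}(s)},
\]

so the estimated return to schooling is inflated whenever ability raises earnings and is positively correlated with schooling.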
Article
When several candidate tests are available for a given testing problem, and each has nice properties with respect to different criteria such as efficiency and robustness, it is desirable to combine them. We discuss various combined tests based on asymptotically normal tests. When the means of two standardized tests under contiguous alternatives are close, we show that the maximum of the two tests appears to have the best overall performance among the forms of combined tests considered, and that it retains most of the power of the better of the two tests being combined. As an application, for testing zero location shift between two groups, we study the normal, Wilcoxon, and median tests and their combined tests. Because of their structural differences, the joint convergence and the asymptotic correlation of the tests are not easily derived from the usual asymptotic arguments. We develop a novel application of martingale theory to obtain the asymptotic correlations and their estimators. Simulation studies were also performed to examine the small-sample properties of these combined tests. Finally, we illustrate the methods with a real data example.
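A hedged sketch of one way to implement a max-type combination of a t-test and a Wilcoxon test, calibrating the maximum by permutation rather than by the asymptotic correlations derived in the paper; all data are simulated.

```python
# Max-combined test sketch: take the larger of the |t| and |Wilcoxon z|
# statistics and calibrate the maximum by permuting group labels.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 60)
y = rng.normal(0.4, 1.0, 60)

def max_stat(a, b):
    t = abs(stats.ttest_ind(a, b).statistic)
    # Standardize the Mann-Whitney test through its two-sided p-value.
    w = stats.norm.isf(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue / 2)
    return max(t, w)

observed = max_stat(x, y)
pooled = np.concatenate([x, y])
perm = np.empty(2000)
for i in range(perm.size):
    rng.shuffle(pooled)
    perm[i] = max_stat(pooled[:60], pooled[60:])
print(np.mean(perm >= observed))     # permutation p-value of the max test
```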