Chapter · PDF available

Mostly Harmless Econometrics: An Empiricist's Companion

Authors: Joshua D. Angrist and Jörn-Steffen Pischke

Abstract

The core methods in today's econometric toolkit are linear regression for statistical control, instrumental variables methods for the analysis of natural experiments, and differences-in-differences methods that exploit policy changes. In the modern experimentalist paradigm, these techniques address clear causal questions such as: Do smaller classes increase learning? Should wife batterers be arrested? How much does education raise wages? Mostly Harmless Econometrics shows how the basic tools of applied econometrics allow the data to speak. In addition to econometric essentials, Mostly Harmless Econometrics covers important new extensions -- regression-discontinuity designs and quantile regression -- as well as how to get standard errors right. Joshua Angrist and Jörn-Steffen Pischke explain why fancier econometric techniques are typically unnecessary and even dangerous. The applied econometric methods emphasized in this book are easy to use and relevant for many areas of contemporary social science.
- An irreverent review of econometric essentials
- A focus on the tools that applied researchers use most
- Chapters on regression-discontinuity designs, quantile regression, and standard errors
- Many empirical examples
- A clear and concise resource with wide applications
... Our statistical analysis pipeline comprises logit regression [7] and Analysis of Variance (ANOVA) [27]. These methods provide complementary insights into the impact of the studied attributes on fairness. ...
... Logit regression [7] models the relation between attributes and the binary outcome of a model. It is a generalized linear model that estimates the probability of a binary outcome based on one or more independent variables using: ...
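The formula itself is elided in the snippet above; the standard logistic (logit) specification it presumably refers to models the conditional probability of the binary outcome as

$$
\Pr(Y_i = 1 \mid x_i) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})\}},
$$

so that the log-odds, $\log\frac{\Pr(Y_i = 1 \mid x_i)}{1 - \Pr(Y_i = 1 \mid x_i)} = \beta_0 + x_i'\beta$, are linear in the covariates.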
Preprint
Full-text available
Face recognition and verification are two computer vision tasks whose performance has advanced with the introduction of deep representations. However, ethical, legal, and technical challenges due to the sensitive nature of face data and biases in real-world training datasets hinder their development. Generative AI addresses privacy by creating fictitious identities, but fairness problems remain. Using the existing DCFace SOTA framework, we introduce a new controlled generation pipeline that improves fairness. Through classical fairness metrics and a proposed in-depth statistical analysis based on logit models and ANOVA, we show that our generation pipeline improves fairness more than other bias mitigation approaches while slightly improving raw performance.
... Inspired by this, we construct a causal graph where document perplexity acts as the treatment and document semantics acts as a confounder (Figure 2). We adopt a two-stage least squares (2SLS) regression procedure (Angrist and Pischke, 2009; Hartford et al., 2017) to eliminate the influence of confounders when estimating this biased effect; the experimental results indicate that the effect is significantly negative. Based on these findings, the cause of source bias can be explained as the unintended causal effect of perplexity on estimated relevance scores. ...
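For readers unfamiliar with the 2SLS procedure mentioned above, here is a minimal NumPy sketch with simulated data (the variable names and data-generating process are illustrative assumptions, not the cited paper's setup): stage one projects the endogenous regressor on the instrument, and stage two regresses the outcome on the fitted values.

```python
import numpy as np

def two_stage_least_squares(y, d, z, x=None):
    """Minimal 2SLS: y = outcome, d = endogenous regressor,
    z = instrument(s) (n x k), x = optional exogenous controls."""
    n = len(y)
    ones = np.ones((n, 1))
    X = ones if x is None else np.column_stack([ones, x])
    # Stage 1: project the endogenous regressor on instruments + controls.
    Z = np.column_stack([X, z])
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted values + controls.
    W = np.column_stack([d_hat, X])
    beta = np.linalg.lstsq(W, y, rcond=None)[0]
    return beta[0]  # coefficient on the instrumented regressor

# Simulated check: z shifts d but affects y only through d,
# while the unobserved confounder u biases naive OLS.
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * d + 3.0 * u + rng.normal(size=n)
print(two_stage_least_squares(y, d, z.reshape(-1, 1)))  # close to the true 2.0
```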
Preprint
Full-text available
Previous studies have found that PLM-based retrieval models exhibit a preference for LLM-generated content, assigning higher relevance scores to these documents even when their semantic quality is comparable to human-written ones. This phenomenon, known as source bias, threatens the sustainable development of the information access ecosystem. However, the underlying causes of source bias remain unexplored. In this paper, we explain the process of information retrieval with a causal graph and discover that PLM-based retrievers learn perplexity features for relevance estimation, causing source bias by ranking the documents with low perplexity higher. Theoretical analysis further reveals that the phenomenon stems from the positive correlation between the gradients of the loss functions in language modeling task and retrieval task. Based on the analysis, a causal-inspired inference-time debiasing method is proposed, called Causal Diagnosis and Correction (CDC). CDC first diagnoses the bias effect of the perplexity and then separates the bias effect from the overall estimated relevance score. Experimental results across three domains demonstrate the superior debiasing effectiveness of CDC, emphasizing the validity of our proposed explanatory framework. Source codes are available at https://github.com/WhyDwelledOnAi/Perplexity-Trap.
... These methods typically rely on strong functional form assumptions (e.g., linear relationships) and a relatively low-dimensional set of predictors. Although these assumptions can simplify interpretation, they become problematic when dealing with highly complex nonlinear processes or a large number of confounders ([58,59,60,61,62]). Many methods assume a "sparse" structure, implying that only a small number of covariates significantly affect the outcome, even if a large number are available ([63]). ...
Preprint
Full-text available
Heatwaves, intensified by climate change and rapid urbanisation, pose significant threats to urban systems, particularly in the Global South, where adaptive capacity is constrained. This study investigates the relationship between heatwaves and nighttime light (NTL) radiance, a proxy for nighttime economic activity, in four hyperdense cities: Delhi, Guangzhou, Cairo, and São Paulo. We hypothesised that heatwaves increase nighttime activity. Using a double machine learning (DML) framework, we analysed data from 2013 to 2019 to quantify the impact of heatwaves on NTL while controlling for local climatic confounders. Results revealed a statistically significant increase in NTL intensity during heatwaves, with Cairo, Delhi, and Guangzhou showing elevated NTL on the third day, while São Paulo exhibits a delayed response on the fourth day. Sensitivity analyses confirmed the robustness of these findings, indicating that prolonged heat stress prompts urban populations to shift activities to the night. Heterogeneous responses across cities highlight the possible influence of urban morphology and adaptive capacity on heatwave impacts. Our findings provide a foundation for policymakers to develop data-driven heat adaptation strategies, ensuring that cities remain liveable and economically resilient in an increasingly warming world.
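A minimal sketch of the double machine learning (partialling-out) idea described above, using scikit-learn; the random-forest learners, variable names, and toy data are illustrative assumptions, not the study's actual specification.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def dml_partialling_out(y, t, X, n_splits=5, seed=0):
    """Residualize outcome y and treatment t on confounders X with ML
    (cross-fitted), then regress outcome residuals on treatment residuals."""
    y_hat = cross_val_predict(RandomForestRegressor(random_state=seed), X, y, cv=n_splits)
    t_hat = cross_val_predict(RandomForestRegressor(random_state=seed), X, t, cv=n_splits)
    y_res, t_res = y - y_hat, t - t_hat
    return np.sum(t_res * y_res) / np.sum(t_res ** 2)  # final-stage OLS slope

# Toy example: nonlinear confounding through X; the true effect of t on y is 1.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
t = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=2000)
y = 1.5 * t + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=2000)
print(dml_partialling_out(y, t, X))  # close to 1.5
```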
... However, formal identification and estimation theory for triple difference has received little attention. Although an identification formula for triple difference appears in Wooldridge (2020) and Fröhlich and Sperlich (2019), others have settled for a brief mention of this framework without technical details (Lechner et al. 2011; Angrist and Pischke 2009). Recently, Olden and Møen (2022) conducted the first formal study of identification in the triple difference framework. ...
Preprint
Full-text available
The triple difference causal inference framework is an extension of the well-known difference-in-differences framework. It relaxes the parallel trends assumption of the difference-in-differences framework by leveraging data from an auxiliary domain. Despite being commonly applied in empirical research, the triple difference framework has received relatively limited attention in the statistics literature. Specifically, the intricacies of identification and the design of robust and efficient estimators for this framework have remained largely unexplored. This work aims to address these gaps in the literature. From the identification standpoint, we present outcome regression and weighting methods to identify the average treatment effect on the treated in both panel data and repeated cross-section settings. For the latter, we relax the commonly made assumption of time-invariant covariates. From the estimation perspective, we consider semiparametric estimators for the triple difference framework in both panel data and repeated cross-section settings. We demonstrate that our proposed estimators are doubly robust.
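For reference, the canonical triple-difference (DDD) comparison of group means that this framework generalizes can be written as (notation mine):

$$
\widehat{\text{DDD}} = \Big[(\bar{y}^{\,post}_{T,A} - \bar{y}^{\,pre}_{T,A}) - (\bar{y}^{\,post}_{C,A} - \bar{y}^{\,pre}_{C,A})\Big] - \Big[(\bar{y}^{\,post}_{T,B} - \bar{y}^{\,pre}_{T,B}) - (\bar{y}^{\,post}_{C,B} - \bar{y}^{\,pre}_{C,B})\Big],
$$

where $T/C$ index the treated and comparison groups, $A$ is the domain where the policy can bind, and $B$ is the auxiliary domain used to difference out domain-specific trends.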
... Our rich data on diverse engagement metrics (reposts, replies, likes, and views) as well as our reconstruction of diffusion cascades (enabling us to study how note attachment influences cascade structure) go far beyond earlier work studying the effects of Community Notes [41], which considered only the effects on reposts and deletions. That prior work also used difference-in-differences methods which, unlike our synthetic control methods, rely on strong "parallel trends" assumptions [42]. When comparable, our independent estimates also provide important corroboration of those prior estimated effects. ...
Preprint
Social networks scaffold the diffusion of information on social media. Much attention has been given to the spread of true vs. false content on online social platforms, including the structural differences between their diffusion patterns. However, much less is known about how platform interventions on false content alter the engagement with and diffusion of such content. In this work, we estimate the causal effects of Community Notes, a novel fact-checking feature adopted by X (formerly Twitter) to solicit and vet crowd-sourced fact-checking notes for false content. We gather detailed time series data for 40,074 posts for which notes have been proposed and use synthetic control methods to estimate a range of counterfactual outcomes. We find that attaching fact-checking notes significantly reduces the engagement with and diffusion of false content. We estimate that, on average, the notes resulted in reductions of 45.7% in reposts, 43.5% in likes, 22.9% in replies, and 14.0% in views after being attached. Over the posts' entire lifespans, these reductions amount to 11.4% fewer reposts, 13.0% fewer likes, 7.3% fewer replies, and 5.7% fewer views on average. In reducing reposts, we observe that diffusion cascades for fact-checked content are less deep, but not less broad, than synthetic control estimates for non-fact-checked content with similar reach. This structural difference contrasts notably with differences between false vs. true content diffusion itself, where false information diffuses farther, but with structural patterns that are otherwise indistinguishable from those of true information, conditional on reach.
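A minimal sketch of the synthetic control step described above: choose nonnegative donor weights summing to one that reproduce the treated unit's pre-treatment trajectory. This is an illustrative SciPy implementation under those standard constraints, not the authors' exact pipeline.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(y_treated_pre, Y_donors_pre):
    """y_treated_pre: (T0,) pre-treatment outcomes of the treated unit.
    Y_donors_pre: (T0, J) pre-treatment outcomes of J donor units.
    Returns weights w >= 0 with sum(w) = 1 minimizing pre-period fit error."""
    J = Y_donors_pre.shape[1]

    def loss(w):
        return np.sum((y_treated_pre - Y_donors_pre @ w) ** 2)

    res = minimize(
        loss,
        x0=np.full(J, 1.0 / J),
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        method="SLSQP",
    )
    return res.x

# Usage: the counterfactual post-treatment path is Y_donors_post @ w,
# and the estimated effect is y_treated_post - Y_donors_post @ w.
```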
... The Weighted Quantile Loss (WQL) [20] measures how well a predictive model performs across different quantiles of the target variable's distribution. It is particularly useful in scenarios where it is important to understand the model's performance across various data distribution segments. ...
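For concreteness, the quantile (pinball) loss underlying the WQL at level $q$ is standard; one common weighted aggregation across quantiles and time steps (normalization conventions vary by library) is:

$$
L_q(y, \hat{y}_q) = \max\big\{\, q\,(y - \hat{y}_q),\; (q - 1)\,(y - \hat{y}_q) \,\big\}, \qquad
\text{WQL} = \frac{\sum_{q \in Q}\sum_{t} L_q\!\left(y_t, \hat{y}_{q,t}\right)}{\sum_{t} \lvert y_t \rvert}.
$$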
Article
Full-text available
Accurate model selection is essential in predictive modelling across various domains, significantly impacting decision-making and resource allocation. Despite extensive research, the model selection process remains challenging. This work aims to integrate the Minimum Description Length (MDL) principle with Multi-Criteria Decision Analysis (MCDA) to enhance the selection of forecasting machine learning models. The proposed MDL-MCDA framework combines the MDL principle, which balances model complexity and data fit, with MCDA, which incorporates multiple evaluation criteria to address conflicting error measurements. Four datasets from diverse domains, covering software engineering (effort estimation), healthcare (glucose level prediction), finance (GDP prediction), and stock market prediction, were used to validate the framework. Various regression models and feed-forward neural networks were evaluated using criteria such as MAE, MAPE, RMSE, and adjusted R². We employed the Analytic Hierarchy Process (AHP) to determine the relative importance of these criteria. We conclude that the integration of MDL and MCDA significantly improved model selection across all datasets. The cubic polynomial regression model and the multi-layer perceptron models outperformed other models in terms of AHP score and MDL criterion. Specifically, the MDL-MCDA approach provided a more nuanced evaluation, ensuring that the selected models effectively balanced complexity and predictive accuracy.
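In two-part coding terms, the MDL criterion referenced above selects the model minimizing total description length:

$$
M^{*} = \arg\min_{M} \big[\, L(M) + L(D \mid M) \,\big],
$$

where $L(M)$ is the codelength needed to describe the model and $L(D \mid M)$ the codelength of the data given the model.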
Article
Recommender systems are important and powerful tools for various personalized services. Traditionally, these systems use data mining and machine learning techniques to make recommendations based on correlations found in the data. However, relying solely on correlation without considering the underlying causal mechanisms may lead to practical problems involving fairness, explainability, robustness, bias, echo chambers, and controllability. Therefore, researchers in related areas have begun incorporating causality into recommender systems to address these issues. In this survey, we review the existing literature on causal inference in recommender systems. We discuss the fundamental concepts of both recommender systems and causal inference as well as their relationship, and review existing work on causal methods for different problems in recommender systems. Finally, we discuss open problems and future directions in the field of causal inference for recommendations.
Preprint
Full-text available
We extend the regression discontinuity (RD) design to settings where each unit's treatment status is an average or aggregate across multiple discontinuity events. Such situations arise in many studies where the outcome is measured at a higher level of spatial or temporal aggregation (e.g., by state with district-level discontinuities) or when spillovers from discontinuity events are of interest. We propose two novel estimation procedures - one at the level at which the outcome is measured and the other in the sample of discontinuities - and show that both identify a local average causal effect under continuity assumptions similar to those of standard RD designs. We apply these ideas to study the effect of unionization on inequality in the United States. Using credible variation from close unionization elections at the establishment level, we show that a higher rate of newly unionized workers in a state-by-industry cell reduces wage inequality within the cell.
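For context, a minimal local-linear estimate for a standard sharp RD design (the building block this preprint extends, not the aggregated-treatment estimators it proposes) can be sketched as follows; the cutoff, bandwidth, and toy data are illustrative assumptions.

```python
import numpy as np

def sharp_rd_local_linear(y, x, cutoff, bandwidth):
    """Local-linear sharp RD: within the bandwidth, regress y on a constant,
    a treatment indicator, the centered running variable, and their interaction.
    The coefficient on the indicator estimates the jump at the cutoff."""
    r = x - cutoff
    keep = np.abs(r) <= bandwidth
    r, y = r[keep], y[keep]
    d = (r >= 0).astype(float)            # treated side of the cutoff
    X = np.column_stack([np.ones_like(r), d, r, d * r])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]

# Toy check: a true discontinuity of 2.0 at x = 0.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 4000)
y = 2.0 * (x >= 0) + 0.5 * x + rng.normal(scale=0.3, size=4000)
print(sharp_rd_local_linear(y, x, cutoff=0.0, bandwidth=0.25))  # close to 2.0
```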
Article
Full-text available
This tutorial introduces the package sensemakr for R and Stata, which implements a suite of sensitivity analysis tools for regression models developed in Cinelli and Hazlett (2020, 2022). Given a regression model, sensemakr can compute sensitivity statistics for routine reporting, such as the robustness value, which describes the minimum strength that unobserved confounders need to have to overturn a research conclusion. The package also provides plotting tools that visually demonstrate the sensitivity of point estimates and t-values to hypothetical confounders. Finally, sensemakr implements formal bounds on sensitivity parameters by means of comparison with the explanatory power of observed variables. All these tools are based on the familiar "omitted variable bias" framework, require no assumptions about the functional form of the treatment assignment mechanism or the distribution of the unobserved confounders, and naturally handle multiple, non-linear confounders. With sensemakr, users can transparently report the sensitivity of their causal inferences to unobserved confounding, thereby enabling a more precise, quantitative debate as to what can be concluded from imperfect observational studies.
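The "omitted variable bias" framework referred to above starts from the textbook result: in the simplest single-confounder case, if the long regression is $y = \beta d + \gamma z + u$ and $z$ is omitted, then

$$
\operatorname{plim}\ \hat{\beta}_{\text{short}} = \beta + \gamma \, \frac{\operatorname{Cov}(d, z)}{\operatorname{Var}(d)},
$$

so the bias is the product of the confounder's effect on the outcome and its regression coefficient on the treatment; the robustness value reframes this in terms of the partial R² of a hypothetical confounder with the treatment and with the outcome.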
Article
Understanding the causes and consequences of corporate risk-taking has remained a crucial topic for organizational scholars. Using the case of U.S. banks and one dimension of their risk-taking behavior around the 2008 financial crisis, we offer a theory of how the diverse experiences of corporate leaders can shape their risk-taking behavior. Building on the imprinting literature, we theorize how different types of experiences that bank CEOs had in the past interact to shape current risk-taking behavior, resulting in risk moderation under crisis conditions. We focus on two imprinting experiences with particular relevance for bank CEOs’ risk-taking behavior—MBA education and past crisis experience. We argue that the latter played a pronounced role during the crisis because of greater imprint-environment fit. Our analysis using data from 170 large banks between 2001 and 2019 shows that bank CEOs’ firsthand experience of a prior banking crisis not only directly tempered bank risk-taking but also did so indirectly by limiting the risk-taking tendencies of CEOs with an MBA, particularly during the crisis period. Our study contributes to the sociological literature about organizational risk-taking by emphasizing the crucial role of organizational leaders’ biographies and exploring how earlier institutional conditions shape their risk-taking behavior later.
Article
Full-text available
We evaluate Angrist and Krueger (1991) and Bound, Jaeger, and Baker (1995) by constructing reliable confidence regions around the 2SLS and LIML estimators for the returns to schooling, regardless of the quality of the instruments. The results indicate that the returns to schooling were between 8 and 25 percent in 1970 and between 4 and 14 percent in 1980. Although the estimates are less accurate than previously thought, most specifications in Angrist and Krueger (1991) are informative about the returns to schooling. In particular, concern about the reliability of the model with 178 instruments is unfounded despite the low first-stage F-statistic. Finally, we briefly discuss bias adjustment of estimators and pretesting procedures as solutions to the weak-instrument problem.
Article
For a quarter century, American education researchers have tended to favour qualitative and descriptive analyses over quantitative studies using random assignment or featuring credible quasi-experimental research designs. This has now changed. In 2002 and 2003, the US Department of Education funded a dozen randomized trials to evaluate the efficacy of pre-school programmes, up from one in 2000. In this essay, I explore the intellectual and legislative roots of this change, beginning with the story of how contemporary education research fell out of step with other social sciences. I then use a study in which low-achieving high-school students were randomly offered incentives to learn to show how recent developments in research methods answer ethical and practical objections to the use of random assignment for research on schools. Finally, I offer a few cautionary notes based on results from the recent effort to cut class size in California.
Article
Summary: Bounds for matrix weighted averages of pairs of vectors are presented. The weight matrices are constrained to certain classes suggested by the Bayesian analysis of the linear regression model and the multivariate normal model. The bounds identify the region within which the posterior location vector must lie if the prior comes from a certain class of priors.
Article
The paper examines two approaches to the omitted variable problem. Both try to correct for the omitted variable bias by specifying several equations in which the unobservable appears. The first approach assumes that the common left-out variable is the only thing connecting the residuals from these equations, making it possible to extract this common factor and control for it. The second approach relies on building a model of the unobservable by specifying observable variables that are causally related to it. A combination of these two methods is applied to the 1964 CPS-NORC veterans sample in order to evaluate the bias in income-schooling regressions caused by the omission of an unobservable initial 'ability' variable.
Article
When several candidate tests are available for a given testing problem, and each has nice properties with respect to different criteria such as efficiency and robustness, it is desirable to combine them. We discuss various combined tests based on asymptotically normal tests. When the means of two standardized tests under contiguous alternatives are close, we show that the maximum of the two tests appears to have the best overall performance among the forms of combined tests considered, and that it retains most of the power of the better of the two tests being combined. As an application, for testing a zero location shift between two groups, we studied the normal, Wilcoxon, and median tests and their combined tests. Because of their structural differences, the joint convergence and the asymptotic correlation of the tests are not easily derived from the usual asymptotic arguments; we developed a novel application of martingale theory to obtain the asymptotic correlations and their estimators. Simulation studies were also performed to examine the small-sample properties of these combined tests. Finally, we illustrate the methods with a real data example.
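As an illustration of the maximum-type combination discussed above, here is a small sketch that combines a t-statistic and a Wilcoxon rank-sum z-statistic and calibrates the maximum by permutation; the permutation calibration is a generic stand-in for illustration, not the paper's martingale-based asymptotic correlation approach, and the data are simulated.

```python
import numpy as np
from scipy import stats

def max_combined_stat(a, b):
    """Maximum of two standardized two-sample statistics (t and Wilcoxon z)."""
    t = stats.ttest_ind(a, b).statistic
    n1, n2 = len(a), len(b)
    u = stats.mannwhitneyu(a, b, alternative="two-sided").statistic
    mu = n1 * n2 / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sd                      # normal approximation to the Wilcoxon statistic
    return max(abs(t), abs(z))

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Calibrate the max statistic by permuting group labels."""
    rng = np.random.default_rng(seed)
    observed = max_combined_stat(a, b)
    pooled = np.concatenate([a, b])
    n1 = len(a)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += max_combined_stat(perm[:n1], perm[n1:]) >= observed
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
x, y = rng.normal(0, 1, 60), rng.normal(0.5, 1, 60)
print(permutation_pvalue(x, y))  # small p-value for a true location shift
```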