Jann Spiess’s research while affiliated with Stanford University and other places

Publications (7)


Fig. 1. Regression-estimated impact of each of our megastudy's 22 intervention conditions on flu vaccine uptake at Walmart by December 31st, 2020. Whiskers depict 95% CIs without correction for multiple comparisons.
A 680,000-person megastudy of nudges to encourage vaccination in pharmacies
  • Article
  • Full-text available

February 2022 · 434 Reads · 104 Citations · Proceedings of the National Academy of Sciences

Linnea Gandhi · Mitesh S. Patel · [...] · Angela L. Duckworth

Significance: Encouraging vaccination is a pressing policy problem. Our megastudy with 689,693 Walmart pharmacy customers demonstrates that text-based reminders can encourage pharmacy vaccination and establishes what kinds of messages work best. We tested 22 different text reminders using a variety of different behavioral science principles to nudge flu vaccination. Reminder texts increased vaccination rates by an average of 2.0 percentage points (6.8%) over a business-as-usual control condition. The most-effective messages reminded patients that a flu shot was waiting for them and delivered reminders on multiple days. The top-performing intervention included two texts 3 d apart and stated that a vaccine was “waiting for you.” Forecasters failed to anticipate that this would be the best-performing treatment, underscoring the value of testing.
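As a quick arithmetic check of the headline numbers (a minimal sketch: only the 2.0-percentage-point and 6.8% figures come from the abstract; the baseline uptake rate below is implied by them, not quoted from the paper):

```python
# Consistency check of the reported effect sizes. Only 2.0 pp and 6.8% are
# taken from the abstract; the control-group rate is the value implied by both.
absolute_lift_pp = 2.0   # average increase in vaccination rate, in percentage points
relative_lift = 0.068    # the same effect expressed relative to the control group

implied_control_rate = (absolute_lift_pp / 100) / relative_lift
print(f"implied control-group uptake: {implied_control_rate:.1%}")  # roughly 29.4%
```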

Measured vs. predicted change in likelihood of gym visit in a given week
Depicted here is the measured change (blue) vs. the change predicted by third-party observers (gold) in whether participants visited the gym, induced by each of our megastudy’s 53 experimental conditions compared to a Placebo Control condition during a four-week intervention period. Error bars represent 95% confidence intervals. See Extended Data Table 7 for complete OLS regression results graphed here in blue, Supplementary Information 11 for more details about the prediction data graphed here in gold, and Supplementary Table 1 for full descriptions of each treatment condition in our megastudy. Sample weights were included in the pooled third-party prediction data to ensure equal weighting of each of our three participant samples (professors, practitioners and Prolific respondents). The superscripts a–e denote the different incentive amounts offered in different versions of the bonus for returning after missed workouts, higher incentives and rigidity rewarded conditions, which are described in Supplementary Table 1. In conditions with the same name, superscripts that come earlier in the alphabet indicate larger incentives.
Measured versus predicted changes in weekly gym visits induced by interventions
The measured change (blue) versus change predicted by third-party observers (gold) in weekly gym visits induced by each of the 53 experimental conditions in our megastudy compared with the placebo control condition during a four-week intervention period. The error bars represent the 95% confidence intervals (see Extended Data Table 6 for the complete OLS regression results shown here in blue and the sample sizes for each condition; Supplementary Information 11 for more details about the prediction data shown in gold; and Supplementary Table 1 for full descriptions of each treatment condition in our megastudy). Sample weights were included in the pooled third-party prediction data to ensure equal weighting of each of our three participant samples (professors, practitioners and Prolific respondents). The superscripts a–e denote the different incentive amounts offered in different versions of the bonus for returning after missed workouts, higher incentives and rigidity rewarded conditions, which are described in Supplementary Table 1. In conditions with the same name, superscripts that come earlier in the alphabet indicate larger incentives.
Regression-estimated effects of top-performing interventions
Megastudies improve the impact of applied behavioural science

December 2021 · 2,052 Reads · 167 Citations · Nature

Policy-makers are increasingly turning to behavioural science for insights about how to improve citizens’ decisions and outcomes¹. Typically, different scientists test different intervention ideas in different samples using different outcomes over different time intervals². The lack of comparability of such individual investigations limits their potential to inform policy. Here, to address this limitation and accelerate the pace of discovery, we introduce the megastudy—a massive field experiment in which the effects of many different interventions are compared in the same population on the same objectively measured outcome for the same duration. In a megastudy targeting physical exercise among 61,293 members of an American fitness chain, 30 scientists from 15 different US universities worked in small independent teams to design a total of 54 different four-week digital programmes (or interventions) encouraging exercise. We show that 45% of these interventions significantly increased weekly gym visits by 9% to 27%; the top-performing intervention offered microrewards for returning to the gym after a missed workout. Only 8% of interventions induced behaviour change that was significant and measurable after the four-week intervention. Conditioning on the 45% of interventions that increased exercise during the intervention, we detected carry-over effects that were proportionally similar to those measured in previous research³⁻⁶. Forecasts by impartial judges failed to predict which interventions would be most effective, underscoring the value of testing many ideas at once and, therefore, the potential for megastudies to improve the evidentiary value of behavioural science.
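The per-condition estimates described in the figure captions above come from an OLS comparison of each intervention against the placebo control. A minimal sketch of that kind of regression follows; the file name, column names, and use of statsmodels are illustrative assumptions, and the paper's actual specification (covariates, sample weights, inference details) is richer.

```python
# Minimal sketch: per-condition effects on weekly gym visits via OLS with
# condition dummies, using the placebo control as the omitted category.
# File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("megastudy_visits.csv")  # hypothetical file: one row per participant
conditions = ["placebo_control"] + sorted(
    c for c in df["condition"].unique() if c != "placebo_control"
)
df["condition"] = pd.Categorical(df["condition"], categories=conditions)

# Each C(condition) coefficient estimates the change in weekly visits relative
# to the placebo control during the intervention period.
model = smf.ols("weekly_visits ~ C(condition)", data=df).fit(cov_type="HC1")
print(model.summary())
```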


Augmenting Pre-Analysis Plans with Machine Learning

May 2019 · 53 Reads · 9 Citations · AEA Papers and Proceedings

Concerns about the dissemination of spurious results have led to calls for pre-analysis plans (PAPs) to avoid ex-post “p-hacking.” But often the conceptual hypotheses being tested do not imply the level of specificity required for a PAP. In this paper we suggest a framework for PAPs that capitalize on the availability of causal machine-learning (ML) techniques, in which researchers combine specific aspects of the analysis with ML for the flexible estimation of unspecific remainders. A “cheap-lunch” result shows that the inclusion of ML produces limited worst-case costs in power, while offering a substantial upside from systematic specification searches.
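As a rough illustration of the general idea (pre-specify the causal contrast, let ML estimate the flexible, otherwise-unspecified part), here is a minimal simulated sketch; the lasso adjustment and the two-step regression are stand-ins of my own, not the procedure developed in the paper.

```python
# Rough sketch: the treatment contrast is pre-specified, while an ML step
# (a cross-validated lasso over many candidate controls) absorbs the flexible,
# otherwise-unspecified remainder of the model. Illustrative stand-in only.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.normal(size=(n, p))               # many candidate controls
d = rng.binomial(1, 0.5, size=n)          # randomly assigned treatment
y = 0.3 * d + X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# ML step: flexibly model the outcome as a function of the controls only.
adjustment = LassoCV(cv=5).fit(X, y).predict(X)

# Pre-specified step: regress the ML-adjusted outcome on the treatment indicator.
fit = sm.OLS(y - adjustment, sm.add_constant(d)).fit()
print("estimated treatment effect:", fit.params[1], "naive se:", fit.bse[1])
```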


Applications of James-Stein Shrinkage (I): Variance Reduction without Bias

August 2017 · 34 Reads · 1 Citation

In a linear regression model with homoscedastic Normal noise, I consider James-Stein type shrinkage in the estimation of nuisance parameters associated with control variables. For at least three control variables and exogenous treatment, I show that the standard least-squares estimator is dominated with respect to squared-error loss in the treatment effect even among unbiased estimators and even when the target parameter is low-dimensional. I construct the dominating estimator by a variant of James-Stein shrinkage in an appropriate high-dimensional Normal-means problem; it can be understood as an invariant generalized Bayes estimator with an uninformative (improper) Jeffreys prior in the target parameter.
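For intuition, here is a minimal numerical sketch of the James-Stein ingredient: positive-part shrinkage of a Normal-means vector with at least three coordinates. In the paper this device is applied, in an invariant form, to the control-variable coefficients of the regression; the standalone version below is an assumption-laden simplification, not the paper's estimator.

```python
# Positive-part James-Stein shrinkage of a Normal-means estimate with k >= 3
# coordinates. In the paper, the analogous shrinkage targets the control
# (nuisance) coefficients; this bare version is for intuition only.
import numpy as np

def james_stein(beta_hat, sigma2):
    """Shrink beta_hat ~ N(beta, sigma2 * I_k) toward zero, requiring k >= 3."""
    k = beta_hat.size
    if k < 3:
        raise ValueError("James-Stein shrinkage needs at least three coordinates")
    factor = 1.0 - (k - 2) * sigma2 / float(beta_hat @ beta_hat)
    return max(factor, 0.0) * beta_hat

rng = np.random.default_rng(1)
beta = np.array([0.2, -0.1, 0.05, 0.3])
beta_hat = beta + rng.normal(scale=0.5, size=beta.size)  # noisy unbiased estimate
print(james_stein(beta_hat, sigma2=0.25))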


Applications of James-Stein Shrinkage (II): Bias Reduction in Instrumental Variable Estimation

August 2017 · 26 Reads · 1 Citation

In a two-stage linear regression model with Normal noise, I consider James-Stein type shrinkage in the estimation of the first-stage instrumental variable coefficients. For at least four instrumental variables and a single endogenous regressor, I show that the standard two-stage least-squares estimator is dominated with respect to bias. I construct the dominating estimator by a variant of James-Stein shrinkage in a first-stage high-dimensional Normal-means problem followed by a control-function approach in the second stage; it preserves invariances of the structural instrumental variable equations.
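A simplified numerical sketch of the two-step logic follows: shrink the first-stage coefficient vector (with at least four instruments) and then instrument with the shrunken fitted values. The paper's actual estimator uses an invariant shrinkage and a control-function second stage, so treat this only as intuition under simulated data.

```python
# Simplified sketch: James-Stein-style shrinkage of the first-stage coefficients
# (k >= 4 instruments), then an IV second stage using the shrunken fitted values.
# Not the paper's invariant, control-function estimator.
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 5
Z = rng.normal(size=(n, k))                      # instruments
pi = np.array([0.3, 0.1, 0.0, 0.2, 0.05])        # first-stage coefficients
u = rng.normal(size=n)                           # confounding error
x = Z @ pi + u                                   # endogenous regressor
y = 1.0 * x + 0.5 * u + rng.normal(size=n)       # true structural effect = 1.0

# First stage with shrinkage of the estimated coefficient vector.
pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
sigma2 = np.sum((x - Z @ pi_hat) ** 2) / (n - k)
shrink = max(1.0 - (k - 2) * sigma2 / float(pi_hat @ Z.T @ Z @ pi_hat), 0.0)
x_hat = Z @ (shrink * pi_hat)

# Second stage: use the shrunken fitted values as the instrument for x.
beta_iv = float(x_hat @ y) / float(x_hat @ x)
print("IV estimate with shrunken first stage:", beta_iv)
```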


Machine Learning Tests for Effects on Multiple Outcomes

July 2017 · 146 Reads · 9 Citations

A core challenge in the analysis of experimental data is that the impact of some intervention is often not entirely captured by a single, well-defined outcome. Instead there may be a large number of outcome variables that are potentially affected and of interest. In this paper, we propose a data-driven approach, rooted in machine learning, to the problem of testing effects on such groups of outcome variables. It is based on two simple observations. First, the 'false-positive' problem that arises when testing a group of outcomes is similar to the concern of 'over-fitting,' which has been the focus of a large literature in statistics and computer science. We can thus leverage sample-splitting methods from the machine-learning playbook that are designed to control over-fitting to ensure that statistical models express generalizable insights about treatment effects. The second simple observation is that the question of whether treatment affects a group of variables is equivalent to the question of whether treatment is predictable from these variables better than some trivial benchmark (provided treatment is assigned randomly). This formulation allows us to leverage data-driven predictors from the machine-learning literature to flexibly mine for effects, rather than rely on more rigid approaches like multiple-testing corrections and pre-analysis plans. We formulate a specific methodology and present three kinds of results: first, our test is exactly sized for the null hypothesis of no effect; second, a specific version is asymptotically equivalent to a benchmark joint Wald test in a linear regression; and third, this methodology can guide inference on where an intervention has effects. Finally, we argue that our approach can naturally deal with typical features of real-world experiments, and be adapted to baseline balance checks.
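A minimal sketch of the core logic: split the sample, try to predict treatment from the outcome variables, and ask whether the held-out accuracy beats what label-permuted data achieve. The classifier, the split, and the permutation inference below are illustrative choices of mine, not the exact test statistic developed in the paper.

```python
# Sketch: treatment affected the outcomes if treatment assignment is predictable
# from the outcomes better than chance on held-out data (valid under random
# assignment). Classifier and inference choices here are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 2000, 10
treat = rng.binomial(1, 0.5, size=n)
Y = rng.normal(size=(n, p))
Y[:, 0] += 0.3 * treat                            # treatment shifts only one outcome

Y_tr, Y_te, t_tr, t_te = train_test_split(Y, treat, test_size=0.5, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Y_tr, t_tr)
acc = clf.score(Y_te, t_te)                       # held-out accuracy

# Permutation benchmark: refit on shuffled treatment labels and record how often
# the shuffled accuracy matches or beats the observed one.
null_accs = []
for _ in range(100):
    t_perm = rng.permutation(treat)
    tp_tr, tp_te = train_test_split(t_perm, test_size=0.5, random_state=0)
    null = RandomForestClassifier(n_estimators=50, random_state=0).fit(Y_tr, tp_tr)
    null_accs.append(null.score(Y_te, tp_te))
p_value = (1 + sum(a >= acc for a in null_accs)) / (1 + len(null_accs))
print("held-out accuracy:", acc, "permutation p-value:", p_value)
```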


Machine Learning: An Applied Econometric Approach

May 2017 · 1,358 Reads · 1,468 Citations · Journal of Economic Perspectives

Machines are increasingly doing "intelligent" things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.
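A small illustration of the prediction-versus-estimation distinction drawn in the abstract (simulated data; the specific libraries and models are my own choices, not the article's):

```python
# Prediction vs. parameter estimation on the same simulated data: the random
# forest is judged by out-of-sample y-hat quality, the OLS fit by the quality
# of its coefficient estimates and standard errors.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + np.sin(3 * X[:, 2]) + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Prediction task: only the quality of y-hat matters, not any single coefficient.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("random-forest out-of-sample R^2:", round(rf.score(X_te, y_te), 3))

# Estimation task: the objects of interest are beta-hat and its uncertainty.
ols = sm.OLS(y_tr, sm.add_constant(X_tr)).fit()
print("OLS coefficients:", np.round(ols.params, 2))
print("OLS std. errors: ", np.round(ols.bse, 2))
```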

Citations (7)


... Such projects may involve multiple labs that collect data from different sources to replicate a specific effect (e.g., Klein et al., 2014; Verschuere et al., 2018), investigate sources of heterogeneity between samples (e.g., Klein et al., 2018), or compare effects from multiple interventions on a predefined target outcome (megastudies; e.g., Milkman et al., 2021). Milkman et al. (2021, 2022), for example, involved multiple research groups that designed 41 interventions in two megastudy field experiments that examined the effectiveness of text message-based nudges on vaccination uptake among more than 700,000 patients. Big-team science projects may also come in the form of many-analysts studies in which multiple researchers test the same model using the same data but different methods (e.g., Huntington-Klein et al., 2021; Menkveld et al., 2024; Sarstedt et al., 2024; Silberzahn et al., 2018). ...

Reference:

Toward open science in marketing research
A 680,000-person megastudy of nudges to encourage vaccination in pharmacies

Proceedings of the National Academy of Sciences

... For instance, research has shown that financial incentives can increase gym attendance, especially for those who did not previously attend regularly [12]. One study found that rewarding participants for returning to the gym after a missed workout led to an increase of 0.4 weekly gym visits, with a bonus of just 125 points, worth almost 10 cents [13]. These findings align with broader research indicating that even very small incentives can have a disproportionately large impact on behavior [14]. ...

Megastudies improve the impact of applied behavioural science

Nature

... They illustrate their method by using simulated data for performance comparisons across different econometric estimators. [...] use deep neural networks to generate artificial paintings to study gender discrimination in art prices. Finally, Ludwig, Mullainathan, and Spiess (2019) introduce ML-augmented pre-analysis plans to avoid p-hacking. They augment standard linear regression with new regressors from ML. ...

Augmenting Pre-Analysis Plans with Machine Learning
  • Citing Article
  • May 2019

AEA Papers and Proceedings

... In a companion paper (Spiess, 2017), I show how analogous shrinkage in at least three control variables provides consistent loss improvement over the least-squares estimator without introducing bias, provided that treatment is assigned randomly. Together, these results suggest different roles of overfitting in instrumental variable and control coefficients, respectively: while overfitting to instrumental variables in the first stage of a two-stage least-squares procedure induces bias, overfitting to control variables induces variance. ...

Applications of James-Stein Shrinkage (I): Variance Reduction without Bias
  • Citing Article
  • August 2017

... In a companion paper (Spiess, 2017), I show how shrinkage in at least four instrumental variables in a canonical structural form provides consistent bias improvement over the two-stage least-squares estimator. Together, these results suggest different roles of overfitting in control and instrumental variable coefficients, respectively: while overfitting to control variables induces variance, overfitting to instrumental variables in the first stage of a two-stage least-squares procedure induces bias. ...

Applications of James-Stein Shrinkage (II): Bias Reduction in Instrumental Variable Estimation
  • Citing Article
  • August 2017

... Furthermore, we ran machine learning-based tests for assessing balance jointly for all covariates across treatments using the approach of Ludwig, Mullainathan, and Spiess (2017). The authors point out that the problem of obtaining too many significant differences when testing several hypotheses is tantamount to overfitting in machine learning, that is, to including too many regressors when predicting a variable. ...

Machine Learning Tests for Effects on Multiple Outcomes
  • Citing Article
  • July 2017

... It is not straightforward to sign the direction of the bias, which would depend on the complex dynamics of the relationships between household consumption and the control variables as well as the magnitudes of the differences for the same variables across the two datasets. 8 See Mullainathan and Spiess (2017) and Athey and Imbens (2019) for recent reviews of these techniques in economics studies. 9 The U.S. presents a relevant case. ...

Machine Learning: An Applied Econometric Approach
  • Citing Article
  • May 2017

Journal of Economic Perspectives