Kosuke Imai’s research while affiliated with Harvard University and other places


Publications (187)


A summer bridge program for first-generation low-income students stretches academic ambitions with no adverse impacts on first-year GPA
  • Article

December 2024 · 1 Read · Proceedings of the National Academy of Sciences

Rebecca A Johnson · Tyler Simko · Kosuke Imai

A large body of research documents the barriers faced by first-generation, low-income (FGLI) students as “hidden minorities” on elite college campuses. Although existing studies show brief psychological interventions can help mitigate some of these obstacles, universities are investing in more intensive interventions that try to both shift mindsets and mitigate structural disadvantages in FGLI students’ academic preparation. In collaboration with the administrators at a highly selective university, we conducted a randomized controlled trial of a summer bridge program targeted at FGLI students. During summers between 2017 and 2019, we randomly selected 232 out of 418 first-generation or low-income students and invited them to attend an intensive, six-week-long residential summer program featuring courses for academic credit. Students randomized to the control group either interacted with online content offering no academic credit or had no summer intervention. Our preregistered analysis shows that the program encouraged FGLI students to pursue a more ambitious first-year program, increasing the proportion of nonintroductory courses by 7 percentage points. The program also increased the proportion of courses taken for a grade rather than as pass-fail by 6 percentage points. These improvements were accompanied by no discernible impact on first-year grade point averages (GPAs) and academic withdrawal. The findings show the potential to academically integrate FGLI students into selective university communities.


Longitudinal Causal Inference with Selective Eligibility

October 2024 · 3 Reads

Eli Ben-Michael · [...] · Kosuke Imai

Dropout often threatens the validity of causal inference in longitudinal studies. While existing studies have focused on the problem of missing outcomes caused by treatment, we study an important but overlooked source of dropout: selective eligibility. For example, patients may become ineligible for subsequent treatments due to severe side effects or complete recovery. Selective eligibility differs from the problem of “truncation by death” because dropout occurs after observing the outcome but before receiving the subsequent treatment. This difference makes the standard approach to dropout inapplicable. We propose a general methodological framework for longitudinal causal inference with selective eligibility. By focusing on subgroups of units who would become eligible for treatment given a specific treatment history, we define the time-specific eligible treatment effect (ETE) and expected number of outcome events (EOE) under a treatment sequence of interest. Assuming a generalized version of sequential ignorability, we derive two nonparametric identification formulae, each leveraging different parts of the observed data distribution. We then derive the efficient influence function of each causal estimand, yielding the corresponding doubly robust estimator. Finally, we apply the proposed methodology to an impact evaluation of a pre-trial risk assessment instrument in the criminal justice system, in which selective eligibility arises due to recidivism.
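
For context, a standard statement of sequential ignorability, which the paper generalizes to handle eligibility, can be written as follows (the notation is generic background, not the paper's own):

\[
Y_t(\bar{a}_t) \perp\!\!\!\perp A_t \mid \bar{A}_{t-1} = \bar{a}_{t-1},\ \bar{X}_t,
\qquad \text{for all } t \text{ and all treatment histories } \bar{a}_t,
\]

where \bar{A}_{t-1} is the observed treatment history, \bar{X}_t the covariate history, and Y_t(\bar{a}_t) the potential outcome at time t under treatment sequence \bar{a}_t. The paper's generalized version additionally accounts for units' eligibility status, which is not shown here.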


geocausal: An R Package for Spatio-Temporal Causal Inference

August 2024 · 6 Reads

Scholars from diverse fields now use highly disaggregated ("microlevel") data with fine-grained spatial (e.g., locations of villages and individuals) and temporal (days, hours, or even seconds) dimensions to test their theories. Despite the proliferation of these data, however, statistical methods for causal inference with spatio-temporal data remain underdeveloped. We introduce an R package, geocausal, that enables researchers to implement causal inference methods for highly disaggregated spatio-temporal data. The geocausal package implements two necessary steps for spatio-temporal causal inference: (1) preparing the data and (2) estimating causal effects. The package allows users to effectively use fine-grained spatio-temporal data, test counterfactual scenarios that have spatial and temporal dimensions, and visualize each step efficiently. We illustrate the capabilities of the geocausal package by analyzing US airstrikes and insurgent attacks in Iraq over various spatial and temporal windows.


Figure: Illustration of PAPE for two different ITRs f and g. The x axis is the proportion of individuals treated; the y axis is the PAV. The PAV of f is higher than the PAV of g, but ITR g has a positive PAPE (τ_g) while ITR f has a negative PAPE (τ_f).
Figure: Numerical experiments: (a) empirical standard error of the PAV estimator as a function of a constant shift δ in the potential outcomes, where δ = 0 minimizes the standard error of the PAV; (b) comparison of the empirical standard errors of the ex-ante and ex-post PAPE estimators (y axis) across sample sizes (x axis).
Neyman meets causal machine learning: Experimental evaluation of individualized treatment rules
  • Article
  • Full-text available

August 2024 · 17 Reads · Journal of Causal Inference

A century ago, Neyman showed how to evaluate the efficacy of treatment using a randomized experiment under a minimal set of assumptions. This classical repeated sampling framework serves as a basis of routine experimental analyses conducted by today’s scientists across disciplines. In this article, we demonstrate that Neyman’s methodology can also be used to experimentally evaluate the efficacy of individualized treatment rules (ITRs), which are derived by modern causal machine learning (ML) algorithms. In particular, we show how to account for additional uncertainty resulting from a training process based on cross-fitting. The primary advantage of Neyman’s approach is that it can be applied to any ITR regardless of the properties of ML algorithms that are used to derive the ITR. We also show, somewhat surprisingly, that for certain metrics, it is more efficient to conduct this ex-post experimental evaluation of an ITR than to conduct an ex-ante experimental evaluation that randomly assigns some units to the ITR. Our analysis demonstrates that Neyman’s repeated sampling framework is as relevant for causal inference today as it has been since its inception.
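
For reference, the classical Neyman analysis that the abstract builds on is the difference-in-means estimator with its conservative variance estimator (this is textbook background, not the paper's ITR-specific estimators):

\[
\hat{\tau} = \frac{1}{n_1}\sum_{i:\,T_i=1} Y_i - \frac{1}{n_0}\sum_{i:\,T_i=0} Y_i,
\qquad
\widehat{\mathrm{Var}}(\hat{\tau}) = \frac{s_1^2}{n_1} + \frac{s_0^2}{n_0},
\]

where n_1 and n_0 are the numbers of treated and control units and s_1^2, s_0^2 are the within-group sample variances of the observed outcomes. Under Neyman's repeated sampling framework this variance estimator is conservative; the paper extends the same logic to account for the additional uncertainty from cross-fitted training of ITRs.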


Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors

July 2024 · 2 Reads

Political actors frequently manipulate redistricting plans to gain electoral advantages, a process commonly known as gerrymandering. To address this problem, several states have implemented institutional reforms, including the establishment of map-drawing commissions. It is difficult to assess the impact of such reforms because each state structures bundles of complex rules in different ways. We propose to model redistricting processes as a sequential game. The equilibrium solution to the game summarizes multi-step institutional interactions as a single-dimensional score. This score measures the leeway political actors have over the partisan lean of the final plan. Using a difference-in-differences design, we demonstrate that reforms reduce partisan bias and increase competitiveness when they constrain partisan actors. We perform a counterfactual policy analysis to estimate the partisan effects of enacting recent institutional reforms nationwide. We find that instituting redistricting commissions generally reduces the current Republican advantage, but Michigan-style reforms would yield a much greater pro-Democratic effect than the types of redistricting commissions adopted in Ohio and New York.
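
The sequential-game framing can be illustrated with a deliberately stylized sketch solved by backward induction, in which a map proposer picks a plan's partisan lean and a reviewing body can reject it in favor of a fallback plan. The function, parameters, and numbers below are hypothetical illustrations, not the paper's actual model or score.

# Stylized proposer-veto redistricting game solved by backward induction.
# All parameters are hypothetical; this illustrates the general idea of
# summarizing institutional constraints by an equilibrium outcome.
def equilibrium_lean(proposer_ideal, veto_window, fallback_lean):
    """Partisan lean of the enacted plan in equilibrium.

    proposer_ideal: lean the map-drawing actor would pick if unconstrained
    veto_window:    (lo, hi) range of leans the reviewing body will accept
    fallback_lean:  lean of the plan adopted if the proposal is rejected
    """
    lo, hi = veto_window
    # Anticipating the veto, the proposer offers the acceptable lean
    # closest to its ideal point ...
    proposal = min(max(proposer_ideal, lo), hi)
    # ... and makes that proposal only if it prefers it to the fallback.
    if abs(proposal - proposer_ideal) <= abs(fallback_lean - proposer_ideal):
        return proposal
    return fallback_lean

# Weakly constrained partisan process: wide acceptance window.
print(equilibrium_lean(proposer_ideal=0.8, veto_window=(-0.5, 0.7), fallback_lean=0.0))
# Tightly constrained, commission-style process: little leeway for the proposer.
print(equilibrium_lean(proposer_ideal=0.8, veto_window=(-0.1, 0.1), fallback_lean=0.0))

The gap between the unconstrained ideal point and the equilibrium lean plays the role of the leeway that the abstract's score is meant to capture.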




Table: Population summary statistics for Census geographies. Summaries across the populated geographic units studied in this paper. Blocks are nested within block groups, which are nested within tracts. Place stands for Census Designated Place. For example, the median Census block has a population of 23.
Evaluating bias and noise induced by the U.S. Census Bureau's privacy protection methods

May 2024 · 12 Reads · 9 Citations · Science Advances

The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct an independent evaluation of bias and noise induced by the Bureau’s two main disclosure avoidance systems: the TopDown algorithm used for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measurement File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful without measurement error modeling, especially for Hispanic and multiracial populations. TopDown’s postprocessing reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.
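
The relative size of the errors for small geographies can be illustrated with a toy simulation: inject zero-mean noise of a fixed scale into hypothetical block counts and apply a simplified non-negativity and rounding step. The counts, noise scale, and rounding rule below are made up for illustration and are not the Bureau's actual TopDown post-processing, which also enforces consistency across the geographic hierarchy.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block populations: several small blocks and a few large ones.
true_counts = np.array([5, 12, 23, 40, 800, 4000])

# Zero-mean noise of a fixed scale, loosely mimicking a noisy measurement
# file (a continuous Gaussian stands in for the discrete noise actually used).
noisy = true_counts + rng.normal(scale=10.0, size=true_counts.shape)

# Simplified post-processing: force non-negative integer counts.
released = np.clip(np.round(noisy), 0, None).astype(int)

for t, r in zip(true_counts.tolist(), released.tolist()):
    print(f"true={t:5d}  released={r:5d}  relative error={abs(r - t) / t:.1%}")

A fixed amount of noise translates into a much larger relative error for a block of 23 people than for a tract-sized population, which is the pattern behind the finding that errors can be relatively substantial for geographies with small total populations.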


A Summer Bridge Program for First-Generation Low-Income Students Stretches Academic Ambitions with No Adverse Impacts on GPA

March 2024 · 18 Reads · 1 Citation

A large body of research documents the barriers faced by first-generation, low-income (FGLI) students as “hidden minorities” on elite college campuses. Although existing studies show brief psychological interventions can help mitigate some of these obstacles, universities are investing in more intensive interventions that try to both shift mindsets and mitigate structural disadvantages in FGLI students' academic preparation. In collaboration with the administrators at a highly selective university, we conducted the first randomized controlled trial of a summer bridge program targeted at FGLI students. During summers between 2017 and 2019, we randomly selected 232 out of 418 first-generation or low-income students and invited them to attend an intensive, six-week-long residential summer program featuring courses for academic credit. Students randomized to the control group either interacted with online content offering no academic credit or had no summer intervention. Our pre-registered analysis shows that the program encouraged FGLI students to pursue a more ambitious first-year program, increasing the proportion of non-introductory courses by 7 percentage points. The program also increased the proportion of courses taken for a grade rather than as pass-fail by 6 percentage points. These improvements were accompanied by no discernible impact on first-year GPAs and academic withdrawal. The findings show the ability to academically integrate FGLI students into selective university communities.



Citations (55)


... We further demonstrated how Neyman's repeated-sampling framework can highlight the difference between the ex-ante evaluation and ex-post evaluation of ITRs by showing that the ex-post evaluation is statistically more efficient. Our ongoing work also applies this framework to the estimation of heterogeneous treatment effects discovered by ML algorithms [29]. Altogether, we have shown that a century after his original proposal, Neyman's analytical framework remains relevant and is widely applicable to the evaluation of today's causal ML methods. ...

Reference:

Neyman meets causal machine learning: Experimental evaluation of individualized treatment rules
Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments*
  • Citing Article
  • May 2024

Journal of Business and Economic Statistics

... I still opt to use demographics of minors, rather than the total population (which is less likely to be suppressed), due to the increasing diversity of younger US generations over time [68]. Similar decisions about suppression and small group estimates will continue to be relevant as the Census Bureau evolves its use of new privacy techniques [69][70][71][72]. Finally, I multiply the share of third-grade students belonging to a particular racial / ethnic group in that school by the fraction of children belonging to that group who live in a particular Census block. ...

Evaluating bias and noise induced by the U.S. Census Bureau's privacy protection methods

Science Advances

... Attribute inference is employed widely across various domains as part of assessing racial disparities and enforcing civil rights laws [1,9,11,23,57], despite their known misclassification of a significant proportion of individuals [21]. While approaches to correct inference error have been developed [23,38,40,58], they do not directly apply in our setting of black-box audits of ad delivery algorithms due to two constraints. First, correction methods assume inferred attributes or inference probabilities of individuals are directly accessible at the time of model evaluation. ...

Estimating Racial Disparities When Race is Not Observed
  • Citing Article
  • January 2024

SSRN Electronic Journal

... One class of methods is based on the model-X framework (Candès et al., 2018), which includes model-X knockoffs (Candès et al., 2018) and the holdout randomization test (HRT; Tansey et al., 2022). These methods were developed under the assumption that L(X) is known, which arguably is too strong an assumption except in special cases where this law is under the control of the experimenter (Ham, Imai, and Janson, 2022; Aufiero and Janson, 2022). These methods are often deployed by fitting L(X) in-sample, but general conditions under which the Type-I error is controlled in this context are not known, though some progress in this direction has been made (Fan et al., 2019; Fan, Gao, and Lv, 2023). ...

Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis
  • Citing Article
  • February 2024

Political Analysis

... For different settings with point-identified welfare, finite- and large-sample results on optimal treatment choice rules were derived by Canner (1970), Chen and Guggenberger (2024), Hirano and Porter (2009, 2020), Kitagawa, Lee, and Qiu (2022), Schlag (2006), Stoye (2009), and Tetenov (2012b). There is also a large literature on optimal policy learning with covariates containing results with point identified (Bhattacharya and Dupas, 2012; Kitagawa and Tetenov, 2018, 2021; Mbakop and Tabord-Meehan, 2021; Kitagawa and Wang, 2023; Athey and Wager, 2021; Kitagawa, Sakaguchi, and Tetenov, 2021; Ida, Ishihara, Ito, Kido, Kitagawa, Sakaguchi, and Sasaki, 2022) as well as partially identified (Kallus and Zhou, 2018; Ben-Michael, Greiner, Imai, and Jiang, 2021; Ben-Michael, Imai, and Jiang, 2022; D'Adamo, 2021; Christensen et al., 2022; Adjaho and Christensen, 2022; Kido, 2022; Lei, Sahoo, and Wager, 2023) parameters. Guggenberger, Mehta, and Pavlov (2024) and Manski and Tetenov (2023) analyze related problems but focus on quantile, as opposed to expected, loss; Song (2014) considers partial identification but mean squared error regret loss. ...

Policy Learning with Asymmetric Counterfactual Utilities*
  • Citing Article
  • January 2024

... (Stephanopoulos 2021). Recent approaches also do a better job of preserving political subdivisions like counties (Autry et al. 2021, McCartan and Imai 2023, Clelland et al. 2022) and drawing majority-minority or minority-opportunity districts (Cannon et al. 2023, Becker et al. 2021). ...

Sequential Monte Carlo for sampling balanced and compact redistricting plans
  • Citing Article
  • December 2023

The Annals of Applied Statistics

... Substantively, our results suggest that election competition may act as a constraint on politicians from sharing ideologically extreme news media. Institutional and judicial efforts to create more electoral competition (e.g., by overturning heavily gerrymandered districts, Kenny et al. 2023) may thus have important indirect implications for the state of the polarized online information ecosystem. ...

Widespread partisan gerrymandering mostly cancels nationally, but reduces electoral competition

Proceedings of the National Academy of Sciences

... However, the Bureau released the PPMF of the Redistricting data instead to ensure that the final tabulated statistics met data consistency requirements (such as non-negative and integral values for population counts). This post-processing decision created substantial discontent among Census data users and privacy experts alike (Dwork et al., 2021b; McCartan et al., 2023) as it increased the possibility of introducing biases in the released statistics. More significantly, not only were these biases challenging to correct, but future policy decisions based on these data were found to have significantly harmful effects. ...

Researchers need better access to US Census data
  • Citing Article
  • June 2023

Science

... We limit our study to first (given) names, drawn from two datasets: Rosenman et al. (2023) and Tzioumis (2018). The former contains 136,000 first names compiled from voter registration files while the latter contains 4,250 first names compiled from mortgage information in the United States. ...

Race and ethnicity data for first, middle, and surnames

Scientific Data