Nathan Kallus’s research while affiliated with Cornell University and other places


Publications (156)


Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands
  • Preprint

January 2025 · Nathan Kallus · Alex Luedtke

We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating debiased estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for complex parameters without requiring specialized expertise. The framework supports Neyman-orthogonal loss functions with unknown nuisance parameters requiring data-driven estimation, as well as vector-valued M-estimands involving simultaneous loss minimization across multiple Hilbert space models. We formalize the class of parameters efficiently estimable by autoDML as a novel class of nonparametric projection parameters, defined via orthogonal minimum loss objectives. We introduce three autoDML estimators based on one-step estimation, targeted minimum loss-based estimation, and the method of sieves. For data-driven model selection, we derive a novel decomposition of model approximation error for smooth functionals of M-estimands and propose adaptive debiased machine learning estimators that are superefficient and adaptive to the functional form of the M-estimand. Finally, we illustrate the flexibility of our framework by constructing autoDML estimators for the long-term survival under a beta-geometric model.
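As a rough illustration of the debiasing idea the abstract refers to, a one-step construction corrects a plug-in estimate of a smooth functional by adding the sample mean of an estimated efficient influence function. The display below is a generic sketch under illustrative notation ($\Psi$, $\hat\theta_n$, $\hat\phi_n$ are not taken from the paper), not the paper's exact autoDML estimators.

```latex
% Generic one-step debiasing sketch (illustrative notation; not the paper's exact construction).
% \hat\theta_n : estimated M-estimand (nuisance fit, possibly on a separate fold)
% \Psi         : the smooth functional of interest
% \hat\phi_n   : an estimate of the efficient influence function of \Psi(\theta_0)
\[
  \hat\psi_{\mathrm{1step}}
  \;=\;
  \Psi(\hat\theta_n)
  \;+\;
  \frac{1}{n}\sum_{i=1}^{n} \hat\phi_n(O_i).
\]
% Under standard conditions (sufficiently fast nuisance rates plus cross-fitting or
% Donsker-type arguments), \sqrt{n}\,(\hat\psi_{\mathrm{1step}} - \Psi(\theta_0)) is
% asymptotically normal with variance \mathrm{Var}[\phi(O)].
```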


Automatic Double Reinforcement Learning in Semiparametric Markov Decision Processes with Applications to Long-Term Causal Inference

January 2025 · Allen Tran · [...]
Double reinforcement learning (DRL) enables statistically efficient inference on the value of a policy in a nonparametric Markov Decision Process (MDP) given trajectories generated by another policy. However, this approach necessarily requires stringent overlap between the state distributions, which is often violated in practice. To relax this requirement and extend DRL, we study efficient inference on linear functionals of the Q-function (of which policy value is a special case) in infinite-horizon, time-invariant MDPs under semiparametric restrictions on the Q-function. These restrictions can reduce the overlap requirement and lower the efficiency bound, yielding more precise estimates. As an important example, we study the evaluation of long-term value under domain adaptation, given a few short trajectories from the new domain and restrictions on the difference between the domains. This can be used for long-term causal inference. Our method combines flexible estimates of the Q-function and the Riesz representer of the functional of interest (e.g., the stationary state density ratio for policy value) and is automatic in that we do not need to know the form of the latter - only the functional we care about. To address potential model misspecification bias, we extend the adaptive debiased machine learning (ADML) framework of van der Laan et al. (2023) to construct nonparametrically valid and superefficient estimators that adapt to the functional form of the Q-function. As a special case, we propose a novel adaptive debiased plug-in estimator that uses isotonic-calibrated fitted Q-iteration - a new calibration algorithm for MDPs - to circumvent the computational challenges of estimating debiasing nuisances from min-max objectives.
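For orientation, the classical double reinforcement learning estimator of a (normalized) discounted policy value combines a fitted Q-function with an occupancy-ratio weight (the Riesz representer for policy value) applied to the temporal-difference residual. The display below is a standard sketch of that form under illustrative notation; it is not the semiparametric or adaptive estimators proposed in this paper.

```latex
% Standard DRL sketch for the normalized discounted value of a target policy \pi
% (illustrative notation; not this paper's semiparametric or ADML estimators).
% \hat Q    : estimated Q-function of \pi
% \hat\eta  : estimated ratio of \pi's discounted state-action occupancy to the data
%             distribution (the Riesz representer for policy value)
% (S_{0,i}, S_i, A_i, R_i, S_i') : initial states and observed transitions; \gamma \in (0,1)
\[
  \hat v(\pi)
  \;=\;
  \frac{1}{n}\sum_{i=1}^{n}
  \Big[
    (1-\gamma)\,\hat Q\big(S_{0,i}, \pi(S_{0,i})\big)
    +
    \hat\eta(S_i, A_i)\,
    \big( R_i + \gamma\,\hat Q(S_i', \pi(S_i')) - \hat Q(S_i, A_i) \big)
  \Big].
\]
% The estimate is consistent if either \hat Q or \hat\eta is consistent, and attains the
% nonparametric efficiency bound when both converge at sufficient rates.
```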


Figure captions (KS-constrained conformal prediction, KS-CP, vs. conformal prediction):
(1) Coverage under synthetic data (Setting I) with linear regression, $1-\alpha = 90\%$: conditional coverage for each method; our method achieves the specified conditional coverage while all other methods have significantly lower conditional coverage.
(2) WSLAB for UCI datasets with residual score across $\alpha$: our method consistently improves WSLAB for all $\alpha$.
(3) Ablation studies for different choices of $\lambda$ for synthetic data setup I: marginal coverage (MC), conditional coverage (CC), set size, and MSE for $\log(\lambda) = -1, 0, 1, 2, 3$ at $1-\alpha = 90\%$. (3 additional figures not shown.)

Adjusting regression models for conditional uncertainty calibration

Machine Learning

Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an upper bound for the miscoverage gap between the conditional coverage and the nominal coverage rate and propose an end-to-end algorithm to control this upper bound. We demonstrate the efficacy of our method empirically on synthetic and real-world datasets.
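For context, the split conformal baseline that the proposed adjustment is applied on top of can be sketched in a few lines: fit a regression on one fold, compute absolute-residual scores on a calibration fold, and use their finite-sample-corrected $(1-\alpha)$ quantile as a symmetric interval half-width. This is a minimal sketch of plain split conformal prediction with a residual score, not the paper's conditional-coverage adjustment; the function and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Plain split conformal prediction with absolute-residual scores.

    Returns (lower, upper) arrays with marginal (1 - alpha) coverage under
    exchangeability; no conditional-coverage adjustment is applied.
    """
    model = LinearRegression().fit(X_train, y_train)

    # Nonconformity scores on the held-out calibration fold.
    scores = np.abs(y_cal - model.predict(X_cal))

    # The ceil((n + 1)(1 - alpha))-th smallest score (finite-sample correction).
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha))) - 1
    q_hat = np.sort(scores)[min(k, n - 1)]

    preds = model.predict(X_test)
    return preds - q_hat, preds + q_hat

# Example usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=600)
lo, hi = split_conformal_interval(X[:200], y[:200], X[200:400], y[200:400], X[400:], alpha=0.1)
print(np.mean((y[400:] >= lo) & (y[400:] <= hi)))  # marginal coverage should be roughly 0.9 or a bit above
```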


On the role of surrogates in the efficient estimation of treatment effects with limited outcome data

October 2024 · 29 Citations

Journal of the Royal Statistical Society Series B (Statistical Methodology)

In many experimental and observational studies, the outcome of interest is often difficult or expensive to observe, reducing effective sample sizes for estimating average treatment effects (ATEs) even when identifiable. We study how incorporating data on units for which only surrogate outcomes not of primary interest are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which would permit surrogates to serve as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome with abundant observations of surrogate outcomes, without any assumptions beyond unconfounded treatment assignment and missingness and corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when the number of units with missing outcomes is overwhelming and when it is comparable to the number with observed outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying the long-term earnings effects of job training.
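To fix ideas, under unconfoundedness and outcome missingness at random given covariates, treatment, and surrogates, one doubly robust ATE estimator in this setting augments a standard AIPW score with a surrogate-based imputation layer. The display below is a hedged sketch of that generic form; the nuisance symbols $\hat e$, $\hat r_a$, $\hat m_a$, $\hat\mu_a$ are illustrative notation, and this is not necessarily the paper's efficient estimator.

```latex
% Generic doubly robust sketch for the ATE with limited outcome data and surrogates
% (illustrative notation; not necessarily the paper's efficient estimator).
% A_i: treatment, X_i: covariates, S_i: surrogates (always observed),
% R_i: indicator that the primary outcome Y_i is observed,
% \hat e(X)      : propensity score,
% \hat r_a(X, S) : probability of observing Y given A = a, X, S,
% \hat m_a(X, S) : regression of Y on (X, S) among observed units with A = a,
% \hat\mu_a(X)   : regression of \hat m_a(X, S) on X.
\begin{align*}
\hat\tau = \frac{1}{n}\sum_{i=1}^{n}\Bigg[\;
  &\hat\mu_1(X_i) - \hat\mu_0(X_i)
  + \frac{A_i}{\hat e(X_i)}\Big(\hat m_1(X_i, S_i) - \hat\mu_1(X_i)\Big)
  - \frac{1-A_i}{1-\hat e(X_i)}\Big(\hat m_0(X_i, S_i) - \hat\mu_0(X_i)\Big) \\
  &+ \frac{A_i R_i}{\hat e(X_i)\,\hat r_1(X_i, S_i)}\Big(Y_i - \hat m_1(X_i, S_i)\Big)
  - \frac{(1-A_i)\,R_i}{\big(1-\hat e(X_i)\big)\,\hat r_0(X_i, S_i)}\Big(Y_i - \hat m_0(X_i, S_i)\Big)
\;\Bigg].
\end{align*}
% Each augmentation term has mean zero when its nuisances are correct, giving the usual
% double robustness under the stated unconfoundedness and missing-at-random assumptions.
```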


Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits

October 2024

In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible. We show that GAI can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel nonparametric anytime-valid sequential test for labeling arm means. We first establish that our sequential test maintains error control under highly nonparametric assumptions and asymptotically achieves the minimax optimal e-power, a notion of power for anytime-valid tests. Next, by pairing regret-minimizing sampling schemes with our sequential test, we provide an approach that achieves minimax optimal stopping times for labeling arms with means above a threshold, under an error probability constraint. Our empirical results validate our approach beyond the minimax setting, reducing the expected number of samples for all stopping times by at least 50% across both synthetic and real-world settings.
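As a concrete illustration of the anytime-valid labeling loop described above, the sketch below pairs epsilon-greedy sampling with a simple betting e-process for testing whether an arm's mean exceeds a threshold, labeling the arm once the e-value crosses $1/\alpha$. It assumes rewards in [0, 1] and is a generic construction for illustration, not the paper's minimax-optimal test or sampling rule; all names and constants are illustrative.

```python
import numpy as np

def eprocess_update(e_value, x, mu0, bet=0.5):
    """One step of a betting e-process for H0: mean <= mu0, with rewards in [0, 1].

    For a bet lambda in [0, 1/mu0], the factor 1 + lambda * (x - mu0) is nonnegative
    for x in [0, 1] and has expectation at most 1 under H0, so the running product is
    a nonnegative supermartingale and Ville's inequality gives anytime validity.
    """
    lam = bet / max(mu0, 1e-12)                 # conservative fixed bet, lam <= 1/mu0
    return e_value * (1.0 + lam * (x - mu0))

def good_arm_identification(pull, n_arms, mu0=0.5, alpha=0.05, budget=20000, rng=None):
    """Label arms with mean above mu0 as 'good' using one e-process per arm.

    `pull(k)` returns a reward in [0, 1] for arm k. Sampling is epsilon-greedy on the
    empirical means purely for illustration (not the paper's sampling rule).
    """
    rng = rng or np.random.default_rng(0)
    e = np.ones(n_arms)                         # running e-value per arm
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    good = set()
    for _ in range(budget):
        means = np.where(counts > 0, sums / np.maximum(counts, 1.0), np.inf)
        k = int(rng.integers(n_arms)) if rng.random() < 0.1 else int(np.argmax(means))
        x = pull(k)
        sums[k] += x
        counts[k] += 1
        if k not in good:
            e[k] = eprocess_update(e[k], x, mu0)
            if e[k] >= 1.0 / alpha:             # anytime-valid rejection of H0 for arm k
                good.add(k)
    return good

# Example with Bernoulli arms; arms 2 and 3 have means above the threshold 0.5.
true_means = [0.3, 0.45, 0.6, 0.7]
rng = np.random.default_rng(1)
print(good_arm_identification(lambda k: float(rng.random() < true_means[k]), 4))
```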


Anytime-Valid Continuous-Time Confidence Processes for Inhomogeneous Poisson Processes

October 2024

Motivated by monitoring the arrival of incoming adverse events such as customer support calls or crash reports from users exposed to an experimental product change, we consider sequential hypothesis testing of continuous-time inhomogeneous Poisson point processes. Specifically, we provide an interval-valued confidence process $C^\alpha(t)$ over continuous time $t$ for the cumulative arrival rate $\Lambda(t) = \int_0^t \lambda(s)\,\mathrm{d}s$ with a continuous-time anytime-valid coverage guarantee $\mathbb{P}[\Lambda(t) \in C^\alpha(t)\ \forall t > 0] \geq 1-\alpha$. We extend our results to compare two independent arrival processes by constructing multivariate confidence processes and a closed-form e-process for testing the equality of rates with a time-uniform Type-I error guarantee at a nominal $\alpha$. We characterize the asymptotic growth rate of the proposed e-process under the alternative and show that it has power 1 when the average rates of the two Poisson processes differ in the limit. We also observe a complementary relationship between our multivariate confidence process and the universal inference e-process for testing composite null hypotheses.
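One elementary way to see how an anytime-valid comparison of two arrival streams can be built: conditional on the arrival times of the superposed process, under the null of equal intensity functions each arrival comes from the first process independently with probability 1/2, so a likelihood-ratio martingale on those labels is an e-process. The display below is a hedged sketch of that generic construction with a fixed alternative $p$; it is not necessarily the closed-form e-process derived in the paper.

```latex
% Illustrative e-process for H0: \lambda_1(\cdot) = \lambda_2(\cdot), built from arrival labels
% (a generic construction; not necessarily the paper's closed-form e-process).
% Let Z_k \in \{0, 1\} indicate that the k-th arrival of the superposed process came from
% process 1, and let N_1(t), N_2(t) count arrivals of each process by time t.
% Under H0, conditional on the arrival times, the Z_k are i.i.d. Bernoulli(1/2).
% For a fixed alternative p \in (0, 1), the likelihood ratio
\[
  E_t \;=\; \prod_{k=1}^{N_1(t) + N_2(t)} \frac{p^{Z_k}\,(1-p)^{1-Z_k}}{1/2}
  \;=\; 2^{\,N_1(t)+N_2(t)}\; p^{\,N_1(t)}\,(1-p)^{\,N_2(t)}
\]
% is a nonnegative martingale with unit mean under H0, so Ville's inequality gives
% \mathbb{P}(\exists t: E_t \ge 1/\alpha) \le \alpha, a time-uniform Type-I error guarantee.
```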



Long-term causal inference under persistent confounding via data combination

October 2024 · 21 Citations

Journal of the Royal Statistical Society Series B (Statistical Methodology)

We study the identification and estimation of long-term treatment effects by combining short-term experimental data and long-term observational data subject to unobserved confounding. This problem arises often when one is concerned with long-term treatment effects, since experiments are often short-term due to operational necessity, while observational data can be more easily collected over longer time frames but may be subject to confounding. In this paper, we tackle the challenge of persistent confounding: unobserved confounders that can simultaneously affect the treatment, short-term outcomes, and the long-term outcome. In particular, persistent confounding invalidates identification strategies used in previous approaches to this problem. To address this challenge, we exploit the sequential structure of multiple short-term outcomes and develop several novel identification strategies for the average long-term treatment effect. Based on these, we develop estimation and inference methods with asymptotic guarantees. To demonstrate the importance of handling persistent confounders, we apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data.


Figure caption (Fig. F3): Worst conditional coverage vs. $\log_{10}(\gamma)$; the conditional miscoverage rate is relatively stable for $\gamma$ between 3 and 50.
Table caption: Quantitative results for each method with synthetic data, $1-\alpha = 90\%$.
Adjusting Regression Models for Conditional Uncertainty Calibration

September 2024

Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an upper bound for the miscoverage gap between the conditional coverage and the nominal coverage rate and propose an end-to-end algorithm to control this upper bound. We demonstrate the efficacy of our method empirically on synthetic and real-world datasets.


The Central Role of the Loss Function in Reinforcement Learning

September 2024

This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey of their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algorithms using the binary cross-entropy loss achieve first-order bounds scaling with the optimal policy's cost and are much more efficient than the commonly used squared loss. Moreover, we prove that distributional algorithms using the maximum likelihood loss achieve second-order bounds scaling with the policy variance, which are even sharper than first-order bounds. In particular, this proves the benefits of distributional RL. We hope that this paper serves as a guide for analyzing decision making algorithms with varying loss functions, and inspires the reader to seek out better loss functions to improve any decision making algorithm.
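To make the loss-function comparison concrete, the snippet below shows the two regression objectives contrasted in the abstract for a single value-fitting step on targets normalized to [0, 1]: the usual squared loss versus binary cross-entropy applied to a sigmoid output. It is a minimal, self-contained illustration, not the paper's algorithms or analysis; the network and targets are toy placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy value-regression step: features -> scalar value target in [0, 1]
# (e.g., discounted returns rescaled to the unit interval).
X = torch.randn(256, 8)
y = torch.rand(256)                          # placeholder regression targets in [0, 1]

net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(200):
    logits = net(X).squeeze(-1)

    # Option 1: squared loss on the raw prediction (the common default).
    loss_mse = F.mse_loss(logits, y)

    # Option 2: binary cross-entropy on a sigmoid output, treating the [0, 1] target as a
    # soft label; this is the loss the abstract says yields first-order bounds.
    loss_bce = F.binary_cross_entropy(torch.sigmoid(logits), y)

    opt.zero_grad()
    loss_bce.backward()                      # switch to loss_mse.backward() to compare
    opt.step()

print(float(loss_bce), float(loss_mse))
```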


Citations (34)


... Athey et al. [2019] study the effect of the Greater Avenues to Independence (GAIN) job training program on long-term labor market outcomes in California, surrogated by short-term employment and earnings. Kallus and Mao [2024] discuss an example of using digital ad clicks, which are available for all users, to surrogate visitations to brick-and-mortar stores, which are only observable for those who agree to share cellphone geolocation data. Recently, tech firms have leveraged online experiments run for two weeks or less as surrogates for the purpose of understanding the long-term effects of a newly launched feature [Athey et al., 2019, Gupta et al., 2019, Zhang et al., 2023, Tran et al., 2023]. ...

Reference:

Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI
On the role of surrogates in the efficient estimation of treatment effects with limited outcome data
  • Citing Article
  • October 2024

Journal of the Royal Statistical Society Series B (Statistical Methodology)

... Since the tools in each paper are often domain-specific and difficult to apply in our work, we perform ablation tests to show the necessity of a large number of tools, which distinguishes us from existing methods. RAG-based approaches that retrieve similar queries from the training corpus [48] are also unsuitable for our setting due to a small volume of available queries (which we entirely use for evaluation). While some works fine-tune LLMs with traditional user-item data or synthetic queries [15,54], we do not consider them as baselines since we want to incorporate external knowledge without the cost of fine-tuning. ...

Neighborhood-Based Collaborative Filtering for Conversational Recommendation
  • Citing Conference Paper
  • October 2024

... However, in practice, long-term results typically only become apparent after a considerable delay, rendering the task of estimating long-term effects arduous. Fortunately, short-term outcomes, which often serve as surrogates, can be observed through short-term experiments, enabling the estimation of long-term outcomes and, by extension, long-term causal effects using surrogates [3][4][5][6][7][8]. ...

Long-term causal inference under persistent confounding via data combination
  • Citing Article
  • October 2024

Journal of the Royal Statistical Society Series B (Statistical Methodology)

... This motivates the two-stage least squares (2SLS) approach of estimating β by ordinary least squares (OLS) of Y on the "first-stage" OLS prediction of S given A (for discrete A this is simply the sample means of S for each A value). However, when n ≪ N, even as N → ∞ this can incur non-vanishing bias because the first-stage regression may not converge at all [Angrist et al., 1999, Bibaut et al., 2024, Peysakhovich and Eckles, 2018]. JIVE [Angrist et al., 1999] addresses this by regressing Y on a prediction of S given A based on OLS using all the data except the datapoint on which we make the prediction. ...

Learning the Covariance of Treatment Effects Across Many Weak Experiments
  • Citing Conference Paper
  • August 2024
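As a small illustration of the JIVE idea described in the excerpt above, the first-stage prediction for each observation can be formed from an OLS fit that excludes that observation, and that prediction is then used in place of the usual first-stage fit. The sketch below uses a plain leave-one-out loop on synthetic data with illustrative names; it is not the cited authors' implementation, and the second stage is written in the standard IV-ratio form.

```python
import numpy as np

def jive_first_stage(A, S):
    """Leave-one-out OLS predictions of S given A (the JIVE first stage).

    A: (n, d) assignment/instrument matrix; S: (n,) endogenous regressor.
    For each i, fit OLS on all rows except i and predict S_i from A_i.
    """
    n = len(S)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[mask], S[mask], rcond=None)
        preds[i] = A[i] @ beta
    return preds

def jive_estimate(A, S, Y):
    """Use the leave-one-out prediction of S as an instrument for S (1-D IV-ratio form)."""
    S_hat = jive_first_stage(A, S)
    return float(np.sum(S_hat * Y) / np.sum(S_hat * S))

# Synthetic example: A is a one-hot encoding of a discrete assignment.
rng = np.random.default_rng(0)
groups = rng.integers(0, 20, size=300)
A = np.eye(20)[groups]
U = rng.normal(size=300)                     # unobserved confounder
S = A @ rng.normal(size=20) + U + rng.normal(size=300)
Y = 2.0 * S - U + rng.normal(size=300)       # true coefficient on S is 2
print(jive_estimate(A, S, Y))
```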

... Among various similarity measures, cosine similarity and its counterpart, cosine distance, have emerged as popular metrics due to their focus on the angular relationship between vectors, independent of magnitude. This attribute has made cosine-based measures highly effective in applications such as document retrieval, text classification, and embedding evaluations [84,85]. ...

Is Cosine-Similarity of Embeddings Really About Similarity?
  • Citing Conference Paper
  • May 2024
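Since the excerpt above turns on cosine similarity measuring only the angle between vectors, independent of magnitude, a two-line reference implementation may be useful; this is just the textbook definition, not anything specific to the cited paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; invariant to rescaling either vector."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 2, 3], [2, 4, 6]))   # 1.0: same direction, different magnitude
print(cosine_similarity([1, 0], [0, 1]))         # 0.0: orthogonal vectors
```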

... This might seem small compared to real-world applications, but existing methods cannot even handle this due to their variance. We believe OPE/L for CCB with large unique action spaces, potentially leveraging structure in A as studied by [18,36,40,42,44], would be an interesting future topic. ...

Off-Policy Evaluation for Large Action Spaces via Policy Convolution
  • Citing Conference Paper
  • May 2024

... Zhao, Small, and Bhattacharya (2019) derived non-sharp bounds on the average potential outcome E(Y x ) and the average treatment effect (ATE) under the MSM, but also did not derive closed form expressions for these bounds. Dorn and Guo (2023) strengthened that result by deriving sharp bounds on E(Y x ), ATE, and the average effect of treatment on the treated (ATT) under the MSM, but again without closed form expressions. Dorn, Guo, and Kallus (2024) subsequently refined that result by obtaining closed form expressions for sharp bounds on E(Y x ) and ATE under the MSM, in addition to developing the concept of double-validity and double sharpness. Tan (2024) gives alternative sharp bound expressions for the ATE under the MSM. ...

Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding
  • Citing Article
  • April 2024

... We do make a margin assumption to relate convergence of Q-function contrasts to policy value convergence, analogous to Shi et al. (2022). Hu et al. (2024) study consequences of the margin assumption for fitted Q-iteration with a tighter analysis. Our approach is better suited for settings with highly structured difference-of-Qs since we introduce auxiliary estimation at every timestep. ...

Fast Rates for the Regret of Offline Reinforcement Learning
  • Citing Article
  • March 2024

Mathematics of Operations Research

... Existing works in zero-shot recommendation [5,7,12,36] leverage various forms of metadata-such as textual descriptions, images, and item popularity-as mediums to transfer knowledge from source domains to unseen target domains, thereby enhancing model adaptability and generalization. The rise of large language models (LLMs) has further advanced this approach, offering powerful tools for capturing rich item semantics and user intent through pretrained language representations. ...

Large Language Models as Zero-Shot Conversational Recommenders
  • Citing Conference Paper
  • October 2023