Ryan J. Tibshirani’s research while affiliated with University of California, Berkeley and other places


Publications (144)


Gradient Equilibrium in Online Learning: Theory and Applications
  • Preprint

January 2025 · 16 Reads

Anastasios N. Angelopoulos · [...] · Ryan J. Tibshirani

We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition neither implies nor is implied by sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
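The constant-step-size claim is easy to see in a toy stream. Below is an illustrative sketch (not the paper's code): online gradient descent on squared loss with a fixed step size, where the running average of gradients is driven toward zero even though the step size never decays. All names and parameters are made up for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's code): online gradient descent on squared
# loss with a constant step size. Gradient equilibrium asks that the average of
# the gradients along the iterate sequence converge to zero.
rng = np.random.default_rng(0)
eta = 0.1          # constant step size; no decay is needed for this property
theta = 0.0        # online prediction
grad_sum = 0.0
T = 20_000
for _ in range(T):
    y = rng.normal(loc=3.0, scale=1.0)   # next outcome in the stream
    grad = theta - y                     # gradient of the loss 0.5*(theta - y)^2
    grad_sum += grad
    theta -= eta * grad                  # gradient descent update
avg_grad = grad_sum / T
print(avg_grad)   # near zero: the iterates achieve gradient equilibrium
```

By telescoping the updates, `avg_grad` equals (theta_0 − theta_T)/(eta·T), so it vanishes at rate 1/T whenever the iterates stay bounded, which is the mechanism behind the constant-step-size claim.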


Figure 4: Convolutional and lagged ratios over various simulation settings, with three different underlying HFR curves (rows) and two delays (columns).
Figure 5: Comparing lagged ratios based on data from JHU versus NCHS.
Figure 6: Comparing methods for approximating ground truth HFR.
Figure 7: Comparing estimates based on real-time versus finalized counts.
Challenges in Estimating Time-Varying Epidemic Severity Rates from Aggregate Data
  • Preprint
  • File available

December 2024 · 7 Reads

Jeremy Goldwasser · Addison J. Hu · Alyssa Bilinski · [...] · Ryan J. Tibshirani

Severity rates like the case-fatality rate and infection-fatality rate are key metrics in public health. To guide decision-making in response to changes like new variants or vaccines, it is imperative to understand how these rates shift in real time. In practice, time-varying severity rates are typically estimated using a ratio of aggregate counts. We demonstrate that these estimators are capable of exhibiting large statistical biases, with concerning implications for public health practice, as they may fail to detect heightened risks or falsely signal nonexistent surges. We supplement our mathematical analyses with experimental results on real and simulated COVID-19 data. Finally, we briefly discuss strategies to mitigate this bias, drawing connections with effective reproduction number (Rt) estimation.
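A minimal simulation, with made-up parameters, shows the bias the abstract describes: when deaths lag cases by a fixed delay and cases grow exponentially, the same-day ratio of aggregate counts understates a constant true fatality rate, while lagging the denominator recovers it.

```python
import numpy as np

# Toy illustration (all parameters made up) of the bias described above: deaths
# lag cases by a fixed delay, so the same-day ratio of aggregate counts
# understates a constant true fatality rate while cases are growing.
true_rate = 0.02
delay = 7                                    # deaths occur 7 days after cases
t = np.arange(120)
cases = 100.0 * np.exp(0.05 * t)             # exponentially growing case curve
deaths = np.zeros_like(cases)
deaths[delay:] = true_rate * cases[:-delay]  # deterministic delayed deaths
naive = deaths[-1] / cases[-1]               # same-day ratio: biased downward
lagged = deaths[-1] / cases[-1 - delay]      # lag-adjusted ratio: unbiased here
print(naive, lagged)
```

During a downswing the same-day ratio is biased in the opposite direction, which is how such estimators can falsely signal nonexistent surges in severity.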



Figure 5: Choropleth maps of the state-level estimates of the daily new infections per 100K (top row) and the daily new cases per 100K (bottom row) for five select dates between June 1, 2020 and November 29, 2021. Note that the first date was chosen as a baseline, while the other dates were chosen because they present large counts of infections across all states. In particular, the third and fifth dates present the largest number of total infections across the 50 states within those calendar years.
Incident COVID-19 infections before Omicron in the US

October 2024 · 8 Reads

The timing and magnitude of COVID-19 infections are of interest to the public and to public health, but these are challenging to ascertain due to the volume of undetected asymptomatic cases and reporting delays. Accurate estimates of COVID-19 infections based on finalized data can improve understanding of the pandemic and provide more meaningful quantification of disease patterns and burden. Therefore, we retrospectively estimate daily incident infections for each U.S. state prior to Omicron. To this end, reported COVID-19 cases are deconvolved to their date of infection onset using delay distributions estimated from the CDC line list. Then, a novel serology-driven model is used to scale these deconvolved cases to account for the unreported infections. The resulting infections incorporate variant-specific incubation periods, reinfections, and waning antigenic immunity. They clearly demonstrate that the reported cases fail to reflect the full extent of disease burden in all states. Most notably, infections were severely underreported during the Delta wave, with an estimated reporting rate as low as 6.3% in New Jersey, 7.3% in Maryland, and 8.4% in Nevada. Moreover, in 44 states, fewer than 1/3 of infections appear as case reports. Therefore, while reported cases offer a convenient proxy for disease burden, they fail to capture the full extent of infections, and can severely underestimate the true disease burden. This retrospective analysis also estimates other important quantities for every state, including variant-specific deconvolved cases, time-varying case ascertainment ratios, and infection-hospitalization ratios.
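The deconvolution step can be sketched in a toy setting. Assuming a known, short delay distribution (an assumption; the paper estimates delays from the CDC line list), reported cases are a convolution of infections with that delay, and infections can be recovered by inverting the resulting lower-triangular linear system:

```python
import numpy as np

# Toy sketch of the deconvolution step (not the paper's estimator): reported
# cases are modeled as infections convolved with a reporting-delay distribution,
# and infections are recovered by inverting that linear system. The delay pmf
# below is an arbitrary illustrative choice.
delay_pmf = np.array([0.7, 0.2, 0.1])
infections = 50 + 40 * np.sin(np.linspace(0, 3, 60))           # true curve
cases = np.convolve(infections, delay_pmf)[: len(infections)]  # reported cases

# Causal convolution as a lower-triangular matrix, then solve for infections.
n = len(cases)
C = np.zeros((n, n))
for lag, p in enumerate(delay_pmf):
    C += p * np.eye(n, k=-lag)
recovered = np.linalg.solve(C, cases)
err = float(np.max(np.abs(recovered - infections)))
print(err)   # near zero: exact recovery in this noiseless toy setting
```

With noisy counts this direct inversion becomes ill-conditioned, which is why practical deconvolution methods add smoothing or regularization.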


Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning

October 2024 · 7 Reads

Common practice in modern machine learning involves fitting a large number of parameters relative to the number of observations. These overparameterized models can exhibit surprising generalization behavior, e.g., "double descent" in the prediction error curve when plotted against the raw number of model parameters, or another simplistic notion of complexity. In this paper, we revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom. Whereas the classical definition is connected to fixed-X prediction error (in which prediction error is defined by averaging over the same, nonrandom covariate points as those used during training), our extension of degrees of freedom is connected to random-X prediction error (in which prediction error is averaged over a new, random sample from the covariate distribution). The random-X setting more naturally embodies modern machine learning problems, where highly complex models, even those complex enough to interpolate the training data, can still lead to desirable generalization performance under appropriate conditions. We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments, and illustrate how they can be used to interpret and compare arbitrary prediction models.
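The classical fixed-X notion that the paper starts from can be made concrete: for a linear smoother ŷ = Sy, degrees of freedom equals trace(S). A hedged sketch for ridge regression, where the trace has a closed form in the singular values of X (dimensions and the penalty are illustrative):

```python
import numpy as np

# Sketch of the classical fixed-X degrees of freedom for a linear smoother
# y_hat = S y: df = trace(S). Illustrated for ridge regression, where the trace
# has a closed form in the singular values of X.
rng = np.random.default_rng(0)
n, p, lam = 50, 10, 2.0
X = rng.normal(size=(n, p))
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # ridge hat matrix
df_trace = float(np.trace(S))
d = np.linalg.svd(X, compute_uv=False)                    # singular values of X
df_svd = float(np.sum(d**2 / (d**2 + lam)))
print(df_trace, df_svd)   # the two expressions agree; df shrinks as lam grows
```

The paper's random-X extension instead averages prediction error over fresh draws from the covariate distribution rather than over the fixed design points used in training.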


Two-Sample Testing with a Graph-Based Total Variation Integral Probability Metric

September 2024 · 17 Reads

We consider a novel multivariate nonparametric two-sample testing problem where, under the alternative, distributions P and Q are separated in an integral probability metric over functions of bounded total variation (TV IPM). We propose a new test, the graph TV test, which uses a graph-based approximation to the TV IPM as its test statistic. We show that this test, computed with an ε-neighborhood graph and calibrated by permutation, is minimax rate-optimal for detecting alternatives separated in the TV IPM. As an important special case, we show that this implies the graph TV test is optimal for detecting spatially localized alternatives, whereas the χ² test is provably suboptimal. Our theory is supported with numerical experiments on simulated and real data.
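The permutation calibration used by the test is straightforward to sketch. The snippet below uses a placeholder difference-of-means statistic rather than the graph TV statistic itself (which requires building an ε-neighborhood graph); the calibration logic is the same, and it is what gives finite-sample Type I error control.

```python
import numpy as np

# Sketch of permutation calibration, as used by the graph TV test. A placeholder
# difference-of-means statistic stands in for the graph-based TV statistic.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=100)   # sample from P
y = rng.normal(0.8, 1.0, size=100)   # sample from Q (mean-shifted)
pooled = np.concatenate([x, y])
stat = abs(x.mean() - y.mean())
B = 999
perm_stats = np.empty(B)
for b in range(B):
    z = rng.permutation(pooled)      # relabel the pooled sample at random
    perm_stats[b] = abs(z[:100].mean() - z[100:].mean())
p_value = (1 + np.sum(perm_stats >= stat)) / (B + 1)
print(p_value)   # small: the mean shift is detected
```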


National incident weekly hospital admissions and select forecasts
National weekly observed hospitalizations (black points) along with FluSight ensemble forecasts for four weeks of submissions in the 2021–22 season (a) and seven weeks of submissions in the 2022–23 season (b). The median FluSight ensemble forecast values (blue points) are shown with the corresponding 50%, 80%, and 95% prediction intervals (blue shaded regions). Panels c–e show national incident weekly hospital admissions (black points) from the 2022–23 season and predictions from all models submitted on November 11, 2022 (c), December 05, 2022 (d), and February 27, 2023 (e). Colored bands indicate 95% prediction intervals for each model. Team forecasts for additional weeks are available in an interactive dashboard¹².
Standardized rank by season
Standardized rank of weighted interval score (WIS) over all forecast jurisdictions and horizons (1- to 4-week ahead), for the FluSight ensemble and each team submitting at least 75% of the forecast targets (see Table 1 for qualifying teams and season metrics) for the 2021–22 (a) and 2022–23 (b) seasons.
Relative WIS by state and model. State-level WIS values for each team relative to the FluSight baseline model
Relative WIS values below 1, shown in blue, indicate better performance than the FluSight baseline (white); values above 1, shown in red, indicate worse performance relative to the baseline. Teams are ordered on the horizontal axis from lowest to highest relative WIS for each season, 2021–22 (a) and 2022–23 (b). Analogous jurisdiction-specific relative WIS scores on log-transformed counts are displayed in Supplementary Fig. 7.
WIS by model
Time series of log-transformed absolute WIS for state and territory targets. Note that the forecast evaluation period translates to 1-week-ahead forecast target end dates from February 26–June 25, 2022 (a), and October 22, 2022–May 20, 2023 (b), and 4-week-ahead forecast target end dates from March 19–July 16, 2022 (c), and November 5, 2022–June 10, 2023 (d). Weekly results for the FluSight baseline and ensemble models are shown in red and blue, respectively. Results for individual contributing models are shown in light gray.
Coverage by model
1- and 4-week-ahead 95% coverage for state and territory targets. Note that the forecast evaluation period translates to 1-week-ahead forecast target end dates from February 26–June 25, 2022 (a), and October 22, 2022–May 20, 2023 (b), and 4-week-ahead forecast target end dates from March 19–July 16, 2022 (c), and November 5, 2022–June 10, 2023 (d). Weekly results for the FluSight baseline and ensemble models are shown in red and blue, respectively. Results for individual contributing models are shown in light gray.
Evaluation of FluSight influenza forecasting in the 2021–22 and 2022–23 seasons with a new target laboratory-confirmed influenza hospitalizations

July 2024 · 243 Reads · 9 Citations

Accurate forecasts can enable more effective public health responses during seasonal influenza epidemics. For the 2021–22 and 2022–23 influenza seasons, 26 forecasting teams provided national and jurisdiction-specific probabilistic predictions of weekly confirmed influenza hospital admissions for one to four weeks ahead. Forecast skill is evaluated using the Weighted Interval Score (WIS), relative WIS, and coverage. Six out of 23 models outperform the baseline model across forecast weeks and locations in 2021–22, and 12 out of 18 models do so in 2022–23. Averaging across all forecast targets, the FluSight ensemble is the 2nd most accurate model measured by WIS in 2021–22 and the 5th most accurate in the 2022–23 season. Forecast skill and 95% coverage for the FluSight ensemble and most component models degrade over longer forecast horizons. In this work, we demonstrate that while the FluSight ensemble was a robust predictor, even ensembles face challenges during periods of rapid change.
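The weighted interval score used in this evaluation has a standard decomposition over central prediction intervals; the sketch below follows that standard definition, with illustrative function and argument names.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    return (upper - lower) \
        + (2 / alpha) * max(lower - y, 0) \
        + (2 / alpha) * max(y - upper, 0)

def wis(median, lowers, uppers, alphas, y):
    """Weighted interval score: a weighted average of the absolute error of the
    median and the interval scores of K central intervals (weights alpha_k / 2)."""
    K = len(alphas)
    total = 0.5 * abs(y - median)
    for low, up, a in zip(lowers, uppers, alphas):
        total += (a / 2) * interval_score(low, up, y, a)
    return total / (K + 0.5)

# Single 50% interval [1, 3] with median 2; observation falls at the median.
print(wis(median=2, lowers=[1], uppers=[3], alphas=[0.5], y=2))
```

Lower WIS is better; relative WIS in the evaluation above is a model's WIS divided by that of the FluSight baseline.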




Figure 5: Nowcasts from the state-level and mixed models in scenario 1, the monthly-update period, for NY. The dotted vertical lines mark the observation boundaries, as in Figure 4.
Figure 6: MAE as a function of backcast lag for the state-level, geo-pooled, and mixed models in scenario 2, the no-update period. The shaded regions show ±1 standard error bands, over the state MAE values.
Nowcasting Reported COVID-19 Hospitalizations Using De-Identified, Aggregated Medical Insurance Claims Data

December 2023 · 16 Reads

We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds under a hypothetical scenario in which, during the Delta wave, states only report data on the first day of each month, and on this day, report COVID-19 hospitalization counts for each day in the previous month. In this hypothetical scenario (just as in reality), medical insurance claims data continues to be available daily. At the beginning of each month, we train a regression model, using all data available thus far, to predict hospitalization counts from medical insurance claims. We then use this model to nowcast the (unseen) values of COVID-19 hospitalization counts from medical insurance claims, at each day in the following month. Our analysis uses properly versioned data, which would have been available in real time, at the time predictions are produced. In spite of the difficulties inherent to real-time estimation (e.g., latency and backfill) and the complex dynamics behind COVID-19 hospitalizations themselves, we find overall that medical insurance claims can be an accurate predictor of hospitalization reports, with mean absolute errors typically around 0.4 hospitalizations per 100,000 people (proportion of variance explained around 75%). Perhaps more importantly, we find that nowcasts made using medical insurance claims can qualitatively capture the dynamics (upswings and downswings) of hospitalization waves, which are key features that inform public health decision-making.
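The monthly retrain-and-nowcast loop described above can be sketched with synthetic data standing in for the claims signal and hospitalization reports; the series, the linear model, and all numbers are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch of a monthly retrain-and-nowcast loop, with synthetic
# series standing in for the claims signal and hospitalization reports.
rng = np.random.default_rng(0)
days = 180
claims = 10 + 5 * np.sin(np.linspace(0, 6, days)) + rng.normal(0, 0.2, days)
hosp = 2.0 * claims + 1.0 + rng.normal(0, 0.3, days)   # reports track claims

errors = []
for start in range(90, days, 30):                        # retrain each "month"
    X = np.column_stack([np.ones(start), claims[:start]])
    beta, *_ = np.linalg.lstsq(X, hosp[:start], rcond=None)  # fit on the past
    idx = np.arange(start, min(start + 30, days))
    nowcast = beta[0] + beta[1] * claims[idx]            # predict unseen month
    errors.append(np.mean(np.abs(nowcast - hosp[idx])))
mae = float(np.mean(errors))
print(mae)   # small here, since the synthetic claims signal tracks reports
```

The real analysis additionally has to contend with data versioning, latency, and backfill, which this toy loop ignores.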


Citations (47)


... Our proposal is inspired by several recently introduced randomized methods that provide alternatives to traditional sample splitting for tasks such as model validation, selective inference, and risk estimation. These alternatives include data-fission and data-thinning techniques by Rasines and Young (2023), Leiner et al. (2023), Neufeld et al. (2024), and Dharamshi et al. (2024); methods employing Gaussian randomization for selective inference tasks, as considered in Dai et al. (2023), Tian and Taylor (2018), Panigrahi and Taylor (2022), and Huang et al. (2023); and randomized methods by Oliveira et al. (2021, 2022) and Fry and Taylor (2023). Figure 1: Distribution of the estimated mean squared prediction error for isotonic regression on a simulated data set, comparing our method (left) with K = 2 train-test repetitions to leave-one-out cross-validation (K = n, right). The dashed black line is at the true mean squared prediction error. ...

Reference:

Cross-Validation with Antithetic Gaussian Randomization
Unbiased risk estimation in the normal means problem via coupled bootstrap techniques
  • Citing Article
  • January 2024

Electronic Journal of Statistics

... Diaconis and Friedman observed that versions of this claim hold in a number of different models [DF87], and precise quantitative versions have been established for finite exchangeable sequences under different assumptions [Sta78,DF80,GK21,JGK24]. Recently, motivated by applications in conformal prediction (see, e.g., [TBCR19,BCRT23a]), similar results have been obtained for "weighted" exchangeable sequences as well [BCRT23b]. ...

De Finetti’s theorem and related results for infinite weighted exchangeable sequences
  • Citing Article
  • November 2024

Bernoulli

... Despite these advancements, significant gaps remain between current CP methods and the demands of real-world applications. For instance, the epidemic forecasting initiatives hosted by the US Centers for Disease Control and Prevention (CDC) in response to diseases such as Ebola (Viboud et al. 2018;Johansson et al. 2019), influenza (Reich et al. 2019;Mathis et al. 2024), and COVID-19 (Cramer et al. 2022) highlight these challenges. Because these predictions inform critical policymaking decisions, the CDC requires forecasters to provide a comprehensive view of future possibilities. ...

Evaluation of FluSight influenza forecasting in the 2021–22 and 2022–23 seasons with a new target laboratory-confirmed influenza hospitalizations

... This is at the heart of why small (i.e., zero or even negative) values of λ can lead to favorable prediction accuracy in the overparameterized regime. Let us define µ_min, as in LeJeune et al. (2024), to be the unique solution satisfying µ_min > −r_min to the fixed-point equation: ...

Asymptotics of the Sketched Pseudoinverse
  • Citing Article
  • March 2024

SIAM Journal on Mathematics of Data Science

... Consortia of modelers are dedicating great effort to projecting seasonal influenza epidemics in the short and long term. In the context of these initiatives, a wide range of mechanistic and statistical models are being combined in ensemble modelling to forecast influenza incidence weeks ahead or to identify possible evolution scenarios months ahead (53–57). This provides critical information for response planning. ...

Evaluation of FluSight influenza forecasting in the 2021-22 and 2022-23 seasons with a new target laboratory-confirmed influenza hospitalizations

... Due to the ill-posedness of (1), nonparametric techniques in subcritical Euclidean Sobolev spaces typically wholly discard a continuous analysis, and instead favor a discrete approach [21,24,25] that is grounded in the spectral analysis of a graph Laplacian constructed on the data points. While such approaches, including Laplacian smoothing [24] and PCR [25], have gained popularity due to their tractable implementation, the resulting graph estimators are only defined on the data sample and critically do not provide out-of-sample generalization. ...

Minimax optimal regression over Sobolev spaces via Laplacian Eigenmaps on neighbourhood graphs
  • Citing Article
  • September 2023

Information and Inference A Journal of the IMA

... There is an extensive literature on hypothesis tests for symmetry in the form of invariance under a group of transformations, dating back at least to the work of Hoeffding (1952), which generalized older ideas based on testing permutation-invariance (Fisher, 1935; Pitman, 1937). More recent developments in the area pertain to randomized versions of tests for invariance (Romano, 1988, 1989; Goeman, 2017, 2018; Dobriban, 2022; Ramdas et al., 2023; Koning and Hemerik, 2024) or approximate invariance (Canay, Romano and Shaikh, 2017); see Ritzwoller, Romano and Shaikh (2024) for a recent review. One takeaway from that line of work is that group-based randomization tests have finite-sample Type I error control (Hoeffding, 1952), power against alternatives (Romano, 1989; Dobriban, 2022), and achieve various notions of optimality and robustness (Romano, 1990; Kashlak, 2022; Koning and Hemerik, 2024). ...

Permutation Tests Using Arbitrary Permutation Distributions

Sankhya A

... Unlike traditional confidence intervals, conformal prediction enables adaptive and valid uncertainty quantification based on the specific data at hand. Split conformal prediction, in particular, enhances this framework by leveraging a calibration dataset to derive confidence intervals [38]. ...

Conformal prediction beyond exchangeability
  • Citing Article
  • April 2023

The Annals of Statistics

... Kuleshov and Deshpande [18] introduced calibrated risk minimization as a principle that maximizes sharpness subject to calibration by adding calibration loss as a constraint in the loss function. Rumack et al. [30] presented a post-processing method called the recalibration ensemble that combines and calibrates forecasts in separate steps and applied this method to recalibrating epidemic forecasts. ...

Recalibrating probabilistic forecasts of epidemics

... These data repositories featured daily updates as well as corrections due to changes in data aggregations. It is worth noting that several researchers have studied the reliability of COVID-19 data sources, including the JHU data repository [38][39][40]; the aggregation and anomaly detection features in the JHU repository have been found to be up to standard [39]. Detailed studies comparing different data repositories and addressing questions related to missing data, data imputation, and data augmentation have merit; however, this is beyond the scope of the current work. ...

The United States COVID-19 Forecast Hub dataset

Scientific Data