Andrew Gelman’s research while affiliated with Columbia University and other places


Publications (515)


The Piranha Problem: Large Effects Swimming in a Small Pond
  • Article

January 2025 · 3 Reads · 1 Citation

Notices of the American Mathematical Society

Christopher Tosh · Philip Greengard · Ben Goodrich · [...] · Daniel Hsu



Figure and table previews from the following article:
  • Schematic overview of the SEIR transmission model for SARS-CoV-2 and the steps to generate the number of laboratory-confirmed cases and the observed seroprevalence
  • Result from unstratified models: (A) Posterior predictive plot for laboratory-confirmed cases (left y-axis, green ribbon) and cumulative incidence (right y-axis, gray ribbon) of SARS-CoV-2 in the canton of Geneva, Switzerland, for three iterations of the model with different sampling distributions (Poisson, quasi-Poisson and negative-binomial). Circles are weekly counts of laboratory-confirmed cases and pluses are estimates of seroprevalence at two time points. (B-D) Comparison of three methods of implementation of time-varying transmission on simulated data of a SARS-CoV-2 epidemic (posterior predictive plot, time-varying transmission ρ(t), and ascertainment rate by period). (E-F) Benchmark of different implementations of time-varying transmission on simulated data of a SARS-CoV-2 epidemic, with performance expressed in effective sample size (ESS) per second, error defined as the difference between the median posterior and true ρ(t), and the width of the 95% credible interval of ρ(t) as a measure for precision. See Table 1 for details about the knot sequence.
  • Modelled SARS-CoV-2 epidemic in Geneva, Switzerland, in 2020: (A) Posterior predictive plot for laboratory-confirmed cases (left y-axis, colored ribbon) and cumulative incidence (right y-axis, gray ribbon) per age group. Circles are weekly counts of laboratory-confirmed cases and pluses are estimates of seroprevalence at two time points. (B) Estimates of the time-varying change in transmission rate per age group using B-splines. (C) Estimates of the ascertainment rate per age group and time period.
  • Knot sequences
Bayesian workflow for time-varying transmission in stratified compartmental infectious disease transmission models
  • Article
  • Full-text available

April 2024 · 78 Reads · 3 Citations

Compartmental models that describe infectious disease transmission across subpopulations are central for assessing the impact of non-pharmaceutical interventions, behavioral changes and seasonal effects on the spread of respiratory infections. We present a Bayesian workflow for such models, including four features: (1) an adjustment for incomplete case ascertainment, (2) an adequate sampling distribution of laboratory-confirmed cases, (3) a flexible, time-varying transmission rate, and (4) a stratification by age group. Within the workflow, we benchmarked the performance of various implementations of two of these features (2 and 3). For the second feature, we used SARS-CoV-2 data from the canton of Geneva (Switzerland) and found that a quasi-Poisson distribution is the most suitable sampling distribution for describing the overdispersion in the observed laboratory-confirmed cases. For the third feature, we implemented three methods: Brownian motion, B-splines, and approximate Gaussian processes (aGP). We compared their performance in terms of the number of effective samples per second, and the error and sharpness in estimating the time-varying transmission rate over a selection of ordinary differential equation solvers and tuning parameters, using simulated seroprevalence and laboratory-confirmed case data. Even though all methods could recover the time-varying dynamics in the transmission rate accurately, we found that B-splines perform up to four and ten times faster than Brownian motion and aGPs, respectively. We validated the B-spline model with simulated age-stratified data. We applied this model to 2020 laboratory-confirmed SARS-CoV-2 cases and two seroprevalence studies from the canton of Geneva. This resulted in detailed estimates of the transmission rate over time and the case ascertainment. Our results illustrate the potential of the presented workflow including stratified transmission to estimate age-specific epidemiological parameters. 
The workflow is freely available in the R package HETTMO, and can be easily adapted and applied to other infectious diseases.
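
To make the model structure described in this abstract concrete, here is a minimal sketch of a single-population SEIR model with a time-varying transmission rate and incomplete case ascertainment. It is an illustration only, not the HETTMO implementation: the function names, parameter values, the fixed logistic curve standing in for ρ(t), the constant ascertainment rate, and the simple Euler integrator are all assumptions made for this example. The article itself estimates ρ(t) with Brownian motion, B-splines, or approximate Gaussian processes inside a Bayesian model, stratifies by age group, and lets ascertainment vary by period; none of that is reproduced here.

```python
import math


def logistic_rho(t, floor=0.3, midpoint=30.0, width=3.0):
    """Hypothetical rho(t): relative transmission falling smoothly from 1 to `floor`."""
    return floor + (1.0 - floor) / (1.0 + math.exp((t - midpoint) / width))


def seir_weekly_cases(beta0, rho, sigma, gamma, ascertainment,
                      n_pop, e0, weeks, steps_per_day=10):
    """Euler-integrate an SEIR model; return expected weekly confirmed cases.

    beta0         baseline transmission rate per day (assumed value)
    rho           function of time (days) scaling the transmission rate
    sigma, gamma  rates of leaving the exposed and infectious compartments
    ascertainment fraction of new infectious cases that are lab-confirmed
    """
    dt = 1.0 / steps_per_day
    s, e, i, r = float(n_pop - e0), float(e0), 0.0, 0.0
    t = 0.0
    weekly = []
    for _ in range(weeks):
        new_infectious = 0.0
        for _ in range(7 * steps_per_day):
            infections = beta0 * rho(t) * s * i / n_pop * dt  # S -> E
            onsets = sigma * e * dt                           # E -> I
            recoveries = gamma * i * dt                       # I -> R
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            r += recoveries
            new_infectious += onsets
            t += dt
        # Only a fraction of incident cases is observed as confirmed cases.
        weekly.append(ascertainment * new_infectious)
    return weekly, (s, e, i, r)


# Illustrative values only; they are not taken from the article.
params = dict(beta0=0.8, sigma=1 / 3, gamma=1 / 5, ascertainment=0.3,
              n_pop=500_000, e0=10, weeks=20)
baseline, baseline_state = seir_weekly_cases(rho=lambda t: 1.0, **params)
controlled, controlled_state = seir_weekly_cases(rho=logistic_rho, **params)
```

In a full workflow of the kind the abstract describes, the expected weekly counts would feed a sampling distribution for the observed laboratory-confirmed cases, and ρ(t) would be estimated rather than fixed.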


Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations for Applied Regression and Causal Inference

March 2024 · 22 Reads · 1 Citation

This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging 'flipped classroom' environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors' previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.


Setting up a course of study

March 2024 · 1 Read

This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, 52 drills, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging “flipped classroom” environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels and practice exam questions to help guide learning. Designed to accompany the authors’ previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or to be used by learners as a hands-on workbook. The authors are experienced researchers who have published articles in hundreds of different scientific journals in fields including statistics, computer science, policy, public health, political science, economics, sociology, and engineering. They have also published articles in the Washington Post, the New York Times, Slate, and other public venues. Their previous books include Bayesian Data Analysis, Teaching Statistics: A Bag of Tricks, and Regression and Other Stories.


Final exam questions

March 2024 · 4 Reads



Week by week: the second semester

March 2024 · 1 Read



Active learning

March 2024 · 23 Reads



How to use this book

March 2024



Citations (59)


... As the co-editors introduce each of the six themed articles in their editorial, which I expect to break the record for the most viewed single article in a week in HDSR, currently held by their 2020 editorial, I will only report how delighted I was seeing the balanced coverage in terms of outcomes (vote shares and voter turnout; Gelman et al., 2024, and Ansolabehere et al., 2024), methodologies (quantitative and qualitative; Donnini et al., 2024, and Lichtman, 2024), and genre: there is even a murder mystery (Bailey, 2024)! But I would like to highlight the conversation with Minnesota Secretary of State Steve Simon. ...

Reference:

AI Has Won Nobel Prizes in Hard Science: Can Humans Be Smarter—and Softer on Each Other?
Grappling With Uncertainty in Forecasting the 2024 U.S. Presidential Election
  • Citing Article
  • October 2024

... The value of running many chains in parallel is subtler, but very significant. It enables diagnostics (e.g., Gelman and Rubin, 1992; Margossian et al., 2024), offers opportunities for hyperparameter adaptation (e.g., Gilks et al., 1994; Hoffman and Sountsov, 2022), reduces variance, and reduces bias for some estimands such as quantiles. See Margossian and Gelman (2023) for an excellent discussion of the virtues (and challenges) of running many short chains instead of a few long ones. ...

Nested R̂: Assessing the Convergence of Markov Chain Monte Carlo When Running Many Short Chains
  • Citing Article
  • January 2024

Bayesian Analysis
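
The excerpt above refers to convergence diagnostics computed from multiple chains. As a rough illustration, the following sketch computes the classic potential scale reduction factor R̂ in the spirit of Gelman and Rubin (1992) for a single scalar parameter. It is not the nested R̂ of the article listed above, and it omits refinements such as chain splitting and rank normalization; the function name and the simulated chains are assumptions made for this example.

```python
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic R-hat for equal-length chains of draws of one scalar quantity."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    # Pooled estimate of the posterior variance, which overestimates it
    # when the chains have not yet mixed.
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


# Simulated example: four chains targeting the same distribution,
# then the same chains with one of them shifted away from the others.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
stuck = [list(c) for c in mixed]
stuck[0] = [x + 5.0 for x in stuck[0]]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

Values close to 1 are consistent with the chains agreeing with each other, while the shifted chain pushes the statistic well above 1.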

... Finally, over-interpretation of arbitrary thresholds like 0.05 has also led to p-hacking (selective reporting) to achieve significant results. This has led to many calls for reform (5)(6)(7)(8)(9)(10)(11)(12), and to address this issue, multiple researchers have proposed a shift in the language used (13)(14)(15), a shift in the scale used to introduce and interpret p values (16), or even a retirement of statistical significance altogether (17,18). ...

Abandon statistical significance.
  • Citing Chapter
  • January 2024

... Bayesian estimation methods have gained significant traction in the calibration of epidemic models based on ODEs. These methods integrate prior knowledge with new data to refine parameter estimates, yielding posterior distributions that explicitly account for uncertainty and incorporate expert knowledge into the modeling process [4,5,6,7,8,9,10,11]. Using Bayes' theorem, these methods merge prior parameter distributions with the likelihood of observed data, resulting in posterior distributions. ...

Bayesian workflow for time-varying transmission in stratified compartmental infectious disease transmission models

... For the probabilistic estimation the Bayesian interpretation of probability can be considered as the most adequate basis for the consistent representation of uncertainties, independently of their sources [6,9]. With the Bayesian method the a priori probabilities of the structure safety levels provided by experts are combined with the likelihood probability of the monitoring information [19][20][21]. ...

Past, Present and Future of Software for Bayesian Inference
  • Citing Article
  • February 2024

Statistical Science

... The multilevel regression and poststratification (MRP) approach aims to translate estimates from a nonprobability sample to a target population exhibiting a known demographic structure [18]. Originating from political science, the MRP approach was recently applied in other fields, such as public health [17,18,29]. ...

Using leave‐one‐out cross validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale
  • Citing Article
  • December 2023

Statistics in Medicine

... The existence of such a conceptual framework already makes clear that building such models is not trivial. Subsequently, less suitable modeling decisions for the data at hand (Moran et al., 2023), or 'buggy' computational implementation (Modrák et al., 2023), both can lead to inaccurate inferences, which in turn affect ecological decision-making and policy. But, mistakes are often subtle and difficult to detect, and despite the power of Bayesian models, many ecologists do not fully leverage modern model-building workflow tools, such as provided by Moran et al. (2023), or Modrák et al. (2023), to ensure that their models are correctly specified and implemented. ...

Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity
  • Citing Article
  • January 2023

Bayesian Analysis

... Though incidence/prevalence was slightly higher in London and slightly lower in the North East of England and Scotland. Previous research has demonstrated that there was substantial SARS-CoV-2 spatial heterogeneity early in the SARS-CoV-2 pandemic, however this heterogeneity later disappeared, likely due to changes in mixing behaviours and shifting immunological dynamics [21]. Differences in incidence/prevalence between the sexes were negligible. ...

Bayesian spatial modelling of localised SARS-CoV-2 transmission through mobility networks across England

... Taking the expectation of "ethical harm" aggregates over individuals and groups. Just as a particular value for a model error metric (like accuracy) or a point estimate (like an estimated average treatment effect) can admit numerous solutions that vary at the level of the individual units or groups (e.g., (Coston, Rambachan, and Chouldechova 2021; Gelman, Hullman, and Kennedy 2023; Marx, Calmon, and Ustun 2020)), aggregating ethical harms over different individuals can lead evaluators to overlook individual or group-specific concerns. For example, two evaluation protocols may be expected to result in the same level of ethical harm to participants, while differing greatly in how harm is distributed over the specific participants or groups of participants. ...

Causal Quartets: Different Ways to Attain the Same Average Treatment Effect
  • Citing Article
  • October 2023

The American Statistician

... These concerns have prompted a shift in focus from performance to fairness and trustworthiness as a pillar of model evaluation (Mehrabi et al. 2021; Eshete 2021). While fairness aims to mitigate disparate impacts on different population groups, trustworthiness encompasses broader notions of model reliability, robustness, competence, generalization, explainability, transparency, reproducibility, privacy, security, and accountability (Serban et al. 2021; von Eschenbach 2021; Li et al. 2023; Broderick et al. 2023). In this paper, by borrowing from the philosophy literature, we develop a theory of trustworthiness from a lens of reliability and competence and investigate its implications for classification models in the context of decision-making. ...

Toward a taxonomy of trust for probabilistic machine learning

Science Advances