Andrew Gelman's research while affiliated with Columbia University and other places

Publications (436)

Article
Full-text available
We describe a numerical scheme for evaluating the posterior moments of Bayesian linear regression models with partial pooling of the coefficients. The principal analytical tool of the evaluation is a change of basis from coefficient space to the space of singular vectors of the matrix of predictors. After this change of basis and an analytical inte...
Article
Background: Explicit knowledge of total community-level immune seroprevalence is critical to developing policies to mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally acquired immunity, which varies broa...
Preprint
Full-text available
Recent concerns that machine learning (ML) may be facing a reproducibility and replication crisis suggest that some published claims in ML research cannot be taken at face value. These concerns inspire analogies to the replication crisis affecting the social and medical sciences, as well as calls for greater integration of statistical approaches to...
Article
I discuss a published paper in political science that made a claim that aroused skepticism. The reanalysis is an example of how we, as consumers as well as producers of science, can engage with published work. This can be viewed as a sort of collaboration performed implicitly between the authors of a published paper and later researchers who want t...
Preprint
Explicit knowledge of total community-level immune seroprevalence is critical to developing policies to mitigate the social and clinical impact of SARS-CoV-2. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally-acquired immunity, which varies broadly throughou...
Preprint
Full-text available
In this comment, we highlight a difference of opinion with "Mertens, S., Herberz, M., Hahnel, U. J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. Proceedings of the National Academy of Sciences, 119(1)."
Article
The Millennium Villages Project was an integrated rural development program carried out for a decade in 10 clusters of villages in sub-Saharan Africa starting in 2005, and in a few other sites for shorter durations. An evaluation of the 10 main sites compared to retrospectively chosen control sites estimated positive effects on a range of economic,...
Preprint
Full-text available
Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. We need evidence to support that the resulting decisions are well-founded. To aid development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (1) in the translation of real-wo...
Preprint
Full-text available
We describe a class of algorithms for evaluating posterior moments of certain Bayesian linear regression models with a normal likelihood and a normal prior on the regression coefficients. The proposed methods can be used for hierarchical mixed effects models with partial pooling over one group of predictors, as well as random effects models with pa...
Article
The use of statistical inference in linguistics and related areas like psychology typically involves a binary decision: either reject or accept some null hypothesis using statistical significance testing. When statistical power is low, this frequentist data-analytic approach breaks down: null results are uninformative, and effect size estimates ass...
Article
Full-text available
The recent successes and failures of political polling invite several questions: Why did the polls get it wrong in some high-profile races? Conversely, how is it that polls can perform so well, even given all the evident challenges of conducting and interpreting them?
Article
Throughout the COVID-19 pandemic, government policy and healthcare implementation responses have been guided by reported positivity rates and counts of positive cases in the community. The selection bias of these data calls into question their validity as measures of the actual viral incidence in the community and as predictors of clinical burden....
Preprint
We introduce Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathf...
Article
Full-text available
We discuss several issues of statistical design, data collection, analysis, communication, and decision-making that have arisen in recent and ongoing coronavirus studies, focusing on tools for assessment and propagation of uncertainty. This paper does not purport to be a comprehensive survey of the research literature; rather, we use examples to il...
Article
We review the most important statistical ideas of the past half century, which we categorize as: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, Bayesian multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data an...
Article
Can a publication format shape qualities of published research? Higgs and Gelman discuss a new study comparing peer-reviewers’ perceptions of Registered Reports to those of standard research articles. The authors conclude the registered publications were at least as good on the qualities measured, and they discuss challenges of doing research on re...
Preprint
In some scientific fields, it is common to have certain variables of interest that are of particular importance and for which there are many studies indicating a relationship with a different explanatory variable. In such cases, particularly those where no relationships are known among explanatory variables, it is worth asking under what conditions...
Preprint
Throughout the COVID-19 pandemic, government policy and healthcare implementation responses have been guided by reported positivity rates and counts of positive cases in the community. The selection bias of these data calls into question their validity as measures of the actual viral incidence in the community and as predictors of clinical burden....
Preprint
Full-text available
Research and development in computer science and statistics have produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But design philosophies that emphasize exploration over other phases of analysis risk confusing a need for flexibility with a conclus...
Article
Psychology research often focuses on interactions, and this has deep implications for inference from nonrepresentative samples. For the goal of estimating average treatment effects, we propose to fit a model allowing treatment to interact with background variables and then average over the distribution of these variables in the population. This can...
Article
It is not always clear how to adjust for control data in causal inference, balancing the goals of reducing bias and variance. We show how, in a setting with repeated experiments, Bayesian hierarchical modeling yields an adaptive procedure that uses the data to determine how much adjustment to perform. The result is a novel analysis with increased s...
Article
To explain the political clout of different social groups, traditional accounts typically focus on the group’s size, resources, or commonality and intensity of its members’ interests. We contend that a group’s penumbra—the set of individuals who are personally familiar with people in that group—is another important explanatory factor that merits sy...
Preprint
Stacking is a widely used model averaging technique that yields asymptotically optimal prediction among all linear averages. We show that stacking is most effective when the model predictive performance is heterogeneous in inputs, so that we can further improve the stacked mixture with a hierarchical model. With the input-varying yet partially-pool...
Preprint
Millions of people in Bangladesh drink well water contaminated with arsenic. Despite the severity of this health crisis, little is known about the extent to which groundwater arsenic concentrations change over time: Are concentrations generally rising, or is arsenic being flushed out of aquifers? Are spatial patterns of high and low concentrations...
Preprint
Full-text available
We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common...
Preprint
We describe a numerical scheme for evaluating the posterior moments of Bayesian linear regression models with partial pooling of the coefficients. The principal analytical tool of the evaluation is a change of basis from coefficient space to the space of singular vectors of the matrix of predictors. After this change of basis and an analytical inte...
Preprint
Full-text available
The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using thes...
Article
Being able to draw accurate conclusions from childhood obesity trials is important to make advances in reversing the obesity epidemic. However, obesity research sometimes is not conducted or reported to appropriate scientific standards. To constructively draw attention to this issue, we present 10 errors that are commonly committed, illustrate each...
Preprint
Accounting for sex and gender characteristics is a complex, structural challenge in social science research. While other methodology papers consider issues surrounding appropriate measurement, we consider how gender and sex impact adjustments for non-response patterns in sampling and survey estimates. We consider the problem of survey adjustment ar...
Article
Full-text available
Presidential elections can be forecast using information from political and economic conditions, polls, and a statistical model of changes in public opinion over time. However, these "knowns" about how to make a good presidential election forecast come with many unknowns due to the challenges of evaluating forecast calibration and communication. We...
Preprint
The normalizing constant plays an important role in Bayesian computation, and there is a large literature on methods for computing or approximating normalizing constants that cannot be evaluated in closed form. When the normalizing constant varies by orders of magnitude, methods based on importance sampling can require many rounds of tuning. We pre...
Article
When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modelling capturing variation in these parameters across experiments. Another concern is the people in the sample not bei...
Article
Full-text available
The roof and spire of Notre‐Dame cathedral in Paris that caught fire and collapsed on 15 April 2019 were covered with 460 t of lead (Pb). Government reports documented Pb deposition immediately downwind of the cathedral and a twentyfold increase in airborne Pb concentrations at a distance of 50 km in the aftermath. For this study, we colle...
Article
A central theme in the field of survey statistics is estimating population-level quantities through data coming from potentially non-representative samples of the population. Multilevel regression and poststratification (MRP), a model-based approach, is gaining traction against the traditional weighted approach for survey estimates. MRP estimates a...
Preprint
When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms can have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights...
Preprint
We discuss several issues of statistical design, data collection, analysis, communication, and decision making that have arisen in recent and ongoing coronavirus studies, focusing on tools for assessment and propagation of uncertainty. This paper does not purport to be a comprehensive survey of the research literature; rather, we use examples to il...
Article
Full-text available
Objectives: Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and estimated statistical power. This is “Weisburd’s paradox” and has been attributed by Weisburd et al. (in Crime Justice 17:337–...
Preprint
When testing for a rare disease, prevalence estimates can be highly sensitive to uncertainty in the specificity and sensitivity of the test. Bayesian inference is a natural way to propagate these uncertainties, with hierarchical modeling capturing variation in these parameters across experiments. Another concern is the people in the sample not bein...
Article
Declining telephone response rates have forced several transformations in survey methodology, including cell phone supplements, nonprobability sampling, and increased reliance on model-based inferences. At the same time, advances in statistical methods and vast amounts of new data sources suggest that new methods can combat some of these problems....
Preprint
Every philosophy has holes, and it is the responsibility of proponents of a philosophy to point out these problems. Here are a few holes in Bayesian data analysis: (1) the usual rules of conditional probability fail in the quantum realm, (2) flat or weak priors lead to terrible inferences about things we care about, (3) subjective priors are incohe...
Article
Full-text available
Why is there no consensual way of conducting Bayesian analyses? We present a summary of agreements and disagreements of the authors on several discussion points regarding Bayesian inference. We also provide a thinking guideline to assist researchers in conducting Bayesian inference in the social and behavioural sciences.
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Full-text available
We present a consensus-based checklist to improve and document the transparency of research reports in social and behavioural research. An accompanying online application allows users to complete the form and generate a report that they can submit with their manuscript or post to a public repository.
Preprint
Data analysis in linguistics and related areas like psychology typically involves a binary decision: either reject or accept the null hypothesis. This frequentist data-analytic approach has not only been widely misused, but also does not lead to any decisive conclusions, particularly when statistical power is low. Using an example from psycholingui...
Article
Cognitive modeling shares many features with statistical modeling, making it seem trivial to borrow from the practices of robust Bayesian statistics to protect the practice of robust cognitive modeling. We take one aspect of statistical workflow—prior predictive checks—and explore how they might be applied to a cognitive modeling task. We find that...
Article
Debate abounds about how to describe weaknesses in statistics. Andrew Gelman has no confidence in the term "confidence interval," but Sander Greenland doesn't find "uncertainty interval" any better and argues instead for "compatibility interval."
Preprint
A central theme in the field of survey statistics is estimating population-level quantities through data coming from potentially non-representative samples of the population. Multilevel Regression and Poststratification (MRP), a model-based approach, is gaining traction against the traditional weighted approach for survey estimates. MRP estimates a...
Article
Full-text available
Being able to draw accurate conclusions from childhood obesity trials is important to make advances in reversing the obesity epidemic. However, obesity research sometimes is not conducted or reported to appropriate scientific standards. To constructively draw attention to this issue, we present 10 errors that are commonly committed, illustrate each...
Article
The foundations and the practice of statistics are in turmoil, with corresponding threads of argument in biology, economics, political science, psychology, public health, and other fields that rely on quantitative research in the presence of variation and uncertainty. Lots of people (myself included) have strong opinions on what should not be done,...
Article
Full-text available
This report presents a new implementation of the Besag-York-Mollié (BYM) model in Stan, a probabilistic programming platform which does full Bayesian inference using Hamiltonian Monte Carlo (HMC). We review the spatial auto-correlation models used for areal data and disease risk mapping, and describe the corresponding Stan implementations. We also...
Preprint
Psychology is all about interactions, and this has deep implications for inference from non-representative samples. For the goal of estimating average treatment effects, we propose to fit a model allowing treatment to interact with background variables and then average over the distribution of these variables in the population. This can be seen as...
Preprint
To explain the political clout of different social groups, traditional accounts typically focus on the group's size, resources, or commonality and intensity of its members' interests. We contend that a group's "penumbra" (the set of individuals who are personally familiar with people in that group) is another important explanatory factor that merits...
Preprint
Cognitive modelling shares many features with statistical modelling, making it seem trivial to borrow from the practices of robust Bayesian statistics to protect the practice of robust cognitive modelling. We take one aspect of statistical workflow (prior predictive checks) and explore how they might be applied to a cognitive modelling task. We find...
Preprint
Full-text available
The new book by philosopher Deborah Mayo is relevant to data science for topical reasons, as she takes various controversial positions regarding hypothesis testing and statistical practice, and also as an entry point to thinking about the philosophy of statistics. The present article is a slightly expanded version of a series of informal reviews an...
Preprint
It is not always clear how to adjust for control data in causal inference, balancing the goals of reducing bias and variance. In a setting with repeated experiments, Bayesian hierarchical modeling yields an adaptive procedure that uses the data to determine how much adjustment to perform. We demonstrate this procedure on the example that motivated...
Article
How does statistics help us to understand the world? Andrew Gelman weighs up Ian Stewart’s analysis.
Preprint
Full-text available
To the Editor of JAMA: Dr Ioannidis writes against our proposals to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post p < 0.05 world.” We appreciate that he echoes our calls for “embracing uncertai...
Article
Full-text available
When data analysts operate within different statistical frameworks (e.g., frequentist versus Bayesian, emphasis on estimation versus emphasis on testing), how does this impact the qualitative conclusions that are drawn for real data? To study this question empirically we selected from the literature two simple scenarios—involving a comparison of tw...
Preprint
Full-text available
Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic $\widehat{R}$ of Gelman and Rubin (1992) has serious flaws and we propose an alternative that fixes them. We also introduce a co...
Preprint
Full-text available
Compared to the relatively standard way of conducting null hypothesis significance testing, there seem to be fairly large differences in opinion among experts in Bayesian statistics on how best to conduct Bayesian inference. Employing Bayesian methods involves making choices about prior distributions, likelihood functions, and robustness checks, as...
Article
The usual definition of R² (variance of the predicted values divided by the variance of the data) has a problem for Bayesian fits, as the numerator can be larger than the denominator. We propose an alternative definition similar to one that has appeared in the survival analysis literature: the variance of the predicted values divided by the varianc...
Preprint
Full-text available
When data analysts operate within different statistical frameworks (e.g., frequentist versus Bayesian, emphasis on estimation versus emphasis on testing), how does this impact the qualitative conclusions that are drawn for real data? To study this question empirically we selected from the literature two simple scenarios --involving a comparison of...
Article
Full-text available
It is well known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significanc...
Article
Full-text available
In an earlier article in this journal, Gronau and Wagenmakers (2018) discuss some problems with leave-one-out cross-validation (LOO) for Bayesian model selection. However, the variant of LOO that Gronau and Wagenmakers discuss is at odds with a long literature on how to use LOO well. In this discussion, we discuss the use of LOO in practical data a...
Preprint
This article is an invited discussion of the article by Gronau and Wagenmakers (2018) that can be found at https://dx.doi.org/10.1007/s42113-018-0011-7.
Article
Throughout the different phases of a drug development program, randomized trials are used to establish the tolerability, safety and efficacy of a candidate drug. At each stage one aims to optimize the design of future studies by extrapolation from the available evidence at the time. This includes collected trial data and relevant external data. How...
Article
No replication is truly direct, and I recommend moving away from the classification of replications as “direct” or “conceptual” to a framework in which we accept that treatment effects vary across conditions. Relatedly, we should stop labeling replications as successes or failures and instead use continuous measures to compare different studies, ag...
Article
Full-text available
Background: The Millennium Villages Project (MVP) was a 10-year, multisector, rural development project, initiated in 2005, operating across ten sites in ten sub-Saharan African countries to achieve the Millennium Development Goals (MDGs). In this study, we aimed to estimate the project's impact, target attainment, and on-site spending. Methods:...
Article
As a research field expands, scientists have to update their knowledge and integrate the outcomes of a sequence of studies. H