# Sander Greenland, University of California, Los Angeles | UCLA

## About

603 Publications

102,439 Reads

79,493 Citations

## Publications

Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them ar...

It has long been argued that we need to consider much more than an observed point estimate and a p-value to understand statistical results. One of the most persistent misconceptions about p-values is that they are necessarily calculated assuming a null hypothesis of no effect is true. Instead, p-values can and should be calculated for multiple hypo...
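As a minimal sketch of computing P-values for several hypotheses rather than only the null, the following assumes a normal (Wald) approximation. The estimate, standard error, and the helper name `wald_p_value` are hypothetical illustration values, not taken from the paper.

```python
from statistics import NormalDist


def wald_p_value(estimate: float, std_error: float, hypothesized: float) -> float:
    """Two-sided Wald P-value for a hypothesized effect size (normal approximation)."""
    z = (estimate - hypothesized) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: a log risk ratio estimate and its standard error.
estimate, std_error = 0.40, 0.25

# Evaluate the same data against a range of hypothesized effect sizes.
for hypothesized in (0.0, 0.2, 0.4, 0.6, 0.8):
    p = wald_p_value(estimate, std_error, hypothesized)
    print(f"H: effect = {hypothesized:.1f}  P = {p:.3f}")
```

Under these assumed numbers the null hypothesis (effect = 0) has P of about 0.11, while the hypothesis equal to the point estimate has P = 1, so the data are most compatible with effects near the estimate.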

This is a reply to Muff, S. et al. (2022) Rewriting results sections in the language of evidence, Trends in Ecology & Evolution 37, 203-210.

Objective: Recently Doi et al. argued that risk ratios should be replaced with odds ratios in clinical research. We disagreed, and empirically documented the lack of portability of odds ratios, while Doi et al. defended their position. In this response we highlight important errors in their position. Study Design and Setting: We counter Doi et al.’s...

A previous note illustrated how the odds of an outcome has an undesirable property for risk summarization and communication: Noncollapsibility, defined as a failure of a group measure to represent a simple average of the measure over individuals or subgroups. The present sequel discusses how odds ratios amplify odds noncollapsibility and provides a...
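Noncollapsibility can be illustrated with a small numeric sketch. The risks below are hypothetical and the two strata are assumed equal in size within both exposure groups, so there is no confounding by stratum; the helper `odds` is introduced only for illustration.

```python
def odds(risk: float) -> float:
    """Convert a risk (probability) to odds."""
    return risk / (1.0 - risk)


# Hypothetical risks in two equal-sized strata, by exposure status.
risk_unexposed = [0.2, 0.5]
risk_exposed = [0.5, 0.8]

# Odds: the group odds is not the simple average of the stratum odds.
group_risk_unexposed = sum(risk_unexposed) / 2
group_risk_exposed = sum(risk_exposed) / 2
average_stratum_odds = sum(odds(r) for r in risk_unexposed) / 2
group_odds = odds(group_risk_unexposed)

# Odds ratios: both stratum-specific odds ratios exceed the marginal odds ratio.
stratum_odds_ratios = [odds(e) / odds(u) for e, u in zip(risk_exposed, risk_unexposed)]
marginal_odds_ratio = odds(group_risk_exposed) / odds(group_risk_unexposed)

print(f"average of stratum odds (unexposed): {average_stratum_odds:.3f}")
print(f"odds of group risk (unexposed):      {group_odds:.3f}")
print(f"stratum odds ratios:                 {[round(x, 2) for x in stratum_odds_ratios]}")
print(f"marginal odds ratio:                 {marginal_odds_ratio:.2f}")
```

With these assumed risks each stratum has an odds ratio of 4, yet the marginal odds ratio is about 3.45, even though the risk differences and marginal risks are simple averages over the strata.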

To prevent statistical misinterpretations, it has long been advised to focus on estimation instead of statistical testing. This sound advice brings with it the need to choose the outcome and effect measures on which to focus. Measures based on odds or their logarithms have often been promoted due to their pleasing statistical properties, but have a...

Statistical science (as opposed to mathematical statistics) involves far more than probability theory, for it requires realistic causal models of data generators even for purely descriptive goals. Statistical decision theory requires more causality: Rational decisions are actions taken to minimize costs while maximizing benefits, and thus require e...

Background: Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple methods to...

An extended technical discussion of $S$-values and unconditional information can be found in Greenland, 2019. Here we briefly cover several technical topics mentioned in our main paper, Rafi & Greenland, 2020: Different units for (scaling of) the $S$-value besides base-2 logs (bits); the importance of uniformity (validity) of the $P$-value for inte...

Measures of information and surprise, such as the Shannon information (the $S$ value), quantify the signal present in a stream of noisy data. We illustrate the use of such information measures in the context of interpreting $P$ values as compatibility indices. $S$ values help communicate the limited information supplied by conventional statistics a...
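A minimal sketch of the $S$-value transform: the Shannon information of a $P$-value $p$ is $s = \log_2(1/p)$, measured in bits. The function name `s_value` is a hypothetical helper for illustration.

```python
import math


def s_value(p: float) -> float:
    """Return the Shannon information (surprisal) of a P-value, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)


for p in (0.05, 0.25, 0.5, 1.0):
    print(f"P = {p:<5} S = {s_value(p):.2f} bits")
```

For example, $P = 0.05$ supplies only about 4.3 bits of information against the tested hypothesis, slightly more than the surprise of seeing four heads in four tosses of a coin assumed to be fair.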

Whether or not "the foundations and the practice of statistics are in turmoil", it is wise to question methods whose misuse has been lamented for over a century. Perhaps the most widespread misuse of statistics is taking the crossing of some threshold as license for declaring "statistical significance" and for generalizing from a single study. Such...

Link: (https://arxiv.org/abs/1909.08583)
We have elsewhere reviewed proposals to reform terminology and improve interpretations of conventional statistics by emphasizing logical and information concepts over probability concepts. We here give detailed reasons and methods for reinterpreting statistics (including but not limited to) P-values and inte...

Link: (https://arxiv.org/abs/1909.08579)
Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we...

We have elsewhere reviewed proposals to reform terminology and improve interpretations of conventional statistics by emphasizing logical and information concepts over probability concepts. We here give detailed reasons and methods for reinterpreting statistics (including but not limited to) P-values and interval estimates in unconditional terms, wh...

Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple proposals to aid resear...

Debate abounds about how to describe weaknesses in statistics. Andrew Gelman has no confidence in the term "confidence interval," but Sander Greenland doesn't find "uncertainty interval" any better and argues instead for "compatibility interval."

To the Editor of JAMA
Dr Ioannidis writes against our proposals to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post p <0.05 world.” We appreciate that he echoes our calls for “embracing uncertai...

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent misinterpretation of P-values as error probabilities. It then discusses several properties of P-values that have been presented as fatal flaws: That P-values exhi...

Translation: Francesc J. Hernàndez (Universitat de València)

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from sing...

There is a massive crisis of confidence in statistical inference, which has largely been attributed to overemphasis on and abuse of hypothesis testing. Much of the abuse stems from failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Unedited and un...

Misconceptions about the impact of case–control matching remain common. We discuss several subtle problems associated with matched case–control studies that do not arise or are minor in matched cohort studies: (1) matching, even for non-confounders, can create selection bias; (2) matching distorts dose–response relations between matching variables...

Our results illustrate the sensitivity of odds ratios to the transient elastography cutpoint used to define cirrhosis. Given problems with cutpoints, we recommend regression analysis with a continuous or ordinal-outcome model to obtain a summary across the variation.

Marginal structural models for time-fixed treatments fit using inverse-probability weighted estimating equations are increasingly popular. Nonetheless, the resulting effect estimates are subject to finite-sample bias when data are sparse, as is typical for large-sample procedures. Here we propose a semi-Bayes estimation approach which penalizes or...

There is no complete solution for the problem of abuse of statistics, but methodological training needs to cover cognitive biases and other psychosocial factors affecting inferences. The present paper discusses 3 common cognitive distortions: 1) dichotomania, the compulsion to perceive quantities as dichotomous even when dichotomization is unnecess...

Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with st...

Purpose: Measurement error is an important source of bias in epidemiological studies. We illustrate three approaches to sensitivity analysis for the effect of measurement error: imputation of the 'true' exposure based on specifying the sensitivity and specificity of the measured exposure (SS); direct imputation (DI) using a regression model for th...
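To show how specified sensitivity and specificity enter such a sensitivity analysis, here is a rough sketch using the standard closed-form back-calculation of expected true exposure counts. This is a swapped-in simplification, not the imputation procedure the abstract describes; the counts, the sensitivity and specificity values, and the function name are all hypothetical.

```python
def corrected_exposed_count(observed_exposed: float, total: float,
                            sensitivity: float, specificity: float) -> float:
    """Back-calculate the expected number truly exposed from a misclassified count."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * total) / denominator


def odds_ratio(exposed_cases: float, unexposed_cases: float,
               exposed_controls: float, unexposed_controls: float) -> float:
    """Exposure odds ratio from a 2x2 table."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)


# Hypothetical case-control data with nondifferential exposure misclassification.
n_cases, n_controls = 100, 100
observed_exposed_cases, observed_exposed_controls = 45, 30
sensitivity, specificity = 0.90, 0.95  # assumed bias parameters

true_cases = corrected_exposed_count(observed_exposed_cases, n_cases, sensitivity, specificity)
true_controls = corrected_exposed_count(observed_exposed_controls, n_controls, sensitivity, specificity)

observed_or = odds_ratio(observed_exposed_cases, n_cases - observed_exposed_cases,
                         observed_exposed_controls, n_controls - observed_exposed_controls)
corrected_or = odds_ratio(true_cases, n_cases - true_cases,
                          true_controls, n_controls - true_controls)

print(f"observed OR = {observed_or:.2f}, corrected OR = {corrected_or:.2f}")
```

In a fuller analysis the sensitivity and specificity would be varied over a plausible range, or drawn from prior distributions, rather than fixed at single values.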

I present an overview of two methods controversies that are central to analysis and inference: That surrounding causal modeling as reflected in the “causal inference” movement, and that surrounding null bias in statistical methods as applied to causal questions. Human factors have expanded what might otherwise have been narrow technical discussions...

Effects of treatment or other exposure on outcome events are commonly measured by ratios of risks, rates, or odds. Adjusted versions of these measures are usually estimated by maximum likelihood regression (eg, logistic, Poisson, or Cox modelling). But resulting estimates of effect measures can have serious bias when the data lack adequate case num...

Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including...

Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an atte...

Risk and rate advancement periods (RAP) measure the impact of an exposure on the relation of age to disease. Specifically, they quantify the time by which the risk or rate of a disease is advanced among exposed subjects conditional on disease-free survival to a certain baseline age. The fact that these measures incorporate timing of disease occurre...
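As a rough sketch of a rate advancement period, assume a model in which the log rate is linear in age and exposure, so that the RAP is the ratio of the exposure coefficient to the per-year age coefficient. The coefficient values and the function name below are hypothetical, and this simple form is an assumption rather than the full treatment in the paper.

```python
import math


def rate_advancement_period(beta_exposure: float, beta_age_per_year: float) -> float:
    """Years by which exposure advances the rate, assuming log rate is linear in age."""
    if beta_age_per_year <= 0.0:
        raise ValueError("the rate must increase with age for a RAP to be defined")
    return beta_exposure / beta_age_per_year


# Hypothetical coefficients: rate ratio of 2 for exposure, 7% rate increase per year of age.
beta_exposure = math.log(2.0)
beta_age_per_year = math.log(1.07)

rap_years = rate_advancement_period(beta_exposure, beta_age_per_year)
print(f"RAP = {rap_years:.1f} years")
```

Under these assumed values, exposed subjects reach any given rate roughly 10 years earlier than unexposed subjects of the same age.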

Consider an observational study of the effect of a “treatment” variable X on an outcome variable Y in which multiple confounders must be controlled. Simultaneous stratification on all observed confounder combinations may lead to many uninformative strata (i.e., strata in which there is either no variation in the treatment or no variation in the out...

We present a command, penlogit, for approximate Bayesian logistic regression using penalized likelihood estimation via data augmentation. This command automatically adds specific prior-data records to a dataset. These records are computed so that they generate a penalty function for the log likelihood of a logistic model, which equals (up to an add...

Job exposure matrices (JEMs) are used to measure exposures based on information about particular jobs and tasks. JEMs are especially useful when individual exposure data cannot be obtained. Nonetheless, there may be other workplace exposures associated with the study disease that are not measured in available JEMs. When these exposures are also ass...

Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small-sample bias-reduction method (al...

RotaTeq® pentavalent human rotavirus vaccine (RV5) is effective against rotavirus illness and rotavirus-related hospitalizations and death. Effectiveness depends on adherence to the dosing schedule, which includes 3 doses at ages 2, 4 and 6 months. Two studies have used automated claims databases to estimate the proportion of vaccinated infants who...

A probability distribution may have some properties that are stable under a structure (e.g., a causal graph) and other properties that are unstable. Stable properties are implied by the structure and thus will be shared by populations following the structure. In contrast, unstable properties correspond to special circumstances that are unlikely to...

Most epidemiology textbooks that discuss models are vague on details of model selection. This lack of detail may be understandable since selection should be strongly influenced by features of the particular study, including contextual (prior) information about covariates that may confound, modify, or mediate the effect under study. It is thus impor...

We describe how ordinary interpretations of causal models and causal graphs fail to capture important distinctions among ignorable allocation mechanisms for subject selection or allocation. We illustrate these limitations in the case of random confounding and designs that prevent such confounding. In many experimental designs individual treatment a...

Measures of causal attribution and preventive potential appear deceptively simple to define, yet have many subtle variations and are subject to numerous pitfalls in conceptualization, interpretation and application. This paper reviews basic concepts, measures, and problems to serve as an introduction to more detailed literature. Allowing for validi...

One-carbon metabolism (folate metabolism) is considered important in carcinogenesis because of its involvement in DNA synthesis and biological methylation reactions. We investigated the associations of single nucleotide polymorphisms (SNPs) in folate metabolic pathway and the risk of three GI cancers in a population-based case-control study in Taix...

This article discusses the concepts and definitions associated with terms such as attributable fraction, attributable risk, and probability of causation. It is emphasized that there are several incompatible definitions of attributable risk. Some of these definitions apply to concepts of excess caseload and excess rate, which are estimable from epid...
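The excess-caseload concepts can be sketched with two textbook formulas: the excess fraction among the exposed, (RR − 1)/RR, and Levin's population version. Both assume the rate ratio is unconfounded; consistent with the abstract's warning about incompatible definitions, neither should be read as a probability of causation. The function names and input values are hypothetical.

```python
def excess_fraction_exposed(rate_ratio: float) -> float:
    """Excess fraction among the exposed: (RR - 1) / RR, for RR >= 1."""
    if rate_ratio < 1.0:
        raise ValueError("this sketch covers only RR >= 1")
    return (rate_ratio - 1.0) / rate_ratio


def population_excess_fraction(rate_ratio: float, exposure_prevalence: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)), p = exposure prevalence."""
    excess = exposure_prevalence * (rate_ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: rate ratio of 2 and 30% of the population exposed.
print(f"excess fraction among exposed: {excess_fraction_exposed(2.0):.2f}")
print(f"population excess fraction:    {population_excess_fraction(2.0, 0.3):.2f}")
```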

This article discusses definitions and concepts of effect modification, interaction, synergism, and related concepts and terms.
Keywords: antagonism; causal coaction; effect-measure modification; effect modification; heterogeneity of effect; interaction; synergism

The word confounding has been used to refer to at least three distinct concepts. In the oldest usage, confounding is a bias in estimating causal effects. In a second and more recent usage, confounding is a synonym for noncollapsibility. In a third usage, originating in the experimental-design literature, confounding refers to inseparability of main...

The word confounding has been used to refer to at least three distinct concepts. In the oldest usage, confounding is a bias in estimating causal effects (see Causation). This bias is sometimes informally described as a mixing of effects of extraneous factors (called confounders) with the effect of interest. This usage predominates in nonexperimenta...

In a cohort study, the numerator and denominator of each disease frequency (incidence proportion, incidence rate, or incidence odds) are measured, which requires enumerating the entire population and keeping it under surveillance. A case‐control study observes the population more efficiently by using a sample of the population, which becomes the co...

Quantitative bias analysis serves several objectives in epidemiological research. First, it provides a quantitative estimate of the direction, magnitude and uncertainty arising from systematic errors. Second, the acts of identifying sources of systematic error, writing down models to quantify them, assigning values to the bias parameters and interp...

An association between testosterone therapy (TT) and cardiovascular disease has been reported and TT use is increasing rapidly. We conducted a cohort study of the risk of acute non-fatal myocardial infarction (MI) following an initial TT prescription (N = 55,593) in a large health-care database. We compared the incidence rate of MI in the 90 days f...

Correctable weaknesses in the design, conduct, and analysis of biomedical and public health research studies can produce misleading results and waste valuable resources. Small effects can be difficult to distinguish from bias introduced by study design and analyses. An absence of detailed written protocols and poor documentation of research is comm...

The convention in epidemiology and biostatistics is to divide the study of mismeasured variables into the areas of measurement error for continuous variables and misclassification for categorical variables. Although the topics overlap considerably, chapter Measurement Error of this handbook focuses on measurement error, whereas the present chapter...

In observational studies of the effect of an exposure on an outcome, the exposure-outcome association is usually confounded by other causes of the outcome (potential confounders). One common method to increase efficiency is to match the study on potential confounders. Matched case-control studies are relatively common and well covered by the litera...

Although single nucleotide polymorphisms (SNPs) of NBS1 have been associated with susceptibility to lung and upper aerodigestive tract (UADT) cancers, their relations to cancer survival and measures of effect are largely unknown. Using follow-up data from 611 lung cancer cases and 601 UADT cancer cases from a population-based case-control study in...

The method of maximum likelihood is widely used in epidemiology, yet many epidemiologists receive little or no education in the conceptual underpinnings of the approach. Here we provide a primer on maximum likelihood and some important extensions which have proven useful in epidemiologic research, and which reveal connections between maximum likeli...

Background: Lung cancer remains the leading cause of cancer death worldwide, with tobacco smoking established as the main risk factor. Cannabis smoke contains carcinogens similar to those in tobacco smoke, including polycyclic aromatic hydrocarbons; animal studies and human case series and histopathologic studies have suggested its potential carcinogenic...

Birth certificates are a convenient source of population data for epidemiologic studies. It is well documented, however, that birth certificate data can be highly inaccurate. Nonetheless, studies based on birth certificates are routinely analyzed without accounting for sources of data errors. We focused on the association between maternal cigarette...

We use causal diagrams to illustrate the consequences of matching and the appropriate handling of matched variables in cohort and case-control studies. The matching process generally forces certain variables to be independent despite their being connected in the causal diagram, a phenomenon known as unfaithfulness. We show how causal diagrams can b...

Previous studies reported associations of occupational electric and magnetic fields (MF) with neurodegenerative diseases (NDDs). Results differ between studies using proxy exposure based on occupational titles and estimated MF levels. We conducted a meta-analysis of occupational MF NDD, primarily Alzheimer disease (AD), and motor neuron diseases...

It is common to present multiple adjusted effect estimates from a single model in a single table. For example, a table might show odds ratios for one or more exposures and also for several confounders from a single logistic regression. This can lead to mistaken interpretations of these estimates. We use causal diagrams to display the sources of the...

Purpose: Special care must be taken when adjusting for outcome misclassification in case-control data. Basic adjustment formulas using either sensitivity and specificity or predictive values (as with external validation data) do not account for the fact that controls are sampled from a much larger pool of potential controls. A parallel problem ari...

A control group is a group of subjects who are given a control (reference) exposure, such as a placebo or no exposure, to provide a baseline against which to measure effects of the exposure in an exposed group of subjects.

In response to the widespread abuse and misinterpretation of significance tests of null hypotheses, some editors and authors have strongly discouraged P values. However, null P values still thrive in most journals and are routinely misinterpreted as probabilities of a "chance finding" or of the null, when they are no such thing. This misuse may be...

The logistic distribution is a popular probability model yet it is usually not motivated with reference to the processes under study. While its popularity can be attributed to its simplicity, it can also be derived from basic contextual considerations. Although it has been shown that logistic growth is the limiting form of a class of Markov process...

Bayesian methods have been found to have clear utility in epidemiologic analyses involving sparse-data bias or considerable background information. Easily implemented methods for conducting Bayesian analyses by data augmentation have been previously described but remain in scant use. Thus, we provide guidance on how to do these analyses with ordina...

Ratio estimators of effect are ordinarily obtained by exponentiating maximum-likelihood estimators (MLEs) of log-linear or logistic regression coefficients. These estimators can display marked positive finite-sample bias, however. We propose a simple correction that removes a substantial portion of the bias due to exponentiation. By combining this...

Introduction; A brief commentary on developments since 1970; Ambiguities of observational extensions; Causal diagrams and structural equations; Compelling versus plausible assumptions, models and inferences; Nonidentification and the curse of dimensionality; Identification in practice; Identification and bounded rationality; Conclusion; Acknowledgments; References

Experience with tolerance protocols has shown that none is perfect and that each escape from tolerance must be identified early to prevent graft failure. In addition, some test is needed for patients who are weaned off immunosuppression (IS) to forewarn of weaning failure. The usual measures of function--such as serum creatinine levels--are not sen...

Causation probabilities are often a component of decisions on awarding compensation for radiation exposure and descriptions of the number of cancers caused by radiation releases. In many instances, the use of epidemiologic data to calculate such probabilities may seriously underestimate the number of people harmed and the percentage of cancers indu...

In hemodialysis patients, lower body mass index and weight loss have been associated with higher mortality rates, a phenomenon sometimes called the obesity paradox. This apparent paradox might be explained by loss of muscle mass. The authors thus examined the relation to mortality of changes in dry weight and changes in serum creatinine levels (a m...

This article summarizes arguments against the use of power to analyze data, and illustrates a key pitfall: Lack of statistical significance (e.g., p > .05) combined with high power (e.g., 90%) can occur even if the data support the alternative more than the null. This problem arises via selective choice of parameters at which power is calculated, b...
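The pitfall can be sketched numerically for a two-sided z-test. The observed z-score of 1.9 and the choice of alternative (the effect size at which power is 90%) are hypothetical values picked for illustration; all quantities are in standard-error units.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)

# Alternative chosen so that power at alpha = 0.05 is 90% (one-tail approximation).
delta = z_crit + std_normal.inv_cdf(0.90)


def power_at(effect: float) -> float:
    """Two-sided power of the z-test when the true standardized effect is `effect`."""
    return (1.0 - std_normal.cdf(z_crit - effect)) + std_normal.cdf(-z_crit - effect)


# Hypothetical observed z-score: not significant at the 0.05 level.
z_observed = 1.9
p_value = 2.0 * (1.0 - std_normal.cdf(abs(z_observed)))

# Likelihood ratio comparing the alternative (effect = delta) to the null (effect = 0).
likelihood_ratio = std_normal.pdf(z_observed - delta) / std_normal.pdf(z_observed)

print(f"power at alternative: {power_at(delta):.2f}")
print(f"P-value for the null: {p_value:.3f}")
print(f"likelihood ratio (alternative vs null): {likelihood_ratio:.2f}")
```

Here the result is nonsignificant and power is 90%, yet the likelihood ratio exceeds 1, meaning the observed data are more probable under the alternative than under the null.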

Bayesian posterior parameter distributions are often simulated using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods are not always necessary and do not help the uninitiated understand Bayesian inference. As a bridge to understanding Bayesian inference, the authors illustrate a transparent rejection sampling method. In example 1, the...
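A minimal sketch of rejection sampling for a posterior distribution, assuming a binomial likelihood with a uniform prior on the success probability. The data (3 successes in 10 trials) and the function name are hypothetical, and the paper's own examples may be set up differently.

```python
import random


def rejection_sample_posterior(successes: int, trials: int,
                               n_draws: int, seed: int = 0) -> list[float]:
    """Keep prior draws whose simulated data exactly match the observed data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw a candidate from the uniform prior
        simulated = sum(rng.random() < p for _ in range(trials))  # simulate data given p
        if simulated == successes:  # accept only if the simulation reproduces the data
            accepted.append(p)
    return accepted


# Hypothetical data: 3 successes in 10 trials.
draws = rejection_sample_posterior(successes=3, trials=10, n_draws=50_000)
posterior_mean = sum(draws) / len(draws)
print(f"accepted {len(draws)} draws, posterior mean ~ {posterior_mean:.3f}")
```

The accepted draws are a sample from the posterior, which in this conjugate case is Beta(4, 8) with mean 1/3, so the simulation can be checked against the exact answer. The approach is transparent but inefficient, since most candidate draws are rejected.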

Values influence choice of methodology and thus influence every risk assessment and inference. To deal with this inescapable reality, we need to replace vague and unattainable calls for objectivity with more precise operational qualities. Among qualities that seem widely valued are transparency (openness) and neutrality (balance, fairness). Conform...

We are grateful to Dershin Ke for pointing out a numerical error in the last example of our paper, ‘Ecological Bias, Confounding, and Effect Modification’ (Int J Epidemiol 1989; 18:269-274): In the unnumbered table at the bottom of p. 272, the alcohol-use prevalence in region A (given as 0.174) should be 0.384. Consequently, the rate-ratio estimate...

This chapter reviews and classifies uncertainties in clinical medicine. Epistemic uncertainty is intimately linked to the relationship between theory, evidence, and knowledge. The relationships among observed, observable, and unobservable realities express uncertainties that can be characterized as a lack of knowledge about what is known (unknown know...

This chapter uses stark and purely logical descriptions to discuss some familiarity with probability and statistics. Special emphasis is given to issues of causal inference from uncontrolled observations. It is possible to distinguish two kinds of inference: Inference to causal models from observations, and inference from causal models to the effec...