# Paul R. Rosenbaum's research while affiliated with University of Pennsylvania and other places

## Publications (329)

Article
In causal inference, natural strata are a new compromise between conventional strata and matching in a fixed ratio, say pair matching or matching two controls to each treated individual. Like matching in a fixed ratio, natural strata: (a) do not require weights, (b) balance many measured covariates beyond those that define the strata and (c) provid...
Article
Are weak associations between a treatment and a binary outcome always sensitive to small unmeasured biases in observational studies? This possibility is often discussed in epidemiology. The familiar Mantel‐Haenszel test for a 2×2×S contingency table exaggerates sensitivity to unmeasured biases when the population odds ratios...
Article
Multivariate matching has two goals: (i) to construct treated and control groups that have similar distributions of observed covariates, and (ii) to produce matched pairs or sets that are homogeneous in a few key covariates. When there are only a few binary covariates, both goals may be achieved by matching exactly for these few covariates. Commonl...
Article
In an observational study, the treatment received and the outcome exhibited may be associated in the absence of an effect caused by the treatment, even after controlling for observed covariates. Two tactics are common: (i) a test for unmeasured bias may be obtained using a secondary outcome for which the effect is known, and (ii) a sensitivity anal...
Article
A quantitative study of treatment effects may form many matched pairs of a treated subject and an untreated control who look similar in terms of covariates measured prior to treatment. When treatments are not randomly assigned, one inevitable concern is that individuals who look similar in measured covariates may be dissimilar in unmeasured covaria...
Article
Objective: To determine if surgery and anesthesia in the elderly may promote Alzheimer's Disease and Related Dementias (ADRD). Background: There is a substantial conflicting literature concerning the hypothesis that surgery and anesthesia promotes ADRD. Much of the literature is confounded by indications for surgery or has small sample size. Thi...
Article
Introduction: This study develops a measure of Alzheimer's disease and related dementias (ADRD) using Medicare claims. Methods: Validation resembles the approach of the American Psychological Association, including (1) content validity, (2) construct validity, and (3) predictive validity. Results: We found that four items-a Medicare claim reco...
Article
Background Nursing resources, such as staffing ratios and skill mix, vary across hospitals. Better nursing resources have been linked to better patient outcomes but are assumed to increase costs. The value of investments in nursing resources, in terms of clinical benefits relative to costs, is unclear. Objective To determine whether there are differ...
Chapter
Design sensitivity is used to quantify the effectiveness of devices discussed in Chap. 5. Several of those devices anticipate a particular pattern of results, perhaps coherence among several outcomes, or a dose–response relationship. To what extent do these considerations reduce sensitivity to unmeasured biases? Must a coherent pattern of associati...
Chapter
The basic tools of multivariate matching are introduced, including the propensity score, distance matrices, calipers imposed using a penalty function, optimal matching, matching with multiple controls, and full matching. The tools are illustrated with a tiny example from genetic toxicology (n = 46), an example that is so small that one can keep tra...
Chapter
“Make your theories elaborate” in observational studies, argued R.A. Fisher, so that the many predictions of such a theory may disambiguate the association between treatment and outcome. How should one plan the analysis of an observational study to check the predictions of an elaborate theory?
Chapter
In an experiment, power and sample size calculations anticipate the outcome of a statistical test that will be performed when the experimental data are available for analysis. In parallel, in an observational study, the power of a sensitivity analysis anticipates the outcome of a sensitivity analysis that will be performed when the observational da...
Chapter
When a treatment may be given at various times, it is important to form matched pairs or sets in which subjects are similar prior to treatment but avoid matching on events that were subsequent to treatment. This is done using risk-set matching, in which a newly treated subject at time t is matched to one or more controls who are not yet treated at...
Chapter
Before R.A. Fisher introduced randomized experimentation, the literature on causal inference emphasized reduction of heterogeneity of experimental units. To what extent is heterogeneity relevant to causal claims in observational studies when random assignment of treatments is unethical or infeasible?
Chapter
Transparency means making evidence evident. An observational study that is not transparent may be overwhelming or intimidating, but it is unlikely to be convincing. Several aspects of transparency are briefly discussed.
Chapter
Three design tasks may usefully follow matching and precede planning of the analysis. Splitting the sample of I pairs into a small planning sample and a large analysis sample may aid in planning the analysis in a manner that increases the design sensitivity. If there will be analytic adjustments for some unmatched variables, it is prudent to check...
Chapter
A counterclaim disputes the claim that an association between treatment received and outcome exhibited reflects an effect caused by the treatment. Some counterclaims undermine themselves. A supplemental statistical analysis may demonstrate this.
Chapter
Observational studies differ from experiments in that randomization is not used to assign treatments. How were treatments assigned? This chapter introduces two simple models for treatment assignment in observational studies. The first model is useful but naïve: it says that people who look comparable are comparable. The second model speaks to a cen...
Chapter
Optimal matching without groups, or optimal nonbipartite matching, offers many additional options for matched designs in both observational studies and experiments. One starts with a square, symmetric distance matrix with one row and one column for each subject recording the distance between any two subjects. Then the subjects are divided into pair...
Chapter
Having constructed a matched control group, one must check that it is satisfactory, in the sense of balancing the observed covariates. If some covariates are not balanced, then adjustments are made to bring them into balance. Three adjustments are near-exact matching, exact matching, and the use of small penalties. Exact matching has a special role...
Chapter
Fine balance means constraining a match to balance a nominal variable, without restricting who is matched to whom, when matching to minimize a distance between treated and control subjects. It may be applied to: (1) a nominal variable with many levels that is difficult to balance using propensity scores, (2) a rare binary variable that is difficult...
Chapter
What features of the design of an observational study affect its ability to distinguish a treatment effect from bias due to an unmeasured covariate uij? This topic, which is the focus of Part III of the book, is sketched in informal terms in the current chapter. An opportunity is an unusual setting in which there is less confounding with unobserved...
Chapter
Large effects in moderate to large studies are typically insensitive to small and moderate unobserved biases, but the concept of a “large effect” is vague. What if most subjects are not much affected by treatment, but a small fraction, perhaps 10% or 20% of subjects, are strongly affected? On average, such an effect may be small, but not at all sma...
Chapter
Simple calculations in the statistical language R illustrate the computations involved in one simple form of multivariate matching. The focus is on how matching is done, not on the many aspects of the design of an observational study. The process is made tangible by describing it in detail, step-by-step, closely inspecting intermediate results; how...
Chapter
This introductory chapter mentions some of the issues that arise in observational studies and describes a few well designed studies. Section 1.7 outlines the book, describes its structure, and suggests alternative ways to read it.
Chapter
An observational study is an empiric investigation of treatment effects when random assignment to treatment or control is not feasible. Because observational studies are structured to resemble simple randomized experiments, an understanding of the role randomization plays in experiments is important as background. As a prelude to the discussion of...
Chapter
The choice of one test statistic rather than another affects the design sensitivity, as it affects the power and efficiency of a randomization test. This is demonstrated by computing the design sensitivity for several competing test statistics in the same sampling situation. Familiar test statistics, such as Wilcoxon’s signed rank statistic and the...
Chapter
In a well-designed experiment or observational study, competing theories make conflicting predictions. Several examples, some quite old, are used to illustrate. Also discussed are: the goals of replication, empirical studies of reasons for effects, and the importance of systemic knowledge in eliminating errors.
Chapter
An observational study has two evidence factors if it permits two essentially independent tests of the null hypothesis of no treatment effect, where each test is unaffected by some unmeasured bias that would invalidate the other test. Because the two tests are essentially independent, the evidence they provide—their hypothesis tests and sensitivity...
Chapter
As a prelude to several chapters describing the construction of a matched control group, the current chapter presents an example of a matched observational study as it might (and did) appear in a scientific journal. When reporting a matched observational study, the matching methods are described very briefly in the Methods section. In more detail,...
Article
In an observational study matched for observed covariates, an association between treatment received and outcome exhibited may indicate not an effect caused by the treatment, but merely some bias in the allocation of treatments to individuals within matched pairs. The evidence that distinguishes moderate biases from causal effects is unevenly dispe...
Article
We show that the strength of an instrument is incompletely characterized by the proportion of compliers, and we propose and evaluate new methods that extract more information from certain settings with comparatively few compliers. Specifically, we demonstrate that, for a fixed small proportion of compliers, the presence of an equal number of always...
Article
Background There are known clinical benefits associated with investments in nursing. Less is known about their value. Aims To compare surgical patient outcomes and costs in hospitals with better versus worse nursing resources and to determine if value differs across these hospitals for patients with different mortality risks. Methods Retrospectiv...
Article
Absent randomization, causal conclusions gain strength if several independent evidence factors concur. We develop a method for constructing evidence factors from several instruments plus a direct comparison of treated and control groups, and we evaluate the method’s performance in terms of design sensitivity and simulation. In the application, we c...
Article
A study has two evidence factors if it permits two statistically independent inferences about one treatment effect such that each factor is immune to some bias that would invalidate the other factor. Because the two factors are statistically independent, the evidence they provide may be combined using methods associated with meta-analysis for indep...
Book
This second edition of Design of Observational Studies is both an introduction to statistical inference in observational studies and a detailed discussion of the principles that guide the design of observational studies. An observational study is an empiric investigation of effects caused by treatments when randomized experimentation is unethical o...
Article
Background: Teaching hospitals typically pioneer investment in new technology and cultivate workforce characteristics generally associated with better quality, but the value of this extra investment is unclear. Objective: Compare outcomes and costs between major teaching and non-teaching hospitals by closely matching on patient characteristics....
Article
Objective: To compare outcomes and costs between major teaching and nonteaching hospitals on a national scale by closely matching on patient procedures and characteristics. Background: Teaching hospitals have been shown to often have better quality than nonteaching hospitals, but cost and value associated with teaching hospitals remains unclear....
Article
Using a small example as an illustration, this article reviews multivariate matching from the perspective of a working scientist who wishes to make effective use of available methods. The several goals of multivariate matching are discussed. Matching tools are reviewed, including propensity scores, covariate distances, fine balance, and related met...
Article
Background: Children with complex chronic conditions (CCCs) utilize a disproportionate share of hospital resources. Objective: We asked whether some hospitals display a significantly different pattern of resource utilization than others when caring for similar children with CCCs admitted for medical diagnoses. Research design: Using Pediatric...
Article
Objective: To determine whether outcomes achieved by new surgeons are attributable to inexperience or to differences in the context in which care is delivered and patient complexity. Background: Although prior studies suggest that new surgeon outcomes are worse than those of experienced surgeons, factors that underlie these phenomena are poorly...
Article
Multivariate matching in observational studies tends to view covariate differences symmetrically: a difference in age of 10 years is thought equally problematic whether the treated subject is older or younger than the matched control. If matching is correcting an imbalance in age, such that treated subjects are typically older than controls, then t...
Article
MINI: Duty hour reform resulted in substantial changes in surgical education. In this difference-in-differences study, we examine the outcomes of patients treated by new surgeons who trained before and after duty reform. New surgeons trained after the duty hour reform achieved similar clinical results to those trained before the reform when compare...
Article
Observational or nonrandomized studies of treatment effects are often constructed with the aid of polynomial-time algorithms that optimally form matched treatment-control pairs or matched sets. Because each observational comparison may potentially be affected by bias, investigators often reinforce a single comparison with an additional comparison t...
Article
Policy Points • Patients with low socioeconomic status (SES) experience poorer survival rates after diagnosis of breast cancer, even when enrolled in Medicare and Medicaid. • Most of the difference in survival is due to more advanced cancer on presentation and the general poor health of lower SES patients, while only a very small fraction of the S...
Article
In observational studies of treatment effects, it is common to have several outcomes, perhaps of uncertain quality and relevance, each purporting to measure the effect of the treatment. A single planned combination of several outcomes may increase both power and insensitivity to unmeasured bias when the plan is wisely chosen, but it may miss opport...
Article
Background: There are numerous definitions of multimorbidity (MM). None systematically examines specific comorbidity combinations accounting for multiple testing when exploring large datasets. Objectives: Develop and validate a list of all single, double, and triple comorbidity combinations, with each individual qualifying comorbidity set (QCS)...
Article
Background Coronary atherosclerosis raises the risk of acute myocardial infarction (AMI), and is usually included in AMI risk‐adjustment models. Percutaneous coronary intervention (PCI) does not cause atherosclerosis, but may contribute to the notation of atherosclerosis in administrative claims. We investigated how adjustment for atherosclerosis a...
Data
Table S1. Creation of Study Cohort Table S2. Characteristics of the Study Cohort Table S3. Hierarchical Models With and Without Atherosclerosis Table S4. Logistic Models Predicting 30‐Day Mortality Table S5. Directly Standardized Analysis of Logistic Models Comparing Outcomes at PCI Hospitals and Non‐PCI Hospitals Table S6. Adjusted Outcome Ra...
Article
Causal effects are commonly defined as comparisons of the potential outcomes under treatment and control, but this definition is threatened by the possibility that the treatment or control condition is not well-defined, existing instead in more than one version. A simple, widely applicable analysis is proposed to address the possibility that the tr...
Article
Weak instruments produce causal inferences that are sensitive to small failures of the assumptions underlying an instrumental variable, so strong instruments are preferred. The possibility of strengthening an instrument at the price of a reduced sample size has been proposed in the statistical literature and used in the medical literature, but ther...
Article
Kidney transplant recipients often receive antibody induction. Previous studies of induction therapy were often limited by short follow-up and/or absence of information about complications. After linking Organ Procurement and Transplantation Network data with Medicare claims, we compared outcomes between three induction therapies for kidney recipie...
Article
We discuss observational studies that test many causal hypotheses, either hypotheses about many outcomes or many treatments. To be credible an observational study that tests many causal hypotheses must demonstrate that its conclusions are neither artifacts of multiple testing nor of small biases from nonrandom treatment assignment. In a sense that...
Article
Background: With increasing Medicaid coverage, it has become especially important to determine if racial differences exist within the Medicaid system. We asked if disparities exist in hospital practice and patient outcomes between matched black and white Medicaid children with chronic conditions undergoing surgery. Study design: A matched cohort...
Article
Effect modification occurs when the magnitude or stability of a treatment effect varies as a function of an observed covariate. Generally, larger and more stable treatment effects are insensitive to larger biases from unmeasured covariates, so a causal conclusion may be considerably firmer if effect modification is noted when it occurs. We propose...
Article
Bayesian models are increasingly fit to large administrative data sets and then used to make individualized recommendations. In particular, Medicare's Hospital Compare webpage provides information to patients about specific hospital mortality rates for a heart attack or Acute Myocardial Infarction (AMI). Hospital Compare's current recommendations a...
Article
Background and objectives: Black children with asthma comprise one-third of all asthma patients in Medicaid. With increasing Medicaid coverage, it has become especially important to monitor Medicaid for differences in hospital practice and patient outcomes by race. Methods: A multivariate matched cohort design, studying 11 079 matched pairs of c...
Article
Objectives: With differential payment between Medicaid and Non-Medicaid services, we asked whether style-of-practice differs between similar Medicaid and Non-Medicaid children with complex chronic conditions (CCCs) undergoing surgery. Summary of background data: Surgery in children with CCCs accounts for a disproportionately large percentage of...
Article
In a sensitivity analysis in an observational study with a binary outcome, is it better to use all of the data or to focus on subgroups that are expected to experience the largest treatment effects? The answer depends on features of the data that may be difficult to anticipate, a trade-off between unknown effect-sizes and known sample sizes. We pro...
Article
In an observational study of the effects caused by treatments, a sensitivity analysis asks about the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naive analysis that presumes adjustments for measured covariates remove all biases. When there are two or more outcomes in an observational stud...
Article
Fisher tested the fit of Gaussian linear models using replicated observations. We refine this method by (1) constructing near-replicates using an optimal nonbipartite matching and (2) defining a distance that focuses on predictors important to the model's predictions. Near-replicates may not exist unless the predictor set is low-dimensional; the te...
Article
Importance: Asthma is the most prevalent chronic illness among children, remaining a leading cause of pediatric hospitalizations and representing a major financial burden to many health care systems. Objective: To implement a new auditing process examining whether differences in hospital practice style may be associated with potential resource s...
Article
Background and objectives: With American children experiencing increased Medicaid coverage, it has become especially important to determine if practice patterns differ between Medicaid and non-Medicaid patients. Auditing such potential differences must carefully compare like patients to avoid falsely identifying suspicious practice patterns. We as...
Article
There is effect modification if the magnitude or stability of a treatment effect varies systematically with the level of an observed covariate. A larger or more stable treatment effect is typically less sensitive to bias from unmeasured covariates, so it is important to recognize effect modification when it is present. We illustrate a recent pr...
Article
Objective: To improve the predictions provided by Medicare's Hospital Compare (HC) to facilitate better informed decisions regarding hospital choice by the public. Data sources/setting: Medicare claims on all patients admitted for Acute Myocardial Infarction between 2009 through 2011. Study design: Cohort analysis using a Bayesian approach, co...
Article
Modern methods construct a matched sample by minimizing the total cost of a flow in a network, finding a pairing of treated and control individuals that minimizes the sum of within-pair covariate distances subject to constraints that ensure distributions of covariates are balanced. In aggregate, these methods work well; however, they can exhibit a...
Article
Objective: To develop a method to allow a hospital to compare its performance using its entire patient population to the outcomes of very similar patients treated elsewhere. Data sources/setting: Medicare claims in orthopedics and common general, gynecologic, and urologic surgery from Illinois, New York, and Texas from 2004 to 2006. Study desig...
Article
Importance The literature suggests that hospitals with better nursing work environments provide better quality of care. Less is known about value (cost vs quality). Objectives To test whether hospitals with better nursing work environments displayed better value than those with worse nursing environments and to determine patient risk groups associ...
Article
Bayesian models are increasingly fit to large administrative data sets and then used to make individualized recommendations. For instance, Medicare's Hospital Compare webpage provides information to patients about specific hospital mortality rates for a heart attack or Acute Myocardial Infarction (AMI). Hospital Compare's current recommendations are...
Article
An effect modifier is a pretreatment covariate that affects the magnitude of the treatment effect or its stability. When there is effect modification, an overall test that ignores an effect modifier may be more sensitive to unmeasured bias than a test that combines results from subgroups defined by the effect modifier. If there is effect modificati...
Article
A common practice with ordered doses of treatment and ordered responses, perhaps recorded in a contingency table with ordered rows and columns, is to cut or remove a cross from the table, leaving the outer corners (that is, the high-versus-low dose, high-versus-low response corners) and from these corners to compute a risk or odds ratio. This little...
Article
The informal folklore of observational studies claims that if an irrelevant observed covariate is left uncontrolled, say unmatched, then it will influence treatment assignment in haphazard ways, thereby diminishing the biases from unmeasured covariates. We prove a result along these lines: it is true, in a certain sense, to a limited degree, under...
Article
Claims based on observational studies that a treatment has certain effects are often met with counterclaims asserting that the treatment is without effect, that associations are produced by biased treatment assignment. Some counterclaims undermine themselves in the following specific sense: presuming the counterclaim to be true may strengthen the s...
Chapter
The propensity score is the conditional probability of exposure to treatment rather than control given observed covariates, or more generally, the conditional probability of selection into a group given observed covariates. It is used in an effort to adjust for nonrandom treatment assignment or nonrandom selection. Matching or stratifying on the sc...
Article
Racial disparities in general surgical outcomes are known to exist but not well understood. To determine if black-white disparities in general surgery mortality for Medicare patients are attributable to poorer health status among blacks on admission or differences in the quality of care provided by the admitting hospitals. Matched cohort study usin...
Article
An observational study draws inferences about treatment effects when treatments are not randomly assigned, as they would be in a randomized experiment. The naive analysis of an observational study assumes that adjustments for measured covariates suffice to remove bias from nonrandom treatment assignment. A sensitivity analysis in an observational s...
Article
In a well-conducted, slightly idealized, randomized experiment, the only explanation of an association between treatment and outcome is an effect caused by the treatment. However, this is not true in observational studies of treatment effects, in which treatment and outcomes may be associated because of some bias in the assignment of treatments to...
Article
Every newly trained surgeon performs her first unsupervised operation. How do the health outcomes of her patients compare with the patients of experienced surgeons? Using data from 498 hospitals, we compare 1252 pairs comprised of a new surgeon and an experienced surgeon working at the same hospital. We introduce a new form of matching that matches...
Article
A natural experiment is a type of observational study in which treatment assignment, though not randomized by the investigator, is plausibly close to random. A process that assigns treatments in a highly nonrandom, inequitable manner may, in rare and brief moments, assign aspects of treatments at random or nearly so. Isolating those moments and asp...
Article
Differences in colon cancer survival by race are a recognized problem among Medicare beneficiaries. To determine to what extent the racial disparity in survival is due to disparity in presentation characteristics at diagnosis or disparity in subsequent treatment. Black patients with colon cancer were matched with 3 groups of white patients: a "demo...
Article
In Reply Our study evaluated the association between anesthesia technique and outcome among patients with hip fracture, observing an indeterminate effect of anesthesia technique on 30-day mortality and a shorter inpatient length of stay with regional anesthesia. One of our analyses used geographic proximity to hospitals that used more regional anes...
Article
Objective: Develop an improved method for auditing hospital cost and quality tailored to a specific hospital's patient population. Data sources/setting: Medicare claims in general, gynecologic and urologic surgery, and orthopedics from Illinois, New York, and Texas between 2004 and 2006. Study design: A template of 300 representative patients...
Article
An instrument or instrumental variable is often used in an effort to avoid selection bias in inference about the effects of treatments when treatment choice is based on thoughtful deliberation. Instruments are increasingly used in health outcomes research. An instrument is a haphazard push to accept one treatment or another, where the push can affe...
Chapter
The propensity score is the conditional probability of exposure to treatment rather than control given observed covariates, or more generally, the conditional probability of selection into a group given observed covariates. It is used in an effort to adjust for nonrandom treatment assignment or nonrandom selection. Matching or stratifying on the sc...
Chapter
An observational study is an empiric investigation that attempts to estimate the effects caused by a treatment when it is not possible to perform an experiment. Random assignment of subjects to treatment or control in an experiment ensures that comparable groups of subjects are compared under alternative treatments. Without random assignment, in an...
Article
In a nonrandomized or observational study, a weak association between receipt of the treatment and an outcome may be explained not as effects caused by the treatment but rather by a small bias in the assignment of individuals to treatment or control; however, a strong association may be explained as noncausal only by a large bias. The strength of t...
Article
Importance: More than 300,000 hip fractures occur each year in the United States. Recent practice guidelines have advocated greater use of regional anesthesia for hip fracture surgery. Objective: To test the association of regional (ie, spinal or epidural) anesthesia vs general anesthesia with 30-day mortality and hospital length of stay after h...

## Citations

... Many authors have speculated on why results would differ dramatically between the trial and observational study. Some major concerns include: (i) potential bias due to unmeasured confounding in observational studies (Humphrey et al., 2002; Rutter, 2007; Yu et al., 2021); (ii) biological differences between trial participants and those in the observational study (Michels and Manson, 2003); (iii) differences in time since menopause at hormone therapy initiation (Prentice et al., 2005; Willett et al., 2006; Rossouw et al., 2007; Hernán et al., 2008; Prentice et al., 2009). Table 1 summarizes some important baseline covariates in the WHI trial and associated observational study, and illustrates some of these concerns. ...
... A small λ value gives priority to matched samples' internal validity, while a large λ value generalizability to the target population. A similar weighting scheme is also used in Zhang et al. (2021) but for a different purpose. ...
... To address analogy, different hypertensive drugs of the same class, say a different beta blocker, might be used, in both designs, to better probe the "logic" of the disease. Rosenbaum (2015) reflected upon Cochran's idea of a "causal crossword" to better articulate an intricately woven argument showing how cigarette smoking might be causally related to lung cancer. Below, Rosenbaum's words appear in italics but are interrupted to allow relevant Hillian and Campbellian commentary. ...
... Finally, we believe another strength of our study is that our causality as a whole is not affected by unobserved covariates. We measured the Wilcoxon Signed Rank P-Value based on the Rosenbaum Sensitivity test [82], which estimates the hidden bias and explains the impact it has on our results. In spite of the high gamma levels, no unobserved confounding factors have been identified by the P-values in our study. ...