JSS Journal of Statistical Software

MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00

sensemakr: Sensitivity Analysis Tools for OLS in R and Stata

Carlos Cinelli

University of California, Los Angeles

Jeremy Ferwerda

Dartmouth College

Chad Hazlett

University of California, Los Angeles

Abstract

This paper introduces the package sensemakr for R and Stata, which implements a suite of sensitivity analysis tools for regression models developed in Cinelli and Hazlett (2020a). Given a regression model, sensemakr can compute sensitivity statistics for routine reporting, such as the robustness value, which describes the minimum strength that unobserved confounders need to have to overturn a research conclusion. The package also provides plotting tools that visually demonstrate the sensitivity of point estimates and t-values to hypothetical confounders. Finally, sensemakr implements formal bounds on sensitivity parameters by means of comparison with the explanatory power of observed variables. All these tools are based on the familiar “omitted variable bias” framework, do not require assumptions regarding the functional form of the treatment assignment mechanism or the distribution of the unobserved confounders, and naturally handle multiple, non-linear confounders. With sensemakr, users can transparently report the sensitivity of their causal inferences to unobserved confounding, thereby enabling a more precise, quantitative debate as to what can be concluded from imperfect observational studies.

Keywords: causal inference, sensitivity analysis, omitted variable bias, robustness value, R,

Stata, bounds.

1. Introduction

Across disciplines, investigators face the perennial challenge of making and defending causal claims using observational data. The most common identification strategy in these circumstances is to adjust for a set of observed covariates deemed sufficient to control for confounding, with linear regression remaining among the most popular statistical methods for making such adjustments. Researchers who argue that a regression coefficient unbiasedly reflects a causal relationship must also be able to argue that there are no unobserved confounders, a difficult or impossible assumption to defend in most applied settings.¹ What value can we draw from these studies, knowing that this ideal condition is likely to fail? Fortunately, the assumption of zero unobserved confounding need not hold precisely for an observational study to remain substantively informative. In these cases, sensitivity analyses play a useful role by allowing researchers to quantify how strong unobserved confounding needs to be in order to substantially change a research conclusion, and by aiding in determining whether confounding of such strength is plausible.

Although numerous methods for sensitivity analyses have been proposed, these tools are still under-utilized.² As argued in Cinelli and Hazlett (2020a), several reasons may contribute to the low adoption of these methods. First, many of these methods impose complicated and strong assumptions regarding the nature of the confounder, which many users cannot or are not willing to defend. Second, while users routinely report regression tables or coefficient plots, until recently investigators have lacked “standard” quantities that can easily and correctly summarize the robustness of a regression coefficient to unobserved confounding. Finally, connecting the results of a formal sensitivity analysis to a cogent argument about what types of confounders may exist in one’s research project remains difficult, particularly when there are no compelling arguments as to why the treatment assignment should be approximately “ignorable,” “exogenous,” or “as-if random.”

This paper introduces the R and Stata package sensemakr (Cinelli, Ferwerda, and Hazlett 2020b,a), which implements a suite of sensitivity analysis tools proposed in Cinelli and Hazlett (2020a) to address these challenges. Within the familiar regression framework and without the need for additional assumptions, sensemakr enables analysts to easily answer a variety of common sensitivity questions, such as:

• How strong would an unobserved confounder (or a group of confounders) have to be to change a research conclusion?

• In a worst-case scenario, how robust are the results to all unobserved confounders acting together, possibly non-linearly?

• How strong would confounding need to be, relative to the strength of observed covariates, to change the answer by a certain amount?

Specifically, given a full regression model, or simply standard statistics found in conventional regression tables, sensemakr is able to: (i) compute sensitivity statistics for routine reporting, such as the robustness value describing the minimum strength that unobserved confounders would need to have to overturn the research conclusions; (ii) provide graphical tools that enable users to visually explore the implications of unobserved confounding, such as contour plots showing adjusted point estimates and t-values under confounding of various strengths, as well as plots showing adjusted estimates under extreme (pessimistic) scenarios; and (iii) place formal bounds on the maximum strength of confounding, based on plausibility judgments regarding how unobserved confounders compare with observed variables. These tools do not require additional assumptions regarding the functional form of the treatment assignment mechanism or the distribution of the unobserved confounders, and naturally handle multiple confounders, possibly acting non-linearly.

¹ This condition is also known as “selection on observables,” “conditional ignorability,” “conditional exogeneity,” “conditional exchangeability,” or “backdoor admissibility” (Angrist and Pischke 2008; Pearl 2009; Imbens and Rubin 2015; Hernán and Robins 2020).

² Dating back to at least Cornfield, Haenszel, Hammond, Lilienfeld, Shimkin, and Wynder (1959), a partial list of sensitivity analysis proposals includes Rosenbaum and Rubin (1983); Robins (1999); Frank (2000); Rosenbaum (2002); Imbens (2003); Brumback, Hernán, Haneuse, and Robins (2004); Frank, Sykes, Anagnostopoulos, Cannata, Chard, Krause, and McCrory (2008); Hosman, Hansen, and Holland (2010); Imai, Keele, Yamamoto et al. (2010); VanderWeele and Arah (2011); Blackwell (2013); Frank, Maroulis, Duong, and Kelcey (2013); Carnegie, Harada, and Hill (2016); Dorie, Harada, Carnegie, and Hill (2016); Middleton, Scott, Diakow, and Hill (2016); Oster (2017); Cinelli, Kumor, Chen, Pearl, and Bareinboim (2019); Franks, D’Amour, and Feller (2019).

In what follows, Section 2 briefly reviews the omitted variable bias framework for sensitivity analysis developed in Cinelli and Hazlett (2020a), which provides the theoretical foundations for the tools in sensemakr. Next, Section 3 describes the basic functionality and provides a practical introduction to sensitivity analysis using sensemakr for R. Section 4 describes advanced usage of the R package, and shows how to leverage individual functions for customized sensitivity analyses. Finally, Section 5 describes sensemakr for Stata, and Section 6 concludes with a brief discussion of what sensitivity analysis can and cannot do in practice.

2. Sensitivity analysis in an omitted variable bias framework

In this section, we briefly review the omitted variable bias (OVB) framework for sensitivity analysis presented in Cinelli and Hazlett (2020a). This method builds on a scale-free reparameterization of the OVB formula in terms of partial $R^2$ values, which allows us to: (i) assess the sensitivity of point estimates, t-values, and confidence intervals under the same conceptual framework; (ii) easily assess the sensitivity to multiple confounders acting together, possibly non-linearly; (iii) exploit knowledge of the relative strength of variables to posit plausible bounds on unobserved confounding; and (iv) construct a set of summary sensitivity statistics suitable for routine reporting.

2.1. The OVB framework

The starting point of our analysis is a “full” linear regression model of an outcome $Y$ on a treatment $D$, controlling for a set of covariates given by both $X$ and $Z$,

$$Y = \hat{\tau} D + X\hat{\beta} + \hat{\gamma} Z + \hat{\varepsilon}_{\text{full}} \tag{1}$$

where $Y$ is an $(n \times 1)$ vector containing the outcome of interest for each of the $n$ observations and $D$ is an $(n \times 1)$ treatment variable (which may be continuous or binary); $X$ is an $(n \times p)$ matrix of observed covariates including the constant; and $Z$ is a single $(n \times 1)$ unobserved covariate (we discuss how to extend results for a multivariate $Z$ below).

Equation 1 is the regression model that the investigator wished she had run to obtain a valid causal estimate of the effect of $D$ on $Y$. Nevertheless, $Z$ is unobserved. Therefore, the feasible regression the investigator is able to estimate is the “restricted” model omitting $Z$, that is,

$$Y = \hat{\tau}_{\text{res}} D + X\hat{\beta}_{\text{res}} + \hat{\varepsilon}_{\text{res}} \tag{2}$$

Given the discrepancy between what we wish to know and what we actually have, the main question we would like to answer is: how do the observed point estimate and standard error of the restricted regression, $\hat{\tau}_{\text{res}}$ and $\widehat{\text{se}}(\hat{\tau}_{\text{res}})$, compare to the desired point estimate and standard error of the full regression, $\hat{\tau}$ and $\widehat{\text{se}}(\hat{\tau})$?

OVB with the partial $R^2$ parameterization

Define as $\widehat{\text{bias}}$ the difference between the full and restricted estimates, $\widehat{\text{bias}} := \hat{\tau}_{\text{res}} - \hat{\tau}$. Now let (i) $R^2_{D\sim Z|X}$ denote the share of residual variance of the treatment $D$ explained by the omitted variable $Z$, after accounting for the remaining covariates $X$; and, (ii) $R^2_{Y\sim Z|D,X}$ denote the share of residual variance of the outcome $Y$ explained by the omitted variable $Z$, after accounting for $X$ and $D$. Cinelli and Hazlett (2020a) have shown that these quantities are sufficient for determining the bias, adjusted estimate, and adjusted standard errors of the full regression of Equation 1.

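More precisely, the bias can be written as,

$$|\widehat{\text{bias}}| = \widehat{\text{se}}(\hat{\tau}_{\text{res}}) \sqrt{\frac{R^2_{Y\sim Z|D,X}\, R^2_{D\sim Z|X}}{1 - R^2_{D\sim Z|X}}\,(df)} \tag{3}$$

where $df$ stands for the degrees of freedom of the restricted regression actually run. Moreover, the estimated standard error of $\hat{\tau}$ can be recovered with,

$$\widehat{\text{se}}(\hat{\tau}) = \widehat{\text{se}}(\hat{\tau}_{\text{res}}) \sqrt{\frac{1 - R^2_{Y\sim Z|D,X}}{1 - R^2_{D\sim Z|X}} \left(\frac{df}{df - 1}\right)}. \tag{4}$$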

Given hypothetical values of $R^2_{D\sim Z|X}$ and $R^2_{Y\sim Z|D,X}$, Equations 3 and 4 allow investigators to examine the sensitivity of point estimates and standard errors (and consequently t-values, confidence intervals or p-values) to the inclusion of any omitted variable $Z$ with such strengths. Conversely, given a critical threshold deemed to be problematic, one can find the strength of confounders capable of bringing about a bias reducing the adjusted effect to that threshold.

Another useful property of the OVB formula with the partial $R^2$ parameterization is that the effect of $R^2_{Y\sim Z|D,X}$ on the bias is bounded. This allows investigators to contemplate extreme sensitivity scenarios, in which the parameter $R^2_{Y\sim Z|D,X}$ is set to 1 (or another conservative value), and see what happens as $R^2_{D\sim Z|X}$ varies.
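Equations 3 and 4 can be checked numerically. The following is a minimal sketch in base R, not part of sensemakr: it simulates a small data set in which the confounder is actually available (all variable names, coefficients, and the sample size are illustrative assumptions), fits both the full and the restricted regressions, and compares the bias and standard error obtained directly with those implied by the two partial $R^2$ values.

```r
# Synthetic data: every number below is an illustrative assumption.
set.seed(1)
n <- 500
x <- rnorm(n)                                 # observed covariate
z <- 0.5 * x + rnorm(n)                       # confounder (omitted in the restricted model)
d <- 0.4 * x + 0.6 * z + rnorm(n)             # treatment
y <- 0.3 * d + 0.2 * x + 0.7 * z + rnorm(n)   # outcome

full       <- lm(y ~ d + x + z)               # Equation 1
restricted <- lm(y ~ d + x)                   # Equation 2

# Partial R2 of z, computed from residual sums of squares.
rss <- function(fit) sum(resid(fit)^2)
r2yz.dx <- 1 - rss(full) / rss(restricted)          # R2 of Y ~ Z | D, X
r2dz.x  <- 1 - rss(lm(d ~ x + z)) / rss(lm(d ~ x))  # R2 of D ~ Z | X

se_res <- coef(summary(restricted))["d", "Std. Error"]
df_res <- df.residual(restricted)

# Equation 3 (absolute bias) and Equation 4 (standard error of the full model).
bias_formula <- se_res * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x) * df_res)
se_formula   <- se_res * sqrt((1 - r2yz.dx) / (1 - r2dz.x) * df_res / (df_res - 1))

# The same quantities read directly off the two fitted models.
bias_direct <- unname(abs(coef(restricted)["d"] - coef(full)["d"]))
se_direct   <- coef(summary(full))["d", "Std. Error"]

print(c(bias_formula = bias_formula, bias_direct = bias_direct))
print(c(se_formula = se_formula, se_direct = se_direct))
```

In practice $Z$ is unobserved, so the two partial $R^2$ values are not estimated but hypothesized; the simulation only illustrates that, once they are fixed, the formulas recover the full-model quantities from the restricted regression alone.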

2.2. Sensitivity statistics for routine reporting

The previous formulas can be used to assess the sensitivity of an estimate to confounders with any hypothesized strength. However, making sensitivity analyses standard practice benefits from simple and interpretable sensitivity statistics that can quickly summarize the robustness of a study result to unobserved confounding. With this in mind, Cinelli and Hazlett (2020a) propose two main sensitivity statistics for routine reporting: (i) the (observed) partial $R^2$ of the treatment with the outcome, $R^2_{Y\sim D|X}$; and, (ii) the robustness value, $RV_{q,\alpha}$. These statistics serve two main purposes:

1. They can be easily displayed alongside other summary statistics in regression tables, making sensitivity analysis to unobserved confounding simple, accessible, and standardized;

2. They can be easily computed from quantities found in a regression table, thereby enabling readers and reviewers to assess the sensitivity of results they see in print, even if the original authors did not perform sensitivity analyses.

The partial $R^2$ of the treatment with the outcome

In addition to quantifying how much variation of the outcome is explained by the treatment, the partial $R^2$ of the treatment with the outcome also conveys how robust the point estimate is to unobserved confounding in an “extreme scenario.” Specifically, suppose the unobserved confounder $Z$ explains all residual variance of the outcome, that is, $R^2_{Y\sim Z|D,X} = 1$. For this confounder to bring the point estimate to zero, it must explain at least as much residual variation of the treatment as the residual variation of the outcome that the treatment currently explains. Put differently, if $R^2_{Y\sim Z|D,X} = 1$, then we must have that $R^2_{D\sim Z|X} \geq R^2_{Y\sim D|X}$, otherwise this confounder cannot logically account for all the observed association between the treatment and the outcome (Cinelli and Hazlett 2020a).
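The extreme-scenario reading can be verified with a few lines of base R. The sketch below is not the package's implementation; it relies on the standard identity $R^2_{Y\sim D|X} = t^2/(t^2 + df)$ and on Equation 3 with $R^2_{Y\sim Z|D,X} = 1$. The helper names are hypothetical, and the t-value and degrees of freedom are those reported for the Darfur example later in the paper.

```r
# Partial R2 of the treatment with the outcome, from a t-value and residual df.
partial_r2_from_t <- function(t_value, dof) {
  t_value^2 / (t_value^2 + dof)
}

# Relative bias |bias| / |estimate| implied by Equation 3 when the confounder
# explains all residual variance of the outcome (R2 of Y ~ Z | D, X equal to 1).
relative_bias_extreme <- function(r2dz.x, t_value, dof) {
  sqrt(r2dz.x / (1 - r2dz.x) * dof) / abs(t_value)
}

t_value <- 4.184   # t-value of directlyharmed in the Darfur example
dof     <- 783     # residual degrees of freedom in the Darfur example

r2yd.x <- partial_r2_from_t(t_value, dof)
print(round(r2yd.x, 3))

# A confounder with R2 of D ~ Z | X equal to the partial R2 of the treatment
# produces a bias exactly as large as the estimate itself (ratio of 1).
print(relative_bias_extreme(r2yd.x, t_value, dof))
```

Any weaker association with the treatment yields a ratio below 1, which is why the partial $R^2$ acts as the threshold in this scenario.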

The Robustness Value

The second sensitivity statistic proposed in Cinelli and Hazlett (2020a) is the robustness value. The robustness value $RV_{q,\alpha}$ quantifies the minimal strength of association that the confounder needs to have, both with the treatment and with the outcome, so that a confidence interval of level $1-\alpha$ includes a change of $q$% of the current estimated value.

Let $f_q := q\,|f_{Y\sim D|X}|$ be the partial Cohen’s $f$ of the treatment with the outcome, $|f_{Y\sim D|X}|$, multiplied by the proportion of reduction $q$ deemed to be problematic.³ Also, let $|t^*_{\alpha,df-1}|$ denote the t-value threshold for a t-test with significance level of $\alpha$ and $df-1$ degrees of freedom, and define $f^*_{\alpha,df-1} := |t^*_{\alpha,df-1}|/\sqrt{df-1}$. Finally, construct $f_{q,\alpha}$, which “deducts” from $f_{Y\sim D|X}$ both the proportion of reduction $q$ of the point estimate and the boundary below which statistical significance is lost at the level of $\alpha$. That is, $f_{q,\alpha} := f_q - f^*_{\alpha,df-1}$. We then have that $RV_{q,\alpha}$ is given by (Cinelli and Hazlett 2020a,b),

$$RV_{q,\alpha} = \begin{cases} 0, & \text{if } f_{q,\alpha} < 0 \\[4pt] \dfrac{1}{2}\left(\sqrt{f_{q,\alpha}^4 + 4 f_{q,\alpha}^2} - f_{q,\alpha}^2\right), & \text{if } f_q < 1/f^*_{\alpha,df-1} \\[4pt] \dfrac{f_q^2 - f^{*2}_{\alpha,df-1}}{1 + f_q^2}, & \text{otherwise.} \end{cases} \tag{5}$$

Any confounder that explains $RV_{q,\alpha}$% of the residual variance of both the treatment and of the outcome is sufficiently strong to make the adjusted t-test not reject the null hypothesis $H_0: \tau = (1-q)|\hat{\tau}_{\text{res}}|$ at the $\alpha$ level (or, equivalently, sufficiently strong to make the adjusted $1-\alpha$ confidence interval include $(1-q)|\hat{\tau}_{\text{res}}|$). Likewise, a confounder with associations lower than $RV_{q,\alpha}$ is not capable of overturning the conclusion of such a test. Setting $\alpha = 1$ returns the robustness value for the point estimate. Further details on how to interpret the robustness value in practice are given in the next sections.

³ The partial Cohen’s $f^2$ can be written as $f^2_{Y\sim D|X} = R^2_{Y\sim D|X}/(1 - R^2_{Y\sim D|X})$.
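Equation 5 translates almost line by line into code. The function below is a base R sketch written for illustration, with a hypothetical name (`rv_sketch`) chosen so that it is not mistaken for the package's own routine. It assumes that $|f_{Y\sim D|X}|$ is obtained from the t-value as $|t|/\sqrt{df}$, which follows from the definition of the partial Cohen's $f^2$ combined with $R^2_{Y\sim D|X} = t^2/(t^2 + df)$.

```r
# Robustness value following Equation 5 (illustrative sketch).
rv_sketch <- function(t_value, dof, q = 1, alpha = 1) {
  f_q    <- q * abs(t_value) / sqrt(dof)                      # q * |f_{Y~D|X}|
  f_crit <- abs(qt(alpha / 2, df = dof - 1)) / sqrt(dof - 1)  # f*_{alpha, df-1}
  f_qa   <- f_q - f_crit                                      # f_{q, alpha}

  if (f_qa < 0) {
    return(0)
  }
  if (f_q < 1 / f_crit) {   # with alpha = 1, f_crit is 0 and 1 / f_crit is Inf
    0.5 * (sqrt(f_qa^4 + 4 * f_qa^2) - f_qa^2)
  } else {
    (f_q^2 - f_crit^2) / (1 + f_q^2)
  }
}

# t-value and degrees of freedom reported for the Darfur example in Section 3.
rv_point <- rv_sketch(t_value = 4.184, dof = 783)                # alpha = 1
rv_test  <- rv_sketch(t_value = 4.184, dof = 783, alpha = 0.05)
print(round(c(RV_q1 = rv_point, RV_q1_alpha05 = rv_test), 3))
```

With the rounded inputs above, the two values agree to three decimals with the 13.9% and 7.6% reported for the Darfur example in Section 3.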

2.3. Bounds on the strength of confounding using observed covariates

Consider a confounder orthogonal to the observed covariates, i.e., $Z \perp X$, or, equivalently, consider only the part of $Z$ not linearly explained by $X$. Now denote by $X_j$ a specific covariate of the set $X$ and define

$$k_D := \frac{R^2_{D\sim Z|X_{-j}}}{R^2_{D\sim X_j|X_{-j}}}, \qquad k_Y := \frac{R^2_{Y\sim Z|X_{-j},D}}{R^2_{Y\sim X_j|X_{-j},D}}. \tag{6}$$

where $X_{-j}$ represents the vector of covariates $X$ excluding $X_j$. That is, the terms $k_D$ and $k_Y$ represent how strong the confounder $Z$ is relative to observed covariate $X_j$, where “strength” is measured by how much residual variation they explain of the treatment (for $k_D$) and of the outcome (for $k_Y$). Given $k_D$ and $k_Y$, we can rewrite the strength of the confounders as (Cinelli and Hazlett 2020a),

$$R^2_{D\sim Z|X} = k_D\, f^2_{D\sim X_j|X_{-j}}, \qquad R^2_{Y\sim Z|D,X} \leq \eta^2 f^2_{Y\sim X_j|X_{-j},D} \tag{7}$$

where $\eta$ is a scalar which depends on $k_Y$, $k_D$ and $R^2_{D\sim X_j|X_{-j}}$.

These equations allow the investigator to assess the maximum bias that a hypothetical confounder at most “k times” as strong as a particular covariate $X_j$ could cause. This can be used to explore the relative strength of confounding necessary for bias to have changed the research conclusion. Furthermore, when the researcher has domain knowledge to argue that a certain covariate $X_j$ is particularly important in explaining treatment or outcome variation, and that omitted variables cannot explain as much residual variance of $D$ or $Y$ as that observed covariate, these results can be used to set plausible bounds on the total amount of confounding. The same inequalities hold if one uses a group of variables for benchmarking, by simply replacing the individual partial $R^2$ with the group partial $R^2$ of those variables (Cinelli and Hazlett 2020b).
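The treatment-side equality in Equation 7 is simple enough to compute by hand. The sketch below, in base R, covers only that first part; the outcome-side bound requires the scalar $\eta$, whose expression is not reproduced in this section, so it is deliberately left out. The function name is hypothetical, and the benchmark partial $R^2$ of 0.0089 is an assumption, back-calculated from the rounded 0.9% bound that the Darfur example reports for a confounder once as strong as female.

```r
# Treatment-side bound of Equation 7: R2 of D ~ Z | X equals kd times the
# partial Cohen's f2 of the benchmark covariate with the treatment.
bound_r2dz <- function(r2dxj.x, kd) {
  f2_benchmark <- r2dxj.x / (1 - r2dxj.x)   # partial f2 from partial R2
  bound <- kd * f2_benchmark
  if (any(bound >= 1)) {
    stop("Implied partial R2 is 1 or more; kd is implausibly large for this benchmark.")
  }
  bound
}

# Assumed benchmark strength (hypothetical, rounded value; see the lead-in).
r2_female_with_treatment <- 0.0089

# Confounders once, twice, and three times as strong as the benchmark.
bounds <- bound_r2dz(r2_female_with_treatment, kd = 1:3)
print(round(bounds, 3))
```

Because the bound is linear in $k_D$, doubling the assumed relative strength doubles the implied partial $R^2$ with the treatment.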

2.4. Multiple or non-linear confounders

Finally, suppose that, instead of a single unobserved confounder $Z$, there are multiple unobserved confounders $Z = [Z_1, Z_2, \ldots, Z_k]$. In this case, the regression the investigator wished she had run becomes:

$$Y = \hat{\tau} D + X\hat{\beta} + Z\hat{\gamma} + \hat{\varepsilon}_{\text{full}}. \tag{8}$$

As Cinelli and Hazlett (2020a) show, the previous results considering a single unobserved confounder are in fact conservative when considering the impact of multiple confounders, barring an adjustment in the degrees of freedom of Equation 4. Moreover, since the vector $Z$ is arbitrary, this can also accommodate non-linear confounders or even misspecification of the functional form of the observed covariates $X$. In other words, to assess the maximum bias that multiple, non-linear confounders could cause in our current estimates, it suffices to think in terms of the maximum explanatory power that $Z$ could have in the treatment and outcome regressions, as parameterized by $R^2_{D\sim Z|X}$ and $R^2_{Y\sim Z|D,X}$.
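The conservativeness claim can be illustrated with simulated data. In the base R sketch below, the omitted block $Z$ contains a linear term, a squared term, and an interaction; the data-generating process and all coefficients are illustrative assumptions. The group partial $R^2$ values of $Z$ are plugged into Equation 3 and the result is compared with the bias actually incurred by omitting $Z$.

```r
# Synthetic data with several, partly non-linear, omitted terms.
set.seed(2)
n  <- 500
x  <- rnorm(n)
z1 <- rnorm(n)
z2 <- rnorm(n)
d  <- 0.4 * x + 0.5 * z1 + 0.3 * z2^2 + rnorm(n)
y  <- 0.3 * d + 0.2 * x + 0.4 * z1 * z2 + 0.5 * z2^2 + rnorm(n)

# The omitted block Z of Equation 8.
Z <- cbind(z1 = z1, z2sq = z2^2, z1z2 = z1 * z2)

restricted <- lm(y ~ d + x)
full       <- lm(y ~ d + x + Z)

# Group partial R2 of Z with the outcome and with the treatment.
rss <- function(fit) sum(resid(fit)^2)
r2yz.dx <- 1 - rss(full) / rss(restricted)
r2dz.x  <- 1 - rss(lm(d ~ x + Z)) / rss(lm(d ~ x))

se_res <- coef(summary(restricted))["d", "Std. Error"]
df_res <- df.residual(restricted)

# Equation 3 evaluated at the group partial R2 values, versus the actual bias.
bias_bound  <- se_res * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x) * df_res)
bias_actual <- unname(abs(coef(restricted)["d"] - coef(full)["d"]))

print(c(bias_actual = bias_actual, bias_bound = bias_bound))
```

The actual bias does not exceed the value from Equation 3, so reasoning about the total explanatory power of $Z$ errs on the side of caution.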

3. sensemakr for R: Basic functionality

In this section we illustrate the basic functionality of sensemakr for R. Given that sensitivity analysis requires contextual knowledge to be properly interpreted, we illustrate these tools with a real example. We use sensemakr to reproduce all results found in Section 5 of Cinelli and Hazlett (2020a), which estimates the effects of exposure to violence on attitudes towards peace, in Darfur, Sudan. Further details about this application and the data can be found in Hazlett (2019).

3.1. Violence in Darfur: data and research question

In 2003 and 2004, the Sudanese government orchestrated a horrific campaign of violence against civilians in Darfur, killing an estimated two hundred thousand people. This application asks whether, on average, being directly injured or maimed in this episode made individuals more likely to feel “vengeful” and unwilling to make peace with those who perpetrated this violence. Or, might those who directly suffered such violence be motivated to see it end, supporting calls for peace?

The sensemakr package provides the data required for this example based on a survey among Darfurian refugees in eastern Chad (Hazlett 2019). To get started we first need to install the package. From within R, the sensemakr package can be installed from the Comprehensive R Archive Network (CRAN).

R> install.packages("sensemakr")

After loading the package, the data can be loaded with the command data("darfur").

R> library(sensemakr)

R> data("darfur")

The “treatment” variable of interest is directlyharmed, which indicates whether the individual was physically injured or maimed during the attack on her or his village in Darfur. The main outcome of interest is peacefactor, an index measuring pro-peace attitudes. Other covariates in the data include: village (a factor variable indicating the original village of the respondent), female (a binary indicator of gender), age, herder_dar (whether they were a herder in Darfur), farmer_dar (whether they were a farmer in Darfur), and pastvoted (whether they report having voted in an earlier election, prior to the conflict). For further details, see ?darfur.

Hazlett (2019) argues that the purpose of these attacks was to punish civilians from ethnic groups presumed to support the opposition and to kill or drive these groups out so as to reduce this support. Violence against civilians included aerial bombardments by the government as well as assaults by the Janjaweed, a pro-government militia. For this example, suppose a researcher argues that, while some villages were more or less intensively attacked, within-village violence was largely indiscriminate. The bombings were crude, could not be finely targeted below the level of village, and the strategic purpose of the attacks was not to kill or capture specific individuals. Similarly, the Janjaweed had no reason to target certain individuals rather than others, and no information with which to do so, with one major exception: women were targeted and often subjected to sexual violence.

                         Dependent variable: peacefactor
directlyharmed           0.097*** (0.023)
female                   −0.232*** (0.024)
Observations             1,276
$R^2$                    0.512
Residual Std. Error      0.310 (df = 783)
Note: *p<0.1; **p<0.05; ***p<0.01

Table 1: OLS results for darfur.model. To conserve space, only the results for directlyharmed and female are shown.

Supported by these considerations, this researcher may argue that adjusting for village and female is sufficient for control of confounding, and run the following linear regression model (in which other pre-treatment covariates, although not necessary for identification, are also included):

R> darfur.model <- lm(peacefactor ~ directlyharmed + village + female +

R+ age + farmer_dar + herder_dar +

R+ pastvoted + hhsize_darfur,

R+ data = darfur)

This regression model results in the estimates shown in Table 1. According to this model,

those who were directly harmed in violence were on average more “pro-peace,” not less.

The threat of unobserved confounders

The previous estimate requires the assumption of no unobserved confounders for unbiasedness. While supported by the claim that there is no targeting of violence within village and gender strata, other investigators may challenge this account. For example, although the bombing was crude, perhaps bombs were still more likely to hit the center of the village, and those in the center were also likely to hold different attitudes towards peace. Or, it could be the case that the Janjaweed observed signals that indicate individual characteristics such as wealth, and targeted using this information. Or perhaps an individual’s (prior) political attitudes could have led them to take actions that exposed them to greater risk during the attack. To complicate things, all these factors could interact with each other or otherwise have other non-linear effects.

These concerns suggest that, instead of the previous linear model (darfur.model), we should

have run a model such as:

R> darfur.complete.model <- lm(peacefactor ~ directlyharmed + village +

R+ female + age + farmer_dar + herder_dar +

R+ pastvoted + hhsize_darfur +

R+ center*wealth*political_attitudes,

R+ data = darfur)

where center*wealth*political_attitudes indicates fully interacted terms for these three variables. However, trying to fit the model darfur.complete.model will result in an error: none of the variables center, wealth or political_attitudes were measured.

Given an assumption on how strongly omitted variables relate to the treatment and the outcome, how would including them have changed our inferences regarding the coefficient of directlyharmed? Or, what is the minimal strength that these unobserved confounders (or all remaining unobserved confounders) need to have to change our previous conclusions? Additionally, how can we leverage our contextual knowledge about the attacks to judge how plausible such confounders are? For instance, given the limited opportunities for targeting and the special role of gender in this case, if we assumed that unobserved confounding cannot explain more than female, what would this imply about the maximum possible strength of confounding? We show next how to use sensemakr to answer each of these questions.

3.2. Violence in Darfur: sensitivity analysis

The main function in sensemakr for R is sensemakr(). This function performs the most commonly required sensitivity analyses and returns an object of class sensemakr, which can then be further explored with the print, summary and plot methods (see details in ?print.sensemakr and ?plot.sensemakr). We begin the analysis by applying sensemakr() to the original regression model, darfur.model.

R> darfur.sensitivity <- sensemakr(model = darfur.model,

R+ treatment = "directlyharmed",

R+ benchmark_covariates = "female",

R+ kd = 1:3,

R+ ky = 1:3,

R+ q = 1,

R+ alpha = 0.05,

R+ reduce = TRUE)

The arguments of this call are:

• model: the lm object with the outcome regression. In our case, darfur.model.

• treatment: the name of the treatment variable. In our case, "directlyharmed".

• benchmark_covariates: the names of covariates that will be used to bound the plausible strength of the unobserved confounders. Here, we put "female", which one could argue to be among the main determinants of exposure to violence. It was also found to be among the strongest determinants of attitudes towards peace empirically. Variables considered as separate benchmarks can be passed as a single character vector; variables that should be treated jointly as a group for benchmarks should be passed as a named list of character vectors.

• kd and ky: these arguments parameterize how many times stronger the confounder is related to the treatment (kd) and to the outcome (ky) in comparison to the observed benchmark covariate ("female"). In our example, setting kd = 1:3 and ky = 1:3 means we want to investigate the maximum strength of a confounder once, twice, or three times as strong as female (in explaining treatment and outcome variation). If only kd is given, ky will be set equal to it by default.

• q: this allows the user to specify what fraction of the effect estimate would have to be explained away to be problematic. Setting q = 1 means that a reduction of 100% of the current effect estimate (i.e., a true effect of zero) would be deemed problematic. The default is q = 1.

• alpha: significance level of interest for making statistical inferences. The default is alpha = 0.05.

• reduce: should we consider confounders acting towards increasing or reducing the absolute value of the estimate? The default is reduce = TRUE, which means we are considering confounders that pull the estimate towards (or through) zero. Setting reduce = FALSE will consider confounders that pull the estimate away from zero.

Using the default arguments, one can simplify the previous call to

R> darfur.sensitivity <- sensemakr(model = darfur.model,

R+ treatment = "directlyharmed",

R+ benchmark_covariates = "female",

R+ kd = 1:3)

After running sensemakr(), we can explore the sensitivity analysis results. We note that the

function sensemakr() also has formula and numeric methods. See ?sensemakr for details.

Sensitivity statistics for routine reporting

The print method for sensemakr provides the original (unadjusted) estimate along with three summary sensitivity statistics suited for routine reporting: (1) the partial $R^2$ of the treatment with the outcome; (2) the robustness value (RV) required to reduce the estimate entirely to zero (i.e., $q = 1$); and, (3) the RV beyond which the estimate would no longer be statistically distinguishable from zero at the 5% level ($q = 1$, $\alpha = 0.05$).

R> darfur.sensitivity

Sensitivity Analysis to Unobserved Confounding

Model Formula: peacefactor ~ directlyharmed + village + female + age + farmer_dar +

herder_dar + pastvoted + hhsize_darfur

Unadjusted Estimates of 'directlyharmed ':

Coef. estimate: 0.097

Standard Error: 0.023

t-value: 4.18

Sensitivity Statistics:

Partial R2 of treatment with outcome: 0.022

Robustness Value, q = 1 : 0.139

Robustness Value, q = 1 alpha = 0.05 : 0.076

For more information, check summary.

The package also provides a function that creates a LaTeX or HTML table with these results, as shown in Table 2 (for the HTML table, simply change the argument to format = "html").

R> ovb_minimal_reporting(darfur.sensitivity, format = "latex")

Outcome: peacefactor

Treatment        Est.    S.E.    t-value   $R^2_{Y\sim D|X}$   $RV_{q=1}$   $RV_{q=1,\alpha=0.05}$
directlyharmed   0.097   0.023   4.184     2.2%                13.9%        7.6%

df = 783         Bound (1x female): $R^2_{Y\sim Z|X,D}$ = 12.5%, $R^2_{D\sim Z|X}$ = 0.9%

Table 2: Minimal sensitivity analysis reporting.

Together these three sensitivity statistics provide the ingredients for a standard reporting

template proposed in Cinelli and Hazlett (2020a). More precisely:

• The robustness value for bringing the point estimate of directlyharmed exactly to zero ($RV_{q=1}$) is 13.9%. This means that unobserved confounders that explain 13.9% of the residual variance both of the treatment and of the outcome are sufficiently strong to explain away all the observed effect. On the other hand, unobserved confounders that do not explain at least 13.9% of the residual variance both of the treatment and of the outcome are not sufficiently strong to do so.

• The robustness value for testing the null hypothesis that the coefficient of directlyharmed is zero ($RV_{q=1,\alpha=0.05}$) falls to 7.6%. This means that unobserved confounders that explain 7.6% of the residual variance both of the treatment and of the outcome are sufficiently strong to bring the lower bound of the confidence interval to zero (at the chosen significance level of 5%). On the other hand, unobserved confounders that do not explain at least 7.6% of the residual variance both of the treatment and of the outcome are not sufficiently strong to do so.

• Finally, the partial $R^2$ of directlyharmed with peacefactor means that, in an extreme scenario, in which we assume that unobserved confounders explain all of the left out variance of the outcome, these unobserved confounders would need to explain at least 2.2% of the residual variance of the treatment to fully explain away the observed effect.

These quantities summarize what we need to know in order to safely rule out confounders

that are deemed to be problematic. Researchers can then argue as to whether they fall within

plausible bounds on the maximum explanatory power that unobserved confounders could have

in a given application.

Where investigators are unable to offer strong arguments limiting the absolute strength of confounding, it can be productive to consider relative claims, for instance, by arguing that unobserved confounders are likely not multiple times stronger than a certain observed covariate. In our application, this is indeed the case. One could argue that, given the nature of the attacks, it is hard to imagine that unobserved confounding could explain much more of the residual variance of targeting than what is explained by the observed variable female. The lower corner of the table, thus, provides bounds on confounding as strong as female, $R^2_{Y\sim Z|X,D}$ = 12.5%, and $R^2_{D\sim Z|X}$ = 0.9%. Since both of those are below the robustness value, confounders as strong as female are not sufficient to explain away the observed estimate. Moreover, the bound on $R^2_{D\sim Z|X}$ is below the partial $R^2$ of the treatment with the outcome, $R^2_{Y\sim D|X}$. This means that even an extreme confounder explaining all residual variation of the outcome and as strongly associated with the treatment as female would not overturn the research conclusions. As noted in Section 2.4, these results are exact for a single unobserved confounder, and conservative for multiple confounders, possibly acting non-linearly.
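The comparison in the preceding paragraph can be reproduced from the rounded figures in Table 2 alone. The following base R sketch is not the package's own computation: it plugs the “1x female” bounds into Equations 3 and 4, assuming the confounder pulls the estimate towards zero (the reduce = TRUE convention). Because the inputs are rounded, the results are approximate.

```r
# Rounded inputs taken from Table 2.
estimate <- 0.097
se       <- 0.023
dof      <- 783
r2yd.x   <- 0.022   # partial R2 of the treatment with the outcome
rv_q1    <- 0.139   # robustness value for q = 1

# Bounds for a confounder once as strong as female.
r2yz.dx <- 0.125
r2dz.x  <- 0.009

# Both bounds sit below the robustness value, and the treatment-side bound
# sits below the partial R2 of the treatment with the outcome.
both_below_rv    <- r2yz.dx < rv_q1 && r2dz.x < rv_q1
below_partial_r2 <- r2dz.x < r2yd.x

# Equation 3: bias; Equation 4: adjusted standard error.
bias              <- se * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x) * dof)
adjusted_estimate <- estimate - bias
adjusted_se       <- se * sqrt((1 - r2yz.dx) / (1 - r2dz.x) * dof / (dof - 1))
adjusted_t        <- adjusted_estimate / adjusted_se

print(c(both_below_rv = both_below_rv, below_partial_r2 = below_partial_r2))
print(round(c(adjusted_estimate = adjusted_estimate,
              adjusted_se = adjusted_se,
              adjusted_t = adjusted_t), 3))
```

Under these inputs the adjusted estimate stays positive and the adjusted t-value stays above conventional thresholds, in line with the conclusion that a confounder as strong as female would not overturn the result.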

Finally, the summary method for sensemakr provides an extensive report with verbal descrip-

tions of all these analyses. Entering the command summary(darfur.sensitivity) produces

verbose output similar to the text explanations in the last several paragraphs (and thus not

reproduced here), so that researchers can directly cite or include such text in their reports.

Sensitivity contour plots of point estimates and t-values

The minimal report of sensitivity results provided by Table 2 offers a useful summary of how

robust the current estimate is to unobserved confounding. Researchers can extend and reﬁne

sensitivity analyses through plotting methods for sensemakr that visually explore the whole

range of possible estimates that confounders with diﬀerent strengths could cause. These plots

can also represent diﬀerent bounds on the plausible strength of confounding based on diﬀerent

assumptions on how they compare to observed covariates.

We begin by examining the default plot type, contour plots for the point estimate.

R> plot(darfur.sensitivity)

The resulting plot is shown in the left panel of Figure 1. The horizontal axis shows the
residual share of variation of the treatment that is hypothetically explained by unobserved
confounding, $R^2_{D \sim Z | X}$. The vertical axis shows the hypothetical partial $R^2$ of unobserved
confounding with the outcome, $R^2_{Y \sim Z | X, D}$. The contours show what estimate for directlyharmed
would have been obtained in the full regression model including unobserved confounders with
such hypothetical strengths. Note that the plot is parameterized in a way that hurts our preferred
hypothesis, by pulling the estimate towards zero. Recall that the direction of the bias was
determined by the argument reduce = TRUE of the sensemakr() call.
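The logic behind the contours can be sketched in a few lines of base R. The block below is an illustration rather than the package's implementation: it evaluates the omitted variable bias formula of Cinelli and Hazlett (2020a) over a grid of hypothetical confounder strengths, using the rounded Darfur summary statistics (estimate 0.097, standard error 0.023, 783 degrees of freedom). The helper name adjusted_estimate_sketch and the grid resolution are assumptions made here for illustration.

```r
# Bias-adjusted estimate for a confounder that pulls the estimate toward zero
# (the reduce = TRUE case), written in terms of the two partial R2 values.
# Hypothetical helper, not a function of the sensemakr package.
adjusted_estimate_sketch <- function(estimate, se, dof, r2dz.x, r2yz.dx) {
  bias <- se * sqrt(dof) * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x))
  sign(estimate) * (abs(estimate) - bias)
}

# Grid of hypothetical confounder strengths (both axes run from 0 to 0.4)
r2dz.x  <- seq(0, 0.4, by = 0.01)
r2yz.dx <- seq(0, 0.4, by = 0.01)

# Each cell holds the estimate the full regression would have produced
# for the corresponding pair of partial R2 values
grid <- outer(r2dz.x, r2yz.dx, function(d, y)
  adjusted_estimate_sketch(0.097, 0.023, 783, r2dz.x = d, r2yz.dx = y))
```

Passing r2dz.x, r2yz.dx and grid to base R's contour() would draw level curves comparable to those of the point-estimate plot. The cell for zero confounding recovers the unadjusted estimate of 0.097, and the estimate shrinks as either partial $R^2$ grows.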

Figure 1: Sensitivity contour plots of point estimate (left) and t-value (right). In both panels the horizontal axis is the partial $R^2$ of confounder(s) with the treatment and the vertical axis is the partial $R^2$ of confounder(s) with the outcome, each ranging from 0.0 to 0.4. Marked points, left panel: Unadjusted (0.097), 1x female (0.075), 2x female (0.053), 3x female (0.03). Marked points, right panel: Unadjusted (4.2), 1x female (3.439), 2x female (2.6), 3x female (1.628); the threshold contour is labeled 1.963.

The bounds on the strength of confounding, determined by the parameter kd = 1:3 in the

call for sensemakr(), are also shown in the plot. The plot reveals that the direction of the

eﬀect (positive) is robust to confounding once, twice or even three times as strong as the

observed covariate female, although in this last case the magnitude of the eﬀect is reduced

to a third of the original estimate.

We now examine the sensitivity of the t-value for testing the null hypothesis of zero eﬀect by

choosing the option sensitivity.of = "t-value" of the plot() method.

R> plot(darfur.sensitivity, sensitivity.of = "t-value")

The resulting plot is shown in the right panel of Figure 1. At the 5% significance level, the null
hypothesis of zero effect would still be rejected given confounders once or twice as strong
as female. However, although the point estimate remains positive, accounting for sampling
uncertainty means that the null hypothesis of zero effect would no longer be rejected with the
inclusion of a confounder three times as strong as female.

Sensitivity plots of extreme scenarios

Sometimes researchers may be better equipped to make plausibility judgments about the

strength of determinants of the treatment assignment mechanism, and have less knowledge

about the determinants of the outcome. In those cases, sensitivity plots using extreme
scenarios are a useful option. These are produced with the option type = "extreme". Here one
assumes that confounding explains all or some large fraction of the residual variance of the
outcome, and then varies how strongly such confounding is hypothetically related to the treatment to
see how this affects the resulting point estimate.

R> plot(darfur.sensitivity, type = "extreme")

Figure 2: Sensitivity analysis to extreme scenarios. The horizontal axis is the partial $R^2$ of confounder(s) with the treatment (0.00 to 0.10); the vertical axis is the adjusted effect estimate (−0.10 to 0.10); one curve is drawn for each partial $R^2$ of confounder(s) with the outcome: 100%, 75%, 50%.

Figure 2 shows the produced plot. By default these plots consider confounding that explains
100%, 75%, and 50% of the residual variance of the outcome, producing three separate curves.
This is equivalent to setting the argument r2yz.dx = c(1, .75, .5). The bounds on the
strength of association of a confounder once, twice or three times as strongly associated with
the treatment as female are shown as red ticks in the horizontal axis. As the plot shows, even
in the most extreme case ($R^2_{Y \sim Z | X, D} = 100\%$), confounders would need to be more than twice
as strongly associated with the treatment as female to fully explain away the point estimate.
Moving to the scenarios $R^2_{Y \sim Z | X, D} = 75\%$ and $R^2_{Y \sim Z | X, D} = 50\%$, confounders would need to
be more than three times as strongly associated with the treatment as female to fully explain
away the point estimate.
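The points at which the curves of an extreme scenario plot cross zero can also be worked out directly. The following base R sketch sets the bias-adjusted estimate to zero and solves for the partial $R^2$ with the treatment, given an assumed partial $R^2$ with the outcome. It uses the rounded Darfur summary statistics (0.097, 0.023, 783 degrees of freedom); the function name r2dz_to_zero_sketch is hypothetical and not part of the package.

```r
# Smallest partial R2 with the treatment that brings the estimate to zero,
# for a confounder explaining a share r2yz.dx of the residual outcome variance.
# Derived by setting estimate = se * sqrt(dof) * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x))
# and solving for r2dz.x. Hypothetical helper for illustration only.
r2dz_to_zero_sketch <- function(estimate, se, dof, r2yz.dx) {
  f2 <- (estimate / se)^2 / dof   # squared partial Cohen's f of the treatment
  f2 / (r2yz.dx + f2)
}

scenarios  <- c(1, 0.75, 0.5)
thresholds <- r2dz_to_zero_sketch(0.097, 0.023, 783, r2yz.dx = scenarios)
round(thresholds, 3)
```

In the 100% scenario the threshold is roughly 2.2%, which coincides with the partial $R^2$ of the treatment with the outcome. The thresholds rise as the assumed share of outcome variance falls, which is why weaker scenarios require confounders more strongly associated with the treatment.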

Group benchmarks

Users can also use a group of variables collectively as benchmarks, by providing a named list of

character vectors to the benchmark_covariates argument. Each character vector of the list

forms its own group. For example, the command below computes bounds on the maximum

strength of confounding once, twice or three times as strong as the combined explanatory

power of the covariates female and pastvoted. The names of the list are used for setting the

benchmark labels in plots and tables.

R> group.sens <- sensemakr(model = darfur.model,

R+ treatment = "directlyharmed",

R+ benchmark_covariates =

R+ list(female_past = c("female", "pastvoted")),

R+ kd = 1:3)

4. sensemakr for R: Advanced use

The standard functionality demonstrated in the previous section will suﬃce for most users,

most of the time. More ﬂexibility can be obtained when needed by employing additional

functions, particularly:

• functions for computing the bias, adjusted estimates and standard errors: these comprise,
among others, the functions bias(), adjusted_estimate(), adjusted_se() and
adjusted_t(). They take as input the original (unadjusted) estimate (in the form of a
linear model or numeric values) and a pair of sensitivity parameters (the partial $R^2$ of
the omitted variable with the treatment and the outcome), and return the new quantity
adjusted for omitted variable bias.

• functions for computing sensitivity statistics: these comprise, among others, the
functions partial_r2(), robustness_value(), and sensitivity_stats(). These functions
compute sensitivity statistics suited for routine reporting, as proposed in Cinelli
and Hazlett (2020a). They take as input the original (unadjusted) estimate (in the form
of a linear model or numeric values), and return the corresponding sensitivity statistic.

• sensitivity plots: ovb_contour_plot() and ovb_extreme_plot() allow estimation and
plotting of the contour and extreme scenario plots, respectively. The convenience
function add_bound_to_contour() allows the user to place manually computed bounds on
contour plots. All plot functions return invisibly the data needed to replicate the plot,
so users can produce their own plots if preferred. The default options for plots work
best with width and height around 4 to 5 inches.

• bounding functions: ovb_bounds() computes bounds on the maximum strength of
confounding “k times” as strong as certain observed covariates. The auxiliary function
ovb_partial_r2_bound() computes bounds for confounders by passing the values of
the partial $R^2$ of the benchmarks directly.

We demonstrate the use of these functions below through examples chosen to illustrate im-

portant features of sensitivity analysis.

4.1. Formal versus informal benchmarking: customizing bounds

Informal “benchmarking” procedures have been suggested as aids to interpretation for numerous
sensitivity analyses. These approaches are usually described as revealing how an unobserved
confounder Z “not unlike” some observed covariate $X_j$ would alter the results of a study (Imbens
2003; Blackwell 2013; Hosman et al. 2010; Carnegie et al. 2016; Dorie et al. 2016; Hong,
Qin, and Yang 2018). As shown in Cinelli and Hazlett (2020a), these informal proposals

may lead users to erroneous conclusions, even when they make correct suppositions about

how unobserved confounders compare to observed covariates. Here we replicate Section 6.1

of Cinelli and Hazlett (2020a) using sensemakr and provide a numerical example illustrating

the potential for misleading results from informal benchmarking. This example also demon-

strates advanced usage of the package, including how to construct sensitivity contour plots

with customized bounds.

Data and model

We begin by simulating the data generating process which will be used in our example, as
given by Equations 9 to 12 below. Here we have a treatment variable D, an outcome variable
Y, one observed confounder X, and one unobserved confounder Z. All disturbance variables
U are standardized, mutually independent normals. Note that in this case, the treatment D
has no causal effect on Y.

Model 1:

$$Z = U_z \tag{9}$$

$$X = U_x \tag{10}$$

$$D = X + Z + U_d \tag{11}$$

$$Y = X + Z + U_y \tag{12}$$

Also note that, in this model: (i) the unobserved confounder Z is independent of X; and, (ii)
the unobserved confounder Z is exactly like X in terms of its strength of association with the
treatment and the outcome. The code below draws a sample of 100 observations from this data generating
process. We use the function resid_maker() to make sure the residuals are standardized and
orthogonal, so that all properties we describe here hold exactly even in a finite sample.

R> n <- 100

R> X <- scale(rnorm(n))

R> Z <- resid_maker(n, X)

R> D <- X + Z + resid_maker(n, cbind(X, Z))

R> Y <- X + Z + resid_maker(n, cbind(X, Z, D))

In this example, the investigator knows that she needs to adjust for the confounder Z but,
unfortunately, does not observe Z. Therefore, she is forced to fit the restricted linear model
adjusting for X only.

R> model.ydx <- lm(Y ~ D + X)

Results from this regression are shown in the first column of Table 3, showing a large and
statistically significant coefficient estimate for both D and X.
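For readers who want to reproduce the simulation without relying on a particular helper, the sketch below shows one way such a function could work, under the assumption that it draws normal noise, removes its linear association with the supplied variables, and standardizes the result. The name resid_maker_sketch and the seed are choices made here for illustration and are not taken from the package.

```r
# Hypothetical stand-in for a helper like resid_maker(): draw normal noise,
# remove any linear association with the variables in `on`, then standardize.
resid_maker_sketch <- function(n, on) {
  u <- rnorm(n)
  u <- resid(lm(u ~ on))
  as.numeric(scale(u))
}

set.seed(1)  # arbitrary seed; the coefficients below do not depend on it
n <- 100
X <- as.numeric(scale(rnorm(n)))
Z <- resid_maker_sketch(n, X)
D <- X + Z + resid_maker_sketch(n, cbind(X, Z))
Y <- X + Z + resid_maker_sketch(n, cbind(X, Z, D))

restricted <- lm(Y ~ D + X)      # omits the confounder Z
full       <- lm(Y ~ D + X + Z)  # adjusts for Z
round(coef(restricted), 3)
round(coef(full), 3)
```

Because the disturbances are exactly orthogonal and standardized in the sample, the restricted regression returns a coefficient of 0.5 on D, while the regression that also adjusts for Z returns a coefficient of zero on D, regardless of the seed.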

Formal benchmarks

Suppose the investigator correctly knows that: (i) Z and X have the same strength of association
with D and Y; and, (ii) Z is independent of X. How can she leverage this information
to understand how much bias a confounder Z “not unlike” X could cause? As shown in Section 2.3,
Equation 7 can be used to bound the maximum amount of confounding caused by an
unobserved confounder Z as strongly associated with the treatment D and with the outcome
Y as the observed covariate X.

Separately from the main sensemakr() function, these bounds can be computed with the

function ovb_bounds(). In this function one needs to specify the linear model being used

                      Dependent variable: Y
                      Restricted OLS       Full OLS
                      (1)                  (2)
D                     0.500*** (0.088)     0.000 (0.102)
X                     0.500*** (0.152)     1.000*** (0.144)
Z                                          1.000*** (0.144)
Observations          100                  100
R2                    0.500                0.667
Residual Std. Error   1.240 (df = 97)      1.020 (df = 96)
Note: *p<0.1; **p<0.05; ***p<0.01

Table 3: First column: results of the restricted regression adjusting for X only. Second
column: results of the full regression adjusting for X and Z.

(model = model.ydx), the treatment of interest (treatment = "D"), the observed variable

used for benchmarking (benchmark_covariates = "X"), and how many times stronger Z

is in explaining treatment (kd = 1) and outcome (ky = 1) variation, as compared to the

benchmark variable X.

R> formal_bound <- ovb_bounds(model = model.ydx,

R+ treatment = "D",

R+ benchmark_covariates = "X",

R+ kd = 1,

R+ ky = 1)

We can now inspect the output of ovb_bounds().

R> formal_bound[1:6]

  bound_label r2dz.x r2yz.dx treatment adjusted_estimate adjusted_se
1        1x X    0.5   0.333         D                 0       0.102

As we can see, the results of the bounding procedure correctly show that an unobserved
confounder Z that is truly “not unlike X” would: (1) explain 50% of the residual variation of
the treatment and 33% of the residual variation of the outcome; (2) bring the point estimate
exactly to zero; and, (3) bring the standard error to 0.102. This is precisely what one obtains
when running the full regression model adjusting for both X and Z, as shown in the second
column of Table 3.
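The two bounds can be checked by hand with the bounding formulas of Cinelli and Hazlett (2020a). The base R sketch below is an illustration under stated assumptions, not the package's code: the function name partial_r2_bound_sketch is hypothetical, and the inputs 1/3 and 0.1 are the partial $R^2$ values of X with D and with Y that Model 1 implies, worked out here from the design rather than taken from package output.

```r
# Bounds on the confounder's partial R2, given the benchmark's partial R2 with
# the treatment (r2dxj.x) and with the outcome (r2yxj.dx), for multipliers kd, ky.
# Hypothetical helper for illustration only.
partial_r2_bound_sketch <- function(r2dxj.x, r2yxj.dx, kd = 1, ky = kd) {
  # bound on the partial R2 of the confounder with the treatment
  r2dz.x <- kd * r2dxj.x / (1 - r2dxj.x)
  # implied association between the confounder and the benchmark, given D
  r2zxj.xd <- kd * r2dxj.x^2 / ((1 - kd * r2dxj.x) * (1 - r2dxj.x))
  # bound on the partial R2 of the confounder with the outcome
  r2yz.dx <- ((sqrt(ky) + sqrt(r2zxj.xd)) / sqrt(1 - r2zxj.xd))^2 *
    r2yxj.dx / (1 - r2yxj.dx)
  c(r2dz.x = r2dz.x, r2yz.dx = r2yz.dx)
}

# Model 1: X explains 1/3 of the variance of D, and its partial R2 with Y
# in the restricted regression is 0.1
bound <- partial_r2_bound_sketch(r2dxj.x = 1/3, r2yxj.dx = 0.1)
round(bound, 3)
```

With these inputs the sketch returns 0.5 for the treatment and 0.333 for the outcome, the same pair reported in the output above.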

Informal benchmarks

We now demonstrate an “informal benchmark” to show its dangers. Computing the bias due
to the omission of Z requires two sensitivity parameters: its partial $R^2$ with the treatment D
and its partial $R^2$ with the outcome Y. Informal approaches follow from the intuition that we
can simply take the observed associations of X with D and Y, found directly from regressions
for the treatment and the outcome, to “calibrate” the magnitude of the sensitivity parameters
of an unobserved confounder “not unlike” X. Unfortunately, as formalized in Cinelli and
Hazlett (2020a), these observed associations are themselves affected by the omission of
Z, making naive comparisons potentially misleading.

What happens if we nevertheless attempt to use those observed statistics for benchmarking?

To compute the informal benchmarks, we first need to obtain the observed partial $R^2$ of X
with the outcome Y. This can be done using the partial_r2() function of sensemakr in the
model.ydx regression.

R> r2yx.d <- partial_r2(model.ydx, covariates = "X")

We next need to obtain the partial $R^2$ of X with the treatment D. For that, we need to fit a
new regression of the treatment D on the observed covariate X, here denoted by model.dx.

R> model.dx <- lm(D ~ X)

R> r2dx <- partial_r2(model.dx, covariates = "X")

We then determine what would be the implied adjusted estimate due to an unobserved
confounder Z with this pair of partial $R^2$ values. This can be computed using the adjusted_estimate()
function.

R> informal_adjusted_estimate <- adjusted_estimate(model = model.ydx,

R+ treatment = "D",

R+ r2dz.x = r2dx,

R+ r2yz.dx = r2yx.d)

Let us now compare those informal benchmarks with the formal bounds. To prepare, we ﬁrst

plot sensitivity contours with the function ovb_contour_plot(). Next, we add the informal

benchmark to the contours, using the numeric method of the function add_bound_to_contour().

Finally, we use add_bound_to_contour() again to add the previously computed formal

bounds.

R> # draws sensitivity contours

R> ovb_contour_plot(model = model.ydx,

R+ treatment = "D",

R+ lim = .6)

R>

Figure 3: Informal benchmarking versus proper bounds. Contour plot with the partial $R^2$ of confounder(s) with the treatment on the horizontal axis and the partial $R^2$ of confounder(s) with the outcome on the vertical axis, both from 0.0 to 0.6. Marked points: Unadjusted (0.5), Informal benchmark (0.31), Formal bound (0).

R> # adds informal benchmark

R> add_bound_to_contour(r2dz.x = r2dx,

R+ r2yz.dx = r2yx.d,

R+ bound_value = informal_adjusted_estimate,

R+ bound_label = "Informal benchmark")

R>

R> # adds formal bound

R> add_bound_to_contour(bounds = formal_bound,

R+ bound_label = "Formal bound")

Note how the results from informal benchmarking are misleading: the benchmark point is still
far from zero, which would suggest that an unobserved confounder Z “not unlike” X is unable
to explain away the observed effect, when in fact it is able to do so, as shown in Table 3. This incorrect
conclusion occurs despite the investigator correctly assuming both that: (i) Z and X have the
same strength of association with D and Y; and, (ii) Z is independent of X. Therefore, we do
not recommend using informal benchmarks for sensitivity analysis, and suggest researchers
use formal approaches such as the ones provided with ovb_bounds(). For further details and
discussion, see Sections 4.4 and 6.1 of Cinelli and Hazlett (2020a).
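The gap between the two approaches can be reproduced with simple arithmetic on the restricted regression of Table 3. The sketch below is a hand calculation in base R, not package output: the helper adjust is hypothetical, and the informal pair (1/3, 0.1) is the observed partial $R^2$ of X with D and with Y implied by Model 1, assumed here rather than read from the package.

```r
# OVB-adjusted estimate for a confounder that reduces a positive estimate.
# Hypothetical helper for illustration only.
adjust <- function(estimate, se, dof, r2dz.x, r2yz.dx) {
  estimate - se * sqrt(dof) * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x))
}

# Restricted regression of Table 3: coefficient of D, its standard error, dof
est <- 0.500; se <- 0.088; dof <- 97

# Informal benchmark: observed partial R2 of X with D (1/3) and with Y (0.1)
informal <- adjust(est, se, dof, r2dz.x = 1/3, r2yz.dx = 0.1)

# Formal bound reported by ovb_bounds(): r2dz.x = 0.5, r2yz.dx = 0.333
formal <- adjust(est, se, dof, r2dz.x = 0.5, r2yz.dx = 0.333)

round(c(informal = informal, formal = formal), 2)
```

The informal pair leaves an estimate of about 0.31, whereas the formal pair brings it to approximately zero (up to rounding of the inputs), in line with the two points labeled in Figure 3.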

4.2. Assessing the sensitivity of existing regression results

We conclude this section by demonstrating how to replicate Section 3 using only the statistics

found in the regression table along with the individual functions available in the package.

Sensitivity statistics

The robustness value and the partial $R^2$ are key sensitivity statistics, useful for standardized

sensitivity analyses reporting. Beyond the main sensemakr() function, these statistics can be

computed directly by the user with the functions robustness_value() and partial_r2().

With a ﬁtted lm model in hand, the most convenient way to compute the RV and partial R2

is by employing the lm methods for these functions, as in

R> robustness_value(model = darfur.model, covariates = "directlyharmed")

R> partial_r2(model = darfur.model, covariates = "directlyharmed")

However, when one does not have access to the data in order to run this model, simple
summary statistics such as: (i) the point estimate for directlyharmed (0.097); (ii) its
estimated standard error (0.023); and, (iii) the degrees of freedom of the regression (783) are
sufficient to compute the RV and the partial $R^2$.

R> robustness_value(t_statistic = 0.097/0.023, dof = 783)

R> partial_r2(t_statistic = 0.097/0.023, dof = 783)

The convenience function sensitivity_stats() also computes all sensitivity statistics for a

regression coeﬃcient of interest and returns them in a data.frame.
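To make explicit why the t-statistic and the degrees of freedom suffice, the following base R sketch computes both statistics from their closed-form expressions in Cinelli and Hazlett (2020a). The function names carry a _sketch suffix to signal that they are illustrative stand-ins, not the package functions, and the inputs are the rounded values quoted above.

```r
# Partial R2 of the treatment with the outcome, from the t-statistic and dof
partial_r2_sketch <- function(t_statistic, dof) {
  t_statistic^2 / (t_statistic^2 + dof)
}

# Robustness value for reducing the estimate by a fraction q (q = 1 means
# bringing it all the way to zero)
robustness_value_sketch <- function(t_statistic, dof, q = 1) {
  fq <- q * abs(t_statistic) / sqrt(dof)   # scaled partial Cohen's f
  0.5 * (sqrt(fq^4 + 4 * fq^2) - fq^2)
}

t_darfur <- 0.097 / 0.023
pr2 <- partial_r2_sketch(t_darfur, dof = 783)
rv  <- robustness_value_sketch(t_darfur, dof = 783)
round(c(partial_r2 = pr2, robustness_value = rv), 3)
```

The partial $R^2$ comes out at roughly 2.2%. By construction, a confounder whose partial $R^2$ with both the treatment and the outcome equals the robustness value would produce a bias exactly as large as the estimate itself.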

Plotting functions

All plotting functions can be called directly with lm objects or numerical data. For example,

the code below uses the function ovb_contour_plot() to replicate Figure 1 (without the

bounds) using only the summary statistics of Table 1.

R> ovb_contour_plot(estimate = 0.097, se = 0.023, dof = 783)

R> ovb_contour_plot(estimate = 0.097, se = 0.023, dof = 783,

R+ sensitivity.of = "t-value")

The extreme scenario plots (as in Figure 2) can also be reproduced from summary statistics

using the function ovb_extreme_plot(),

R> ovb_extreme_plot(estimate = 0.097, se = 0.023, dof = 783)

All plotting functions return (invisibly) the data needed to reproduce them, allowing users to

create their own plots if they prefer.

Adjusted estimates, standard errors and t-values

These functions allow users to compute the adjusted estimates given diﬀerent postulated

degrees of confounding. For instance, suppose a researcher has reasons to believe a confounder

explains 10% of the residual variance of the treatment and 15% of the residual variance of

the outcome. If the underlying data are not available, the investigator can still compute the

adjusted estimate and t-value that one would have obtained in the full regression adjusting

for such confounder.

                      Dependent variable: directlyharmed
female                −0.097*** (0.036)
Observations          1,276
R2                    0.426
Residual Std. Error   0.476 (df = 784)
Note: *p<0.1; **p<0.05; ***p<0.01

Table 4: Treatment regression for the Darfur example. To conserve space only the results for
female are shown, which will be used for benchmarking.

R> adjusted_estimate(estimate = 0.097, se = 0.023, dof = 783,

R+ r2dz.x = .1, r2yz.dx = 0.15)

[1] 0.0139

R> adjusted_t(estimate = 0.097, se = 0.023, dof = 783,

R+ r2dz.x = .1, r2yz.dx = 0.15)

[1] 0.622

The computation shows that this confounder is not strong enough to bring the estimate to

zero, but it is suﬃcient to bring the t-value below the usual 5% signiﬁcance threshold of 1.96.
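The two numbers above can be verified by hand from the omitted variable bias formulas of Cinelli and Hazlett (2020a). The base R sketch below is an illustration under the assumption that the confounder reduces the estimate; the helper name adjusted_sketch is hypothetical and not part of the package.

```r
# Adjusted estimate, standard error and t-value for a hypothetical confounder
# with the given partial R2 with the treatment and with the outcome.
# Hypothetical helper for illustration only.
adjusted_sketch <- function(estimate, se, dof, r2dz.x, r2yz.dx) {
  bias <- se * sqrt(dof) * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x))
  est  <- sign(estimate) * (abs(estimate) - bias)
  # the confounder uses up one more degree of freedom in the full regression
  new_se <- se * sqrt((1 - r2yz.dx) / (1 - r2dz.x)) * sqrt(dof / (dof - 1))
  c(estimate = est, se = new_se, t = est / new_se)
}

out <- adjusted_sketch(0.097, 0.023, 783, r2dz.x = 0.1, r2yz.dx = 0.15)
round(out, 4)
```

The sketch returns an adjusted estimate of about 0.0139 and a t-value of about 0.622, agreeing with the output shown above.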

Computing bounds from summary statistics

Finally, we show how users can compute bounds on the strength of confounding using only

summary statistics, if the paper also provides a treatment regression table, i.e., a regression of

the treatment on the observed covariates. Such regressions are sometimes shown in published

works as part of eﬀorts to describe the “determinants” of the treatment, or as “balance tests”

in which the investigator assesses whether observed covariates predict treatment assignment.
For the Darfur example, this regression is shown in Table 4.

Using the results of Tables 1 and 4 we can compute the bounds on confounding 1, 2 and 3
times as strong as female, as we have done before. First we compute the partial $R^2$ of female
with the outcome and with the treatment:

R> r2yxj.dx <- partial_r2(t_statistic = -0.232/0.024, dof = 783)

R> r2dxj.x <- partial_r2(t_statistic = -0.097/0.036, dof = 784)

Next, we compute the bounds on the partial $R^2$ of the unobserved confounder using the

ovb_partial_r2_bound() function.

R> bounds <- ovb_partial_r2_bound(r2dxj.x = r2dxj.x,

R+ r2yxj.dx = r2yxj.dx,

R+ kd = 1:3,

R+ ky = 1:3,

R+ bound_label = paste(1:3, "x", "female"))

Finally, the adjusted_estimate() function computes the estimates implied by these hypo-

thetical confounders.

R> bound.values <- adjusted_estimate(estimate = 0.0973,

R+ se = 0.0232,

R+ dof = 783,

R+ r2dz.x = bounds$r2dz.x,

R+ r2yz.dx = bounds$r2yz.dx)

This information, along with the numeric methods for the plot functions, allows us to reproduce
the contour plots of Figure 1 using only summary statistics. Note that, since we are performing
all calculations manually, appropriate limits of the plot area need to be set by the user.

R> ovb_contour_plot(estimate = 0.0973, se = 0.0232, dof = 783, lim = 0.45)

R> add_bound_to_contour(bounds, bound_value = bound.values)

5. sensemakr for Stata

For Stata users, we have also developed a homonymous package sensemakr, which is available

for download on SSC. The package can be installed as follows:

ssc install sensemakr, replace all

The main function of the Stata package is sensemakr, which is called using the format:

sensemakr depvar covar [if] [in], treat(varlist)

For consistency with the syntax of the well-known regress command, the ﬁrst variable is

assumed to be the dependent variable, while the subsequent treatment variable and covariates

can appear in any order. The required argument is treat(varlist), which indicates the

treatment variable for which sensitivity analysis is conducted.

By default, sensemakr displays sensitivity statistics for routine reporting, as well as a text

interpretation of the results. Speciﬁcally, the output table reports three key values: the partial

$R^2$ of the treatment with the outcome (R2yd.x), the robustness value (RV) required to reduce
the point estimate entirely to zero (q = 1), and the RV beyond which the estimate would
no longer be statistically distinguishable from zero at the 5% level (q = 1, α = 0.05).

Should users wish to bound the plausible strength of unobserved confounders relative to existing
covariates, they can specify the option benchmark(varlist). benchmark() can accept
multiple covariates from the main specification, including time-series and factor variables.

If a benchmark is speciﬁed, sensemakr displays a bounds table. By default, this bounds

table displays estimates for a hypothetical confounder that is 1, 2, and 3 times as strong

as each benchmark covariate in explaining residual variation in both the treatment and the

outcome, as well as adjusted coeﬃcient estimates for the treatment if such a confounder were

present. In addition to these bounds, the table displays treatment coeﬃcients under an “ex-

treme scenario,” in which the confounder is assumed to have the same relationship to the

treatment (R2dz.x) as each benchmark, but explains all the residual variance of the outcome

(R2yz.dx=1).

5.1. Violence in Darfur

In this section, we brieﬂy demonstrate how to replicate the analysis of Section 3, using the

dataset darfur.dta included with sensemakr for Stata.

Users can investigate the sensitivity of the directlyharmed treatment estimate, as well as

bounds using the benchmark covariate female, via the following call:

. use darfur.dta, clear

. sensemakr peacefactor directlyharmed age farmer herder pastvoted hhsize ///

female i.village_, treat(directlyharmed) benchmark(female)

Grouped benchmarks can be assessed using the gbenchmark(varlist) option. For instance,

the following code adds the joint benchmark female and pastvoted. Note that while the

options gbenchmark() and benchmark() can be used in tandem, only a single grouped

benchmark, consisting of all the variables speciﬁed in gbenchmark(), can be evaluated per

sensemakr call.

. sensemakr peacefactor directlyharmed age farmer herder pastvoted hhsize ///

female i.village_, treat(directlyharmed) benchmark(female) ///

gbenchmark(female pastvoted)

Users can modify the output using the following options:

•alpha(real): the signiﬁcance level. The default is 0.05.

•gname(string): enables the user to specify a custom name for the group benchmark

speciﬁed in gbenchmark() (if used). By default, names for grouped benchmarks are

constructed by appending variables with ‘-’.

•kd(numlist) and ky(numlist): these arguments parameterize how many times stronger
the confounder is related to the treatment (kd) and to the outcome (ky), in comparison
to the benchmark covariate. By default, kd and ky are set to (1 2 3), so sensemakr provides
estimates for a hypothetical confounder that is 1, 2, and 3 times as strong as each
benchmark covariate. If only option kd(numlist) is provided, ky will be set equal to
kd by default. If the user opts to specify both kd and ky, the number of elements within each
option must be equal.

•latex(ﬁlename): saves a condensed version of the reporting outputs in ﬁlename.tex.

•noreduce: the default functionality assumes that confounders reduce the absolute value
of the estimate. If the user wishes to assume that confounders pull the estimate away
from zero, they can specify the noreduce flag.

•q(real): this option enables the user to specify what fraction of the eﬀect estimate

would have to be explained away to be problematic. Defaults to 1, implying that a

reduction of 100% of the current eﬀect estimate (true eﬀect of 0) would be problematic.

•r2yz(numlist): Allows the user to specify alternative scenarios for the extreme bounds

table. For instance, inputting (.5 .75) would display the expected treatment coeﬃ-

cients if a confounder explained 50% and 75% of the residual variance of the outcome.

By default r2yz is set to 1.

•suppress: eliminates verbose description of sensitivity statistics.

Should users wish to design their own custom exports, all reported estimates are accessible

within the e() class.

Sensitivity contour plots of point estimates and t-values

Sensitivity plots for point estimates and t-values can be generated by appending the options

contourplot and tcontourplot, respectively, to the sensemakr call. The contour plots can

be customized with the following display options:

•clines: the number of contour lines to display on each plot. Defaults to 7.

•clim(numlist): the symmetric axis limits for the contour plots. Max range is (0 1).

In addition, advanced users can generate their own plots by accessing the raw contour data

within e(contourgrid) or e(tcontourgrid).

Sensitivity plots of extreme scenarios

Plots for extreme confounding scenarios are generated using the extremeplot option. By
default these plots consider confounding that explains 100%, 75%, and 50% of the residual
variance of the outcome, producing one curve for each scenario. The extreme scenario
plot can be customized with the following display options:

•r2yz(numlist): enables the user to specify custom values for the extreme plot. Users

can specify a maximum of four custom values.

•elim(numlist): adjusts the x-axis limits of the plot. Max range is (0 1). Note that

limits for the y-axis are set automatically to include the critical value.

6. Discussion

We recognize that the tools we present here have the potential to be misused, and that it
may be tempting to use sensitivity analyses as “robustness tests” that should be “passed,”
in a way similar to the current abuse we observe, for instance, with statistical significance
testing (Ziliak and McCloskey 2008; Benjamin, Berger, Johannesson, Nosek, Wagenmakers,
Berk, Bollen, Brembs, Brown, Camerer et al. 2018; Amrhein and Greenland 2018). We thus
conclude the paper with brief remarks regarding the appropriate use of sensitivity analysis in
general and as applied to the tools provided by sensemakr in particular.

What sensitivity analyses can and cannot tell us

The quantities and graphics computed by sensemakr tell us what we need to be prepared to

believe in order to sustain that a given conclusion is not due to confounding. For instance,

in the applied example discussed in this paper, sensemakr reveals that, even in a worst case

scenario where the unobserved confounder explains all the residual variation of the outcome,

this unobserved confounder would need to be more than twice as strongly associated with

the treatment as the covariate female to fully explain away the observed estimated eﬀect

of directlyharmed. This is a true quantitative statement that describes the strength of

confounding needed to overturn the research conclusions.

Note, however, that sensitivity analyses cannot tell us whether such a confounder is likely to

exist. The role of sensitivity analysis is, therefore, to discipline the discussion regarding the

causal interpretation of the eﬀect estimate. Ultimately, this discussion needs to rely on domain

knowledge, and is beyond the realm of statistics alone. To illustrate using our example:

1. A causal interpretation of the research conclusion may be defended by claiming that,

given the way injuries (the “treatment”) occurred, the scope for targeting particular

types of individuals was quite limited; aircraft dropped makeshift and unguided bombs

and other objects over villages, and militia raided without concern for who they would

attack—the only known major exception to this, due to sexual assaults, was targeting

gender, which is also one of the most visually apparent characteristics of an individual.

Thus, a confounder twice as strong as female would be indeed surprising.

2. Similarly, for the causal conclusion to be persuasively dismissed, it does not suﬃce to

argue that some confounding might exist. Helpful skepticism must articulate why a
confounder that explains more than twice as much of the variation in treatment assignment
as the covariate female is plausible. Otherwise, the putative confounder cannot

logically account for all the observed association, even if it explains all or some large

portion of the residual outcome variation.

Robustness to confounding is thus claimed to the extent one agrees with the arguments

articulated in point 1, while the results can be deemed fragile insofar as alternative stories

meeting the requirements in point 2 can be oﬀered. Both types of arguments need to rely on

domain knowledge as to how the attacks occurred and what could presumably inﬂuence the

outcome variable.

In sum, sensitivity analyses should not be used to obviate discussions about confounding by

engaging in automatic procedures; rather, they should be used to stimulate a disciplined,

quantitative argument about confounding, in which such statements are made and debated.

The tools provided by sensemakr allow users to easily and transparently report the sensi-

tivity of their causal inferences to unobserved confounding, thereby enabling this disciplined

discussion as to what can be concluded from imperfect observational studies.

References

Amrhein V, Greenland S (2018). “Remove, rather than redefine, statistical significance.” Nature Human Behaviour, 2(1), 4–4.

Angrist JD, Pischke JS (2008). Mostly harmless econometrics: An empiricist’s companion. Princeton University Press.

Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, Berk R, Bollen KA, Brembs B, Brown L, Camerer C, et al. (2018). “Redefine statistical significance.” Nature Human Behaviour, 2(1), 6.

Blackwell M (2013). “A selection bias approach to sensitivity analysis for causal effects.” Political Analysis, 22(2), 169–182.

Brumback BA, Hernán MA, Haneuse SJ, Robins JM (2004). “Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures.” Statistics in Medicine, 23(5), 749–767.

Carnegie NB, Harada M, Hill JL (2016). “Assessing sensitivity to unmeasured confounding using a simulated potential confounder.” Journal of Research on Educational Effectiveness, 9(3), 395–420.

Cinelli C, Ferwerda J, Hazlett C (2020a). sensemakr for Stata: sensitivity analysis tools for OLS. Stata package version 1.0.

Cinelli C, Ferwerda J, Hazlett C (2020b). sensemakr: sensitivity analysis tools for OLS. R package version 0.3, URL https://CRAN.R-project.org/package=sensemakr.

Cinelli C, Hazlett C (2020a). “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1), 39–67. doi:10.1111/rssb.12348.

Cinelli C, Hazlett C (2020b). “An omitted variable bias framework for sensitivity analysis of

instrumental variables.” Working Paper.

Cinelli C, Kumor D, Chen B, Pearl J, Bareinboim E (2019). “Sensitivity Analysis of Linear

Structural Causal Models.” International Conference on Machine Learning.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL (1959). “Smoking and lung cancer: recent evidence and a discussion of some questions.” Journal of the National Cancer Institute, (23), 173–203.

Dorie V, Harada M, Carnegie NB, Hill J (2016). “A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.” Statistics in Medicine, 35(20), 3453–3470.

Frank KA (2000). “Impact of a confounding variable on a regression coefficient.” Sociological Methods & Research, 29(2), 147–194.

Frank KA, Maroulis SJ, Duong MQ, Kelcey BM (2013). “What would it take to change an inference? Using Rubin’s causal model to interpret the robustness of causal inferences.” Educational Evaluation and Policy Analysis, 35(4), 437–460.

Frank KA, Sykes G, Anagnostopoulos D, Cannata M, Chard L, Krause A, McCrory R (2008). “Does NBPTS certification affect the number of colleagues a teacher helps with instructional matters?” Educational Evaluation and Policy Analysis, 30(1), 3–30.

Franks A, D’Amour A, Feller A (2019). “Flexible sensitivity analysis for observational studies without observable implications.” Journal of the American Statistical Association, (just-accepted), 1–38.

Hazlett C (2019). “Angry or Weary? How Violence Impacts Attitudes toward Peace among Darfurian Refugees.” Journal of Conflict Resolution, p. 0022002719879217.

Hernán M, Robins J (2020). “Causal inference: What if.” Boca Raton: Chapman & Hall/CRC.

Hong G, Qin X, Yang F (2018). “Weighting-Based Sensitivity Analysis in Causal Mediation Studies.” Journal of Educational and Behavioral Statistics, 43(1), 32–56.

Hosman CA, Hansen BB, Holland PW (2010). “The Sensitivity of Linear Regression Coefficients’ Confidence Limits to the Omission of a Confounder.” The Annals of Applied Statistics, pp. 849–870.

Imai K, Keele L, Yamamoto T, et al. (2010). “Identification, inference and sensitivity analysis for causal mediation effects.” Statistical Science, 25(1), 51–71.

Imbens GW (2003). “Sensitivity to exogeneity assumptions in program evaluation.” The American Economic Review, 93(2), 126–132.

Imbens GW, Rubin DB (2015). Causal inference in statistics, social, and biomedical sciences.

Cambridge University Press.

Middleton JA, Scott MA, Diakow R, Hill JL (2016). “Bias amplification and bias unmasking.” Political Analysis, 24(3), 307–323.

Oster E (2017). “Unobservable selection and coefficient stability: Theory and evidence.” Journal of Business & Economic Statistics, pp. 1–18.

Pearl J (2009). Causality. Cambridge University Press.

Robins JM (1999). “Association, causation, and marginal structural models.” Synthese,

121(1), 151–179.

Rosenbaum PR (2002). “Observational studies.” In Observational studies, pp. 1–17. Springer.

Rosenbaum PR, Rubin DB (1983). “Assessing sensitivity to an unobserved binary covariate

in an observational study with binary outcome.” Journal of the Royal Statistical Society.

Series B (Methodological), pp. 212–218.

Vanderweele TJ, Arah OA (2011). “Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders.” Epidemiology (Cambridge, Mass.), 22(1), 42–52.

Ziliak S, McCloskey DN (2008). The cult of statistical signiﬁcance: How the standard error

costs us jobs, justice, and lives. University of Michigan Press.

Affiliation:

Carlos Cinelli

University of California, Los Angeles

Department of Statistics, 8125 Math Sciences Building, Los Angeles, CA 90095, USA.

E-mail: carloscinelli@ucla.edu

URL: http://carloscinelli.com

Jeremy Ferwerda

Dartmouth College

Department of Government, Hanover, NH 03755

E-mail: jeremy.a.ferwerda@dartmouth.edu

URL: http://jeremyferwerda.com/

Chad Hazlett

University of California, Los Angeles

Department of Statistics, 8125 Math Sciences Building, Los Angeles, CA 90095, USA.

E-mail: chazlett@ucla.edu

URL: http://chadhazlett.com

Journal of Statistical Software http://www.jstatsoft.org/

published by the Foundation for Open Access Statistics http://www.foastat.org/
