
Estimating causal effects of new treatments despite self-selection: The case of experimental medical treatments



A method for estimating treatment effects of newly available treatments under arbitrary selection-into-treatment, as illustrated by application to novel medical treatments.
J. Causal Infer. 2018; 20180019
Chad Hazlett*
Received July 8, 2018; accepted November 9, 2018
Abstract: Providing terminally ill patients with access to experimental treatments, as allowed by recent “right to try” laws and “expanded access” programs, poses a variety of ethical questions. While practitioners and investigators may assume it is impossible to learn the effects of these treatments without randomized trials, this paper describes a simple tool to estimate the effects of these experimental treatments on those who take them, despite the problem of selection into treatment, and without assumptions about the selection process. The key assumption is that the average outcome, such as survival, would remain stable over time in the absence of the new treatment. Such an assumption is unprovable, but can often be credibly judged by reference to historical data and by experts familiar with the disease and its treatment. Further, where this assumption may be violated, the result can be adjusted to account for a hypothesized change in the non-treatment outcome, or to conduct a sensitivity analysis. The method is simple to understand and implement, requiring just four numbers to form a point estimate. Such an approach can be used not only to learn which experimental treatments are promising, but also to warn us when treatments are actually harmful – especially when they might otherwise appear to be beneficial, as illustrated by example here. While this note focuses on experimental medical treatments as a motivating case, more generally this approach can be employed where a new treatment becomes available or has a large increase in uptake, where selection bias is a concern, and where an assumption on the change in average non-treatment outcome over time can credibly be imposed.
Keywords: Non-randomized trials, Observational studies, Clinical trials

Introduction
On 30th May 2018, the United States established a federal “right to try” law, allowing terminally ill patients to access experimental medical treatments that have cleared Phase 1 testing but were not yet approved by the Food and Drug Administration (FDA).¹ Such laws extend pre-existing methods of gaining access to unapproved treatments through “compassionate use” or “expanded access” programs, while sidestepping FDA petition and oversight procedures. Numerous ethical objections have been made to these laws, pointing to risks of dangerous side-effects that could shorten or worsen lives, raising false hopes among vulnerable patients, enabling quackery, undermining the FDA, and imposing a financial burden on desperate patients and their families.
Can we – and should we – learn anything about the efficacy and safety of drugs from those taking such experimental treatments? The first reaction of clinicians and statisticians alike may be an emphatic “no”: without randomization into treatment and control groups, individuals will self-select into treatment, thus little can be learned from the observational results. Indeed, one criticism of these laws has been that they “will only make it more difficult to know if medication is effective or safe” [1].

¹ This law federally enshrines analogous rights previously recognized by 40 US states.

* Departments of Statistics and Political Science, University of California Los Angeles, Los Angeles, United States
Brought to you by | University of California - Los Angeles - UCLA Library
Download Date | 12/11/18 9:04 PM
Yet, failing to carefully assess the effects of these treatments would both forgo the opportunity to learn which are promising, and perhaps worse, could greatly amplify their harm. Clinicians and investigators will surely make inferences about the effects of new treatments by comparing, at least casually, those who receive them and those who do not. Estimates based on comparisons of this type are not just biased, they are dangerous. As illustrated below, through the effects of self-selection, treatments that are actually harmful may appear to be beneficial in such naive comparisons, perversely encouraging more patients to take them.
We thus have a responsibility to learn the benefits and harms of such treatments, and to avoid the worst errors that naive comparisons would generate. Fortunately, a very simple technique outlined here allows valid estimates of treatment effects under such circumstances in which individuals self-select into treatment in unknown ways. Of course, such inference is not free. The critical assumption required is that in the absence of the new treatment of interest, the average outcome would remain stable or change by a specified amount across two time periods (one before the treatment is available, and one after). While not true in every case, this assumption is straightforward to understand and can often be assessed by experts familiar with treatment of the disease in question.
While this note takes experimental medical treatments as an illustration and motivation, the method described here applies to a wide variety of circumstances where a treatment becomes newly available, individuals or units opt into taking that treatment, and the assumption of stability in average non-treatment outcomes over time is reasonable.² The method may also be useful even in cases where a randomized trial is possible or has been conducted, but that trial had strict eligibility criteria or low willingness to consent to randomization, resulting in an estimate that may not generalize well to the population that would actually elect to take the treatment.
Identification
Perhaps surprisingly, we can make inferences about how those who take a new treatment benefit from it even when patients select into taking treatment in unknown ways. Consider person i, with some observed outcome Y_i, such as whether or not they survive at one year post diagnosis. Using the potential outcomes framework [2] and assuming no interference, we consider not only the observed outcome for person i, but also two potential outcomes – the outcome she would have experienced had she been treated, Y_i(1), and the outcome she would have experienced had she not been treated, Y_i(0).
We next consider how the average outcome people would experience without the treatment, E[Y_i(0)] (or simply E[Y(0)]), changes from the time period before the new treatment becomes available to the time period after,

δ = E[Y(0) | T = 1] − E[Y(0) | T = 0],

where T = 0 designates a time period before a treatment (D) is available, and T = 1 a time period after it is made available. Note that T ∈ {0, 1} designates time periods or windows – perhaps a few years wide – and not single points in time. This is important, as Y_i may take time to measure (e.g. the proportion surviving for one year post-diagnosis), and treatments may take time to take effect.
The core assumption required is on the value of δ. For simplicity of exposition and because it is likely the primary use case, we consider first the assumption that δ = 0, which we call the stability of the average non-treatment outcome assumption. However, alternative assumed values of δ can be employed. Within this setting, besides the practical assumption that at least some eligible individuals take the new treatment,

² Some examples of other cases where this approach may be applicable and where randomization is difficult or impossible include: What is the effect of a drug newly being used off-label by select physicians to treat a disease? What is the effect of warnings sent to a patient’s doctor by a health monitoring system? Outside of medicine, what is the effect of an early release program from jail, assigned by judges, on recidivism of those who are released? What is the effect of legal aid, given to those who most need it, on legal outcomes? What is the effect of a television program on behaviors of those who choose to watch it, or the effect of advertisements on those who receive them?
Pr(D = 1 | T = 1) > 0, only an assumption on δ is required to identify the average treatment effect among the treated. No assumption is required on the treatment assignment mechanism.
Showing this identification result is straightforward: we know the average non-treatment outcome among the whole group in period one, because it is equal to the mean observed (non-treatment) outcome in period zero (shifted by δ if δ ≠ 0). This group average is, in turn, a weighted combination of two other averages: the average non-treatment outcome among the untreated, which we observe, and the average non-treatment outcome among the treated, for which we can solve. That is, the average (non-treatment) outcome among the non-treated, combined with knowledge of the average non-treatment outcome among the whole group, tells us what the average non-treatment outcome among the treated must be. Formally, by applying the law of iterated expectations,

E[Y(0) | T = 1] = E[Y(0) | D = 1, T = 1] Pr(D = 1 | T = 1) + E[Y(0) | D = 0, T = 1] Pr(D = 0 | T = 1),   (1)

which we can re-arrange to identify E[Y(0) | D = 1, T = 1] in terms of observables,

E[Y(0) | D = 1, T = 1] = (E[Y(0) | T = 0] + δ − E[Y(0) | D = 0, T = 1] Pr(D = 0 | T = 1)) / Pr(D = 1 | T = 1).   (2)
Finally, the Average Treatment Effect on the Treated (ATT) is the difference between the treatment and non-treatment potential outcomes, taken solely among the treated, i.e. E[Y(1) | D = 1, T = 1] − E[Y(0) | D = 1, T = 1]. While we directly observe an estimate of the first quantity, the second term – the average outcome among the treated had they not taken the treatment – has now been given by the strategy above (Equation 2). We have thus identified the ATT,

ATT = E[Y(1) | D = 1, T = 1] − E[Y(0) | D = 1, T = 1]
    = E[Y | D = 1, T = 1] − (E[Y | T = 0] + δ − E[Y | D = 0, T = 1] Pr(D = 0 | T = 1)) / Pr(D = 1 | T = 1).   (3)

When the stability assumption is maintained, δ = 0 and can thus be simply removed. In that case, despite appearing quite different, this estimator is equivalent to an instrumental variables approach in which “time” is the instrument and non-compliance is one-sided (see Discussion).
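As a minimal sketch, the point estimate in Equation 3 can be computed directly from the four required quantities (the function and argument names below are illustrative, not from the paper):

```python
def att_estimate(mean_pre, mean_treated, mean_untreated, p_treated, delta=0.0):
    """ATT estimate under an assumed change delta in average non-treatment outcome.

    mean_pre       : observed average outcome before availability, E[Y | T=0]
    mean_treated   : observed average among the treated after,     E[Y | D=1, T=1]
    mean_untreated : observed average among the untreated after,   E[Y | D=0, T=1]
    p_treated      : share taking the treatment after,             Pr(D=1 | T=1)
    """
    # Equation 2: solve the weighted-average identity for E[Y(0) | D=1, T=1].
    y0_treated = (mean_pre + delta - mean_untreated * (1 - p_treated)) / p_treated
    # Equation 3: ATT = E[Y(1) | D=1, T=1] - E[Y(0) | D=1, T=1].
    return mean_treated - y0_treated
```

With the default `delta=0.0` this is the stability-assumption estimator; passing a nonzero `delta` implements the adjusted version discussed in the extensions below.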
Extensions
A number of extensions are possible. First, investigators could use this tool to examine “what-if” scenarios: if clinicians have beliefs about the four quantities required here, or evidence from past cases, we can compute ATT estimates to understand the underlying effect implied by those beliefs. Such an analysis is informal and only as good as the guesses that are used as data. However, it correctly produces an ATT estimate subject to those guesses, avoiding the errors that result from naive comparisons that may otherwise be made. An illustration of such errors is given below.
Second, the assumption that δ = 0 can be replaced with one that allows a hypothetical or modeled change in the average non-treatment outcome over time such that δ ≠ 0. One application for this is sensitivity analysis: we can hypothesize shifts in the average non-treatment outcome and compute the corresponding effect, repeating this for different hypothesized shifts. This allows us to ask “how large a shift in the average non-treatment outcome must be permitted in order for our conclusions about the benefit (or harmfulness) of a treatment to change?” If a seemingly implausible shift is required to change our substantive conclusion – e.g. that non-treatment outcomes improved by 20% despite no known changes in treatment protocols or compositional shifts in those who get the disease – then we would be able to rule out such concerns. These analyses may or may not prove informative, but are an improvement over the naive comparisons that may otherwise be attempted. Alternatively, the stability assumption can be replaced with a known or estimated shift in non-treatment outcomes. If, for example, there are known compositional shifts or changes in available care besides the treatment of interest, and if we can model these or make reasonable estimates of how they may change average non-treatment outcomes, we can use this information to adjust the resulting treatment effect estimate for the treatment of interest.
Illustration
A simple illustration can demonstrate the method and how it can avoid the most dangerous errors that may arise due to direct comparisons that practitioners may be tempted to make. Suppose an aggressive form of cancer, once diagnosed at a certain stage, has only a 50% one-year survival rate as measured over a period from 2005 to 2010 (Ê[Y(0) | T = 0] = 0.5, where Ê[⋅] designates a sample average). Between 2012 and 2016, suppose a new treatment becomes available, and that 30% of the group diagnosed with this cancer after 2010 attempt this new treatment (Pr(D = 1 | T = 1) = 0.3). Among this group, suppose that the one-year survival rate is 60% (Ê[Y(1) | D = 1, T = 1] = 0.6), while the one-year survival rate among those who chose not to take the treatment is only 40% (Ê[Y(0) | D = 0, T = 1] = 0.4). Under the stability assumption, we set δ = 0 and can estimate the expected non-treatment outcome among those who took the treatment:

Ê[Y(0) | D = 1, T = 1] = (0.5 + 0 − 0.4 × 0.7) / 0.3 ≈ 0.73.
To verify this result while reinforcing the intuition: first, under the stability assumption, if we could see how everybody in the later group fares in the absence of the treatment, we know that (up to finite sample error) 50% would survive at one year. Among the 70% who did not take the treatment, only 40% survive at one year. This “drop” in survival rate among the non-treated signals that it must have been those who were worse-off who chose not to take the treatment. Consequently, those who do take the treatment must have had higher (non-treatment) survival in order to bring this 40% up to the required 50% for the whole group. Specifically, the 30% of the group who took the treatment must have had an average non-treatment survival of x in the equation (0.4)(0.7) + (0.3)x = 0.5, which solves to x ≈ 0.73.
The observed survival rate of 60% under treatment thus no longer appears favorable compared to the 73% who would have survived without treatment in this group, yielding an estimated ATT of 60% − 73% = −13%. This finding of a harmful effect of treatment emphasizes both that this technique can return counter-intuitive results, and the ethical imperative to understand the impacts of experimental treatments. We emphasize that naive comparisons tell the opposite story: the treatment at first appears beneficial, with higher survival among those who took the drug compared to those before the drug existed (60% versus 50%), and higher survival among those who took it versus those who did not in the second period (60% versus 40%). This may persuade practitioners to recommend it, and patients to take it. Yet, it actually reduces survival by 13% among those who take it. It would seem unethical not to make this information available to practitioners, patients, and regulators. Furthermore, if the assumption that δ = 0 cannot confidently be defended, sensitivity analysis using a range of values for δ will characterize what we would conclude for any given assumption of δ.
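The arithmetic of this illustration, and the sensitivity of its conclusion to δ, can be checked directly; a sketch in plain Python using the hypothetical figures above:

```python
# Hypothetical figures from the illustration above.
mean_pre = 0.5        # one-year survival, 2005-2010 (treatment unavailable)
p_treated = 0.3       # share taking the new treatment after 2010
mean_treated = 0.6    # survival among those who took it
mean_untreated = 0.4  # survival among those who did not
delta = 0.0           # stability assumption

# Non-treatment survival among the treated (Equation 2).
y0_treated = (mean_pre + delta - mean_untreated * (1 - p_treated)) / p_treated
print(round(y0_treated, 2))   # 0.73

# ATT (Equation 3): negative, despite favorable-looking naive comparisons.
att = mean_treated - y0_treated
print(round(att, 2))          # -0.13

# Sensitivity: recompute the ATT under a range of hypothesized shifts delta.
for d in (-0.05, 0.0, 0.05):
    att_d = mean_treated - (mean_pre + d - mean_untreated * (1 - p_treated)) / p_treated
    print(d, round(att_d, 2))
```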
Discussion
While motivated here by the problem of experimental medical treatments, this approach is quite general in
its applicability to situations in which new treatments become available and individuals self-select to receive
them. The main concern investigators must keep in mind when choosing to apply this method is whether they can justify the stability assumption, or employ some value of δ other than zero. Fortunately, the stability of non-treatment outcomes is a straightforward assumption to understand, and can be well evaluated by experts in many cases. It is most plausible, first, if little has changed in the way the disease is treated over the time in question. This includes any other treatments that might be administered or withdrawn due to taking the new treatment in question. Second, a more subtle concern lies in compositional changes in the population who acquires the disease, such as changes in population health or competing risks, as these could also drive changes in the average non-treatment outcome. That said, if such compositional shifts do occur, they are likely to be slow-moving and so may be possible to rule out as problematic in the short run. While no test can prove that the stability assumption (or any other assumption on δ) holds, investigators can check the stability of average outcomes over the course of many years prior to the introduction of the new treatment, which can boost the credibility that it remains stable thereafter. Altogether, if the outcome has been stable for many years prior to the introduction of the treatment in question, and there are no known changes in the use of other treatments or sudden changes in the composition of the group with the disease, then a strong case can be made for the stability of the average non-treatment outcome.
Pragmatically, only four numbers are required to form a point estimate: the estimated average outcome prior to treatment (Ê[Y | T = 0]), the estimated average outcome among the treated and untreated after introduction of the treatment (Ê[Y | D = 1, T = 1] and Ê[Y | D = 0, T = 1] respectively), and the proportion who took the treatment after introduction, Pr(D = 1 | T = 1). In some cases, the required data may thus be available from existing sources, such as electronic medical records. In other cases, investigators may choose to run a trial of this type by design.
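Given record-level data, the four inputs reduce to simple means and a proportion. A sketch with made-up records (the tuple layout and names are illustrative, not a format from the paper):

```python
# Toy patient records: (period T, took treatment D, outcome Y).
records = [
    (0, 0, 1), (0, 0, 0), (0, 0, 1), (0, 0, 0),              # pre-availability
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0), (1, 0, 0),   # post-availability
]

def mean(xs):
    return sum(xs) / len(xs)

pre_outcomes = [y for t, d, y in records if t == 0]
post_treated = [y for t, d, y in records if t == 1 and d == 1]
post_untreated = [y for t, d, y in records if t == 1 and d == 0]
post_n = len(post_treated) + len(post_untreated)

mean_pre = mean(pre_outcomes)           # estimate of E[Y | T=0]
mean_treated = mean(post_treated)       # estimate of E[Y | D=1, T=1]
mean_untreated = mean(post_untreated)   # estimate of E[Y | D=0, T=1]
p_treated = len(post_treated) / post_n  # estimate of Pr(D=1 | T=1)
```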
Relation to alternative approaches
One alternative approach worth mentioning, though less often applicable, is possible when we have a disease such that the prognosis in the absence of any new treatment is virtually certain, and thus the non-treatment outcome one would normally learn from a control group is already known. Suppose that nearly all individuals with a certain cancer at a certain stage die within one year (and those whose cancers do remit, if any, show no signs of their potential for remittance until it happens and thus would not have any basis for self-selecting into treatment). If a group – selected by any means – takes a new medication and has a 50% survival rate at one year, then the improvement can reasonably be attributed to the new treatment, as we believe we know how those individuals would have fared under non-treatment, despite the absence of a control group. While possibly workable in some scenarios, such an approach is limited to cases where the outcome is nearly certain. By contrast, the approach here is more general, and recognizes that when outcomes are uncertain (such as a 50% one-year survival rate), there is non-trivial scope for self-selection.³
Second, this method may bear a resemblance to the Difference-in-Differences (DID) approach, but can operate in circumstances where DID is not possible, and provides a relaxation of DID in cases where it is possible. To conduct DID, we need either to measure each unit before and after (some are exposed to) treatment, or we must be able to place individuals into larger groupings that persist over time (such as states), with treatment being assigned at the level of those larger units at time T = 1. By contrast, the present method works even when there is no way to know if an individual observed at time T = 0 would have chosen treatment had they appeared at time T = 1. This is useful in cases such as new medical treatments: among those diagnosed with a given disease during T = 0, there is no way to say which would have taken the treatment at time T = 1, as would be required by DID.
The method is thus particularly useful where DID is not possible; however, in arrangements where DID is possible (such as panel data), it provides an “adjustable” version of DID that allows prescribed deviations

³ Another approach that may be feasible in some circumstances occurs when a new treatment becomes available and all the individuals under study take it, leaving no control group at time T = 1. In this case, the method proposed here specializes to a simple cross-sectional “post-minus-pre” estimator, where identification is still achieved by the assumption on δ.
from the parallel trends assumption. Specifically, DID requires the parallel trends assumption,

E[Y(0) | T = 1, D = 1] − E[Y(0) | T = 0, D = 1] = E[Y(0) | T = 1, D = 0] − E[Y(0) | T = 0, D = 0],

whereas the present method assumes E[Y(0) | T = 1] − E[Y(0) | T = 0] = δ for some choice of δ. To understand the connection, consider that there are two ways to support a particular assumption of δ. The trend in average non-treatment outcomes over time could be different for the would-be-treated group and the would-be-control group, with the average of these trends (weighted by their population proportions) amounting to δ. Alternatively, we may propose a given δ by assuming that both the would-be-treated and would-be-control group changed by δ, in turn ensuring that the average E[Y(0)] across the two also changes by δ. This more restrictive claim is precisely the parallel trends assumption. Further, if we do assume parallel trends, this means we can learn the appropriate δ from the change over time in the control group alone. Setting δ to the estimated change in the control group, Ê[Y(0) | T = 1, D = 0] − Ê[Y(0) | T = 0, D = 0], returns a value exactly equal to the DID estimate. However, if we wish to make any assumption other than parallel trends, this method allows it: any trends in the average non-treatment outcomes hypothesized for the treated and control group imply a choice of δ through their weighted average.
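This equivalence is easy to verify numerically. In the sketch below (the group means and treated share are hypothetical), setting δ to the control group's change reproduces the DID estimate exactly:

```python
# Hypothetical panel-style group means E[Y | T, D].
pre_treated, pre_control = 0.55, 0.45    # T = 0
post_treated, post_control = 0.70, 0.50  # T = 1
p_treated = 0.3                          # Pr(D=1 | T=1), same group share at T=0

# Standard DID estimate under parallel trends.
did = (post_treated - pre_treated) - (post_control - pre_control)

# The present estimator (Equation 3), with delta set to the control group's change.
mean_pre = p_treated * pre_treated + (1 - p_treated) * pre_control  # E[Y | T=0]
delta = post_control - pre_control
y0_treated = (mean_pre + delta - post_control * (1 - p_treated)) / p_treated
att = post_treated - y0_treated

print(round(did, 6), round(att, 6))  # the two estimates coincide
```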
Third and perhaps most illuminating, in the case when δ = 0 this procedure is identical to an instrumental variables approach in which “time” is the instrument. In the framework of [3], those at T = 1 are “encouraged” to take the treatment by its availability. When we assume δ = 0, we assume E[Y(0)] does not change over time – thus the only way that the instrument (time) can influence outcomes is by switching some individuals (“compliers”) into taking the treatment, satisfying the “exclusion restriction”. The reader can verify that when δ = 0 the estimator in (3) is numerically equal to the Wald estimator for instrumental variables. The proportion who take the treatment at T = 1 is the “compliance rate” or first stage effect. Defiers are assumed not to exist, and because of the unavailability of the treatment at time T = 0, non-compliance is known to be one-sided. As a consequence, the effect among the compliers is simply the average treatment effect on the treated. Going beyond the usual instrumental variable arrangement, when δ ≠ 0 is employed, this corresponds to allowing a prescribed violation of the exclusion restriction.⁴ Accordingly, the required assumptions for this method (under the δ = 0 assumption) can be partially represented by a Directed Acyclic Graph (DAG) encoding an instrumental variable relationship, as in Figure 1 [4]. As with instrumental variables in general, the additional assumption of monotonicity or “no-defiers” on the T→D relationship is not represented on the DAG but must be stated. The absence of an edge between T and Y in this graph encodes the exclusion restriction corresponding to the “δ = 0” case, though the method can allow for δ ≠ 0, not encoded on the non-parametric DAG.
Figure 1: Graphical representation of time as an instrument. Note: Instrumental variables representation of the identification requirements. T ∈ {0, 1} is the time period, D ∈ {0, 1} is treatment status, and Y is the outcome. The required absence of defiers, and the possibility that δ ≠ 0, are not shown.
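The numerical equivalence to the Wald estimator under one-sided non-compliance can likewise be checked. The sketch below reuses the illustration's hypothetical numbers, with Pr(D = 1 | T = 0) = 0 making non-compliance one-sided:

```python
# Hypothetical observed quantities.
mean_pre = 0.5        # E[Y | T=0]; no one is treated at T=0
mean_treated = 0.6    # E[Y | D=1, T=1]
mean_untreated = 0.4  # E[Y | D=0, T=1]
p_treated = 0.3       # Pr(D=1 | T=1); also the first stage, since Pr(D=1 | T=0) = 0

# Wald estimator: reduced form / first stage, with time as the instrument.
mean_post = p_treated * mean_treated + (1 - p_treated) * mean_untreated
wald = (mean_post - mean_pre) / (p_treated - 0.0)

# Estimator from Equation 3 with delta = 0.
y0_treated = (mean_pre - mean_untreated * (1 - p_treated)) / p_treated
att = mean_treated - y0_treated

print(round(wald, 6), round(att, 6))  # identical
```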
While I am not aware of any empirical or theoretical work describing the identification logic used here, the equivalence to using time as an instrument connects to an emergent set of medical studies in which the uptake of new treatments increases dramatically over time [5, 6, 7, 8, 9].⁵ The approach described here helps to give clarity to the identification assumptions required of such work and how they can be judged. In addition, when identification depends upon an assumption on δ as described here, covariate adjustment procedures become unnecessary, or require explicit justification in terms of identification. Rather, simpler analyses using Equation 3, together with discussions of plausible values of δ, are called for. Further, sensitivity analysis based on a range of these δ values can be a valuable addition to any such work.

⁴ The instrumental variables interpretation also suggests that this procedure could be used for treatments that were available in time T = 0 but whose uptake changed dramatically in T = 1, in which case the Wald estimator can be used to rescale the change in outcomes between T = 0 and T = 1 (i.e. the reduced form) by the change in treatment uptake, Pr(D = 1 | T = 1) − Pr(D = 1 | T = 0) (i.e. the compliance rate). This may also be a reasonable strategy in some cases. In this case, however, the local average treatment effect identified is no longer the ATT.
⁵ Brookhart et al. [10] also provide a guide to the use of instrumental variables generally in comparative safety and effectiveness research, with a brief section discussing the use of calendar time as an instrument.
Comparison to designs allowing patient preference
Even when RCTs are possible, investigators may worry about two important representational limitations. For example, if only a small fraction of people with a given disease are eligible or willing to consent, then how might this group be different from the ultimate target group who will use the treatment once approved? Clinical designs that allow partial self-selection in an effort to address these concerns include “comprehensive cohort studies” [11] and “patient preference trials” [12]. In comprehensive cohort studies, it is proposed that those patients who refuse randomization be allowed to instead join the study but with the treatment of their preference. The randomized arms are compared as in any experiment. The outcomes (as well as the pre-treatment characteristics) of those in the preference groups are hoped to improve our understanding of generalizability, but it is unclear how to make reliable use of the information provided by these preference groups given confounding biases. In later work, “patient preference trials” encapsulate a variety of research designs in which patients’ preferences are elicited, some individuals are randomized, and some receive a treatment of their choosing. In the recent proposal of [13], treatment preferences are elicited from all individuals, who are randomized into one group that will have their treatment assigned at random, and one that can choose its own treatment. This design allows for sharp-bound identification and sensitivity analysis for the average causal effects among those who would choose a given treatment.
The present method has the primary benefit of sidestepping the need for randomization, still required by the above designs. However, the allowance for self-selection alters the estimand in ways that may be preferable or complementary to an RCT, depending on the goals of the study. First, in retrospective work, the ATT identified here may be the ideal quantity of interest if we would like to know what effect a past treatment actually had. Second, if our goals are more prospective, the ATT from this method may say more about the potential effects of a new treatment in the clinical population likely to take it, if the RCT was highly restrictive in its eligibility criteria, or suffered low consent rates. On the other hand, the ATT may not be ideal to inform policy-making decisions – such as promoting a new treatment to become a first-line therapy – if the group likely to take the treatment under such a policy differs widely from those who have elected to take it in the study.
Conclusion
In summary, this note describes a simple identification procedure allowing estimation of the ATT regardless of self-selection into the sample. The simplest assumption – stability of average non-treatment outcomes (δ = 0) – may be reasonable when we know that (a) the composition of those who acquire a certain condition has not changed, and (b) the availability and use of treatments have not changed, except for the new treatment of interest. Where investigators are uncertain that the average non-treatment outcomes have remained stable over time, one can model or propose a non-zero change (δ ≠ 0), or show the sensitivity of results to a range of δ values. In contrast to DID, the method works when different individuals are present in the two time periods, without any indication of who in the earlier time period would have been exposed to treatment had they been observed in the later time period. In the δ = 0 case it corresponds to using time as an instrument, clarifying the assumptions and required analyses of such an approach. In the δ ≠ 0 case, it allows a prescribed deviation from the exclusion restriction, as well as a sensitivity analysis. While the most obvious use of this method is when randomized trials are not possible, another potential benefit regards representation or external validity: the ATT estimated here may be more informative than the average effect from an RCT, depending upon our scientific goals, and who was able and willing to participate in the RCT.
This method is broadly applicable where treatments become newly available or popular, and an assumption on the stability of average non-treatment outcomes can be credibly made. Returning to the motivating case of “right to try” and other access to experimental medical treatments, the availability of this method does not change the deep and difficult set of ethical questions that must be answered about when and whether an experimental treatment should be made available. Rather, given the current laws, we must consider our ethical responsibility to learn what we can from such treatment regimes – not only to determine which therapies are promising for further trials, but to more quickly protect against harmful ones.
Acknowledgments: The author thanks Darin Christensen, Erin Hartman, Chris Tausanovitch, Mark Handcock, Jeff Lewis, Aaron Rudkin, Maya Petersen, Arash Naeim, Onyebuchi Arah, Dean Knox, Ami Wulf, Paasha Mahdavi, and members of the 2018 Southern California Methods workshop for valuable feedback and discussion.
References

1. Hamblin J. The disingenuousness of right to try. The Atlantic. 2018.
2. Neyman J, Dabrowska D, Speed T. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat Sci. 1923[1990];5(4):465–72.
3. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc.
4. Pearl J. Causality. Cambridge University Press; 2009.
5. Johnston K, Gustafson P, Levy A, Grootendorst P. Use of instrumental variables in the analysis of generalized linear models in the presence of unmeasured confounding with applications to epidemiological research. Stat Med. 2008;27(9):1539–56.
6. Cain LE, Cole SR, Greenland S, Brown TT, Chmiel JS, Kingsley L, Detels R. Effect of highly active antiretroviral therapy on incident AIDS using calendar period as an instrumental variable. Am J Epidemiol. 2009;169(9):1124–32.
7. Shetty KD, Vogt WB, Bhattacharya J. Hormone replacement therapy and cardiovascular health in the United States. Med Care. 2009;600–5.
8. Mack CD, Brookhart MA, Glynn RJ, Meyer AM, Carpenter WR, Sandler RS, Stürmer T. Comparative effectiveness of oxaliplatin vs. 5-fluorouracil in older adults: an instrumental variable analysis. Epidemiology (Camb, Mass). 2015;26(5):690.
9. Gokhale M, Buse JB, DeFilippo Mack C, Jonsson Funk M, Lund J, Simpson RJ, Stürmer T. Calendar time as an instrumental variable in assessing the risk of heart failure with antihyperglycemic drugs. Pharmacoepidemiol Drug Saf.
10. Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19(6):537–54.
11. Olschewski M, Scheurlen H. Comprehensive cohort study: an alternative to randomized consent design in a breast preservation trial. Methods Inf Med. 1985;24(03):131–4.
12. Brewin CR, Bradley C. Patient preferences and randomised clinical trials. BMJ, Br Med J. 1989;299(6694):313.
13. Knox D, Yamamoto T, Baum MA, Berinsky A. Design, identification, and sensitivity analysis for patient preference trials. Technical report, Working Paper. 2014.