Decision Making in Drug Development via Inference on Power
Geoffrey S Johnson
Merck & Co., Inc.
770 Sumneytown Pike, West Point, PA 19438 USA
geoffrey.s.johnson@gmail.com
Abstract
A typical power calculation is performed by replacing unknown population-level quantities in the power function with what is observed in external studies. Many authors and practitioners view this as an assumed value of power and offer the Bayesian quantity probability of success, or assurance, as an alternative. The claim is that by averaging over a prior or posterior distribution, probability of success transcends power by capturing the uncertainty around the unknown true treatment effect and any other population-level parameters. We use p-value functions to frame both the probability of success calculation and the typical power calculation as merely producing two different point estimates of power. We demonstrate that Go/No-Go decisions based on either point estimate of power do not adequately quantify and control the risk involved, and instead we argue for Go/No-Go decisions that utilize inference on power for better risk management and decision making.
Keywords: Pharmaceutical drug development, P-value function, Confidence distribution, Probability of success, Assurance.
1 Introduction
The need for quantitative decision rules in the pharmaceutical industry across all phases of clinical development
is paramount (Frewer et al. 2016; Kirby and Chuang-Stein 2017; Lalonde et al. 2007). This entails Go/No-Go
decisions from phase 1 through 3, and just as important is the probability of making these decisions. In drug development many authors propose Bayesian predictive probability as a more appropriate alternative to frequentist
power, be it for interim analyses or across phases of development, and espouse its use as part of net present value
calculations (O’Hagan et al. 2005; Trzaskoma and Sashegyi 2007; Chuang-Stein 2006). The claim is that one must
assume a particular parameter value (population-level treatment effect) is true in order to calculate power, whereas
a Bayesian approach considers the parameter itself as a random variable so that Bayesian probability of success exists
unconditionally on the parameter of interest (Temple and Robertson 2021; Crisp et al. 2018; Ciarleglio and Arendt
2017; King 2009). Examples abound comparing probability of success calculations to misguided evaluations of the
power curve as evidence that power is overly optimistic or anti-conservative when used in decision making (Saville
et al. 2014). While there is certainly value in predicting a clinical trial result, and the topic of prediction intervals
and prediction densities is established in the frequentist paradigm as well (Johnson 2021; Shen et al. 2018), the
confidence or credible level associated with a prediction interval relates to the ability to predict a random event
using observed data without conditioning on parameter values. It is not a probability statement about the random
event itself. Viewed this way, the term probability of success is a misnomer and may not be the primary quantity
of interest for decision making in drug development. This confusion is due in large part to the relaxed definition of
probability used in Bayesian inference where a parameter (e.g. the population-level treatment effect) is treated as
an unrealized or unobservable realization of a random variable that depends on the observed data, and probability
is reinterpreted as measuring the subjective belief of the experimenter. The key to appreciating our approach to decision making is adopting an objective definition of probability: although we do not know the population-level parameter of interest, this does not mean it is a random variable, and our estimation, inference, and decision making
should not treat it as random. A major focus of this manuscript is to frame power not as an assumed parameter
but as a parameter that one can estimate and infer, and to demonstrate that Bayesian probability of success is not
a “fix” for power. An excellent critique of probability of success has been provided by Carroll (2013) who offers a
summary of its features using a simple normal model and an example involving a hazard ratio while considering that
the phase 2 posterior is centered at the unknown fixed true treatment effect to be investigated in phase 3. This has considerable value for understanding the properties of probability of success, but that investigation inherently treats
probability of success as a population-level quantity that exists in addition to power. Our contribution is to build on
this discussion by interpreting probability of success as a point estimate of power, and to argue in favor of Go/No-Go
decisions that instead utilize a transformation-invariant estimate of power as well as inference on power. The most
critical point we demonstrate is that if inference on power is ignored the decision maker may otherwise be indifferent and unwittingly exposed to risk when choosing programs to progress to phase 3 based on point estimates of power.
Bayesian probability statements are visually depicted through prior and posterior distributions, distribution estimates of an unknown quantity of interest, and are powerful tools for visualizing and pooling prior information
and expert opinion with current data. Spiegelhalter et al. (2004) illustrate this and highlight its application to
forming stopping rules for early efficacy, futility, and safety, as well as planning future studies. Under the frequentist
paradigm the analogous distribution estimate is a p-value function, a sample-dependent ex-post object that depicts
all possible p-values and confidence intervals one could construct given the observed data for a parameter of interest.
This p-value function is supported on the parameter space and has the appearance of a Bayesian prior or posterior,
but does not depict a random parameter. Instead, the p-value function summarizes all possible inference one could
perform based on a given data set using a particular hypothesis test or confidence interval method. P-value functions
allow for meta-analysis (Xie et al. 2011) and can be used to capture and incorporate expert opinion (Xie et al. 2013),
providing a powerful visual tool for decision making across all phases of clinical development. When the p-value
is uniformly distributed under the null and the p-value function has the appearance of a distribution function on
the parameter space it is often referred to as a confidence distribution (Xie and Singh 2013; Schweder and Hjort 2016).
The original idea for the confidence distribution dates back to Sir Ronald Fisher, who initially termed it the
fiducial or “faith” distribution. He viewed the p-value as a continuous measure of evidence, drawing inspiration from Jeffreys’ work in objective Bayesianism, and opposed the Neyman-Wald approach to hypothesis testing. He also opposed the other extreme: statistical inference using personal or subjective probabilities, championed by Savage and de Finetti (Efron 1998). Fisher developed likelihood-based inference aiming to combine information from different
sources with an emphasis on model coherence and optimality, and intended the fiducial distribution to be a universal
approach for Bayesian-like inference in the absence of a prior distribution. Textbooks and institutions ultimately
adopted the Neyman-Wald approach to hypothesis testing, obscuring the true merit of the p-value. However, in the
decades since there has been renewed interest in the topic using a purely frequentist interpretation (Efron 1998),
and the confidence distribution has become a remarkable achievement inspired by Fisher.
The novelty of this manuscript lies in the interpretation and visualization of statistical inference, the mathematical
considerations for constructing a p-value function for power, and the statistical evaluation of performing inference
on power in comparison to existing methods for decision making in drug development. Section 2 formally defines a
p-value function linking it to hypothesis testing and meta-analysis, and extends these developments to inference on
power. Section 3 demonstrates the use of p-value functions in the decision making framework across phases 2 and
3 of pharmaceutical development. Desired inference on phase 3 power is used to reverse engineer the hypothesis,
significance level, and sample size required in phase 2. In Section 4 this approach is evaluated through simulation
alongside decision rules using probability of success and a typical power calculation, and a discussion is provided in
Appendix C on why adjustment for multiple comparisons is not required if one adopts a Fisherian point of view.
SAS code is provided in Appendix G.
2 Methods
2.1 P-value Functions
A confidence interval for a parameter θ is a set of plausible hypotheses for θ, given the data X = x observed. Two well-known and often related methods for producing confidence intervals are inverting a family of hypothesis tests and using a pivotal quantity. If an upper-tailed test is inverted for all values of θ in the parameter space, the resulting function of upper-tailed p-values is called an upper p-value function. The most familiar example of inverting a hypothesis test uses the likelihood ratio test. Under H0: θ = θ0, when mild regularity conditions are met the likelihood ratio test statistic −2 log λ(X, θ0) follows an asymptotic χ²₁ distribution (Wilks 1938). The one-sided p-value testing H0: θ ≤ θ0,

\[
H(\theta_0, x) =
\begin{cases}
\left[\,1 - F_{\chi^2_1}\!\big(-2\log\lambda(x,\theta_0)\big)\right]/2 & \text{if } \theta_0 \le \hat{\theta}_{mle} \\[4pt]
\left[\,1 + F_{\chi^2_1}\!\big(-2\log\lambda(x,\theta_0)\big)\right]/2 & \text{if } \theta_0 > \hat{\theta}_{mle},
\end{cases}
\qquad (1)
\]

as a function of θ0 and the observed data x is the corresponding upper p-value function, where θ̂_mle is the maximum likelihood estimate of θ and F_{χ²₁}(·) is the cumulative distribution function of a χ²₁ random variable. Typically the naught subscript is dropped and x is suppressed to emphasize that H(θ) is a function over the entire parameter space. Each value in the parameter space takes its turn playing the role of null hypothesis, and hypothesis testing (akin to proof by contradiction) is used to infer the unknown fixed true θ. This recipe of viewing the p-value as a function of θ given the data produces a p-value function for any hypothesis test. For instance, when the sampling distribution of an estimator g{θ̂(X)} for some link function g{·} is well approximated by a normal distribution, an upper p-value function for testing hypotheses about θ is easily produced by inverting a Wald test, H(θ) = 1 − Φ([g{θ̂(x)} − g{θ}]/ŝe), where ŝe is a model-based or sandwich estimate for the standard error of g{θ̂(X)} and Φ(·) is the cumulative distribution function of the standard normal distribution. See Appendix B.1 for further discussion on link functions. Alternatively, without necessarily appealing to regularity conditions and standard asymptotics, one can derive or approximate the sampling distribution of an estimator g{θ̂(X)} and numerically invert its cumulative distribution function while profiling any nuisance parameters to ultimately construct an upper p-value function H(θ). Profiling nuisance parameters means to replace them with estimates calculated under the restricted null space (Pawitan 2001). This would correspond to likelihood ratio and score methods (see Appendix B.3).
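As a concrete illustration of the Wald construction just described (and not the Appendix G code), the following minimal SAS sketch evaluates an upper p-value function on a grid of hypotheses using an identity link; the point estimate, standard error, and grid limits are hypothetical values chosen only for illustration.

* Upper p-value function from an inverted Wald test (identity link);
* thetahat and se below are hypothetical inputs for illustration only;
data upper_pvalue_function;
   thetahat = 0.01;   * observed point estimate;
   se       = 0.037;  * estimated standard error;
   do theta0 = -0.20 to 0.20 by 0.001;
      H_upper = 1 - probnorm((thetahat - theta0)/se);  * p-value testing H0: theta <= theta0;
      output;
   end;
   keep theta0 H_upper;
run;

Plotting H_upper against theta0 gives the upper p-value function; reading off where it crosses a chosen significance level identifies the corresponding one-sided confidence limit.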
The lower p-value function H⁻(θ) can be analogously defined, containing all lower-tailed p-values as a function of θ. One can then define the confidence curve of one-sided p-values as

\[
C(\theta) =
\begin{cases}
H(\theta) & \text{if } \theta \le \hat{\theta}(x) \\
H^{-}(\theta) & \text{if } \theta \ge \hat{\theta}(x).
\end{cases}
\]

This definition differs slightly from others (Thornton and Xie 2020; Xie and Singh 2013; Birnbaum 1961) and may take on two values at θ = θ̂(x), forming a jump discontinuity. The confidence curve defined above can accommodate a discrete sampling distribution where H⁻(θ) ≠ 1 − H(θ), and it can also accommodate a discrete parameter space. The p-value or significance level depicts the ex-post sampling probability of the observed result or something more extreme if the hypothesis is true, and represents the plausibility of the hypothesis given the data. One can identify a 100(1 − α)% confidence interval by finding the complement of those hypotheses for θ with ≤ α significance, i.e., by finding those hypotheses for which the observed result is within a 100(1 − α)% margin of error.

Many times, though not always, the upper p-value function forms a cumulative distribution function on the parameter space. In these settings, if the sample space is continuous so that H⁻(θ) = 1 − H(θ) and the p-value is uniformly distributed under the null, H(θ) is often referred to as a confidence distribution function and can be depicted by its density h(θ) = dH(θ)/dθ (Xie and Singh 2013). Singh et al. (2007) and others highlight an interesting coincidence that when a plug-in estimated sampling distribution or a bootstrap estimated sampling distribution approaches a normal distribution (symmetric shift model) with increasing sample size, it is a valid asymptotic confidence distribution. Similarly, when a normalized likelihood (a proper Bayesian posterior from an improper “flat” prior) (Efron 1986) approaches a normal distribution with increasing sample size it too is a valid asymptotic confidence distribution (Fraser 2011; Efron 1986; Xie et al. 2013; Xie and Singh 2013). In settings where regular asymptotics do not apply these distribution estimates often still work well as approximate p-value functions. Even when the p-value is not uniformly distributed under the null or H(θ) does not necessarily form a distribution function on the parameter space (or both), the p-value function and confidence curve might still be informally called a distribution estimate or a confidence distribution. Appendix A provides the formal definition of a confidence interval (Casella and Berger 2002) and confidence distribution function (Xie and Singh 2013; Xie et al. 2013), and an example is discussed in Appendix F.1 involving a discrete parameter space.
The p-value function, confidence curve, and confidence density are useful for graphically representing frequentist inference. They are also useful for performing a meta-analysis. For pooling prior information with current data, the p-value from a fixed effect meta-analysis combining two studies may take the form

\[
p^{(c)} = \Phi\!\left( \frac{ \tfrac{1}{\hat{se}_1}\,\Phi^{-1}(p_1) + \tfrac{1}{\hat{se}_2}\,\Phi^{-1}(p_2) }{ \left( \tfrac{1}{\hat{se}_1^2} + \tfrac{1}{\hat{se}_2^2} \right)^{1/2} } \right), \qquad (2)
\]

where p-values p1 and p2 are back-transformed into z-scores, inversely weighted by their corresponding estimated standard errors ŝe1 and ŝe2, and transformed once again into a combined p-value. Φ(·) is the cumulative distribution function of the standard normal distribution, and Φ⁻¹(·) is the corresponding quantile function. Viewing each p-value as a function of the hypothesis for θ being tested, this same convolution formula can be applied to p-value functions, i.e.

\[
H^{(c)}(\theta) = \Phi\!\left( \frac{ \tfrac{1}{\hat{se}_1}\,\Phi^{-1}\{H_1(\theta)\} + \tfrac{1}{\hat{se}_2}\,\Phi^{-1}\{H_2(\theta)\} }{ \left( \tfrac{1}{\hat{se}_1^2} + \tfrac{1}{\hat{se}_2^2} \right)^{1/2} } \right). \qquad (3)
\]

Even in non-normal settings this formula works well to preserve Fisher information (Xie et al. 2013). Alternatively, using likelihood-based methods one could multiply the historical and current likelihoods together to form a joint likelihood and use this to invert a hypothesis test. This multiplication of independent likelihoods is precisely what Bayes’ theorem accomplishes (plus normalization), without the inversion of a hypothesis test. In more complicated situations involving a multi-dimensional parameter space, Equations (2) and (3) highlight the notion of division of labor, allowing one to avoid construction of an all-encompassing model (Efron 1986).
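Once each study's p-value function is available on a common grid of hypotheses, the convolution in Equation (3) is a one-line calculation. The SAS sketch below combines two Wald-based upper p-value functions; the point estimates and standard errors are hypothetical values used only for illustration.

* Combining two upper p-value functions with Equation (3);
* estimates and standard errors below are hypothetical;
data combined_pvalue_function;
   t1 = 0.02;  se1 = 0.05;   * study 1 estimate and standard error;
   t2 = 0.01;  se2 = 0.04;   * study 2 estimate and standard error;
   do theta0 = -0.20 to 0.20 by 0.001;
      H1 = 1 - probnorm((t1 - theta0)/se1);   * upper p-value function, study 1;
      H2 = 1 - probnorm((t2 - theta0)/se2);   * upper p-value function, study 2;
      Hc = probnorm( ((1/se1)*probit(H1) + (1/se2)*probit(H2))
                     / sqrt(1/se1**2 + 1/se2**2) );   * Equation (3);
      output;
   end;
   keep theta0 H1 H2 Hc;
run;

In this normal setting the combined function Hc is centered at the precision-weighted pooled estimate, which is the behavior Equation (3) is designed to preserve.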
The meta-analytic p-value function above treats the two experiments as a single larger experiment. When investigating the plausibility of H0: θ ≤ θ0 the meta-analytic p-value could instead be defined as H^{(c)}(θ) = H₁(θ)·H₂(θ). This treats each experimental result as a separate observation and depicts the upper-tailed probability of observing a result as or more extreme than that witnessed in experiment 1 and experiment 2, given hypotheses of the form H0: θ ≤ θ0. The meta-analytic p-value function of lower-tailed “or” probability statements testing hypotheses of the form H0: θ ≥ θ0 can be analogously constructed as H^{(c)−}(θ) = H₁⁻(θ) + H₂⁻(θ) − H₁⁻(θ)·H₂⁻(θ). Appendix F provides additional examples showing the construction of a confidence density and its usefulness in a meta-analysis. It also provides further discussion on Bayesian and frequentist interpretations of probability (Good 1965, 1966; Schrödinger and Trimmer 1980; Ballentine 1970).
2.2 Power and Probability of Success
The power curve depicts the ex-ante sampling probability of the test statistic (testing a single research hypothesis)
as a function of all unknown true population-level parameters. This long-run sampling probability forms the level of
confidence in the next experimental result. The power curve is typically constructed while estimating the unknown
true population-level nuisance parameters based on a literature review of external studies. (The estimated power
curve described above for an upper-tailed test can be approximated using an upper p-value function, with observ-
able nuisance parameter estimates equal to the estimated population-level values from the literature review and an
observable treatment effect equal to the minimum detectable effect corresponding to the single research hypothesis.
See Appendix B.4 for further discussion and the SAS code in Appendix G.)
A p-value function H(θ) containing inference on θ from an external study can be used to obtain a p-value function for hypotheses concerning the power of a future study. Since the estimated power function β(θ) is a monotonic transformation of θ, a change of variables in H(θ) produces a p-value function in terms of power,

\[
H(\theta) = H\!\left(\beta^{-1}\{\beta(\theta)\}\right), \qquad (4)
\]

where β⁻¹ is the inverse power function. In practice this can be solved numerically so that the inverse power function is not required. That is, for a given hypothesis for θ the value H(θ) is the p-value assigned to β(θ). This applies the estimated power function to confidence limits for θ to construct confidence limits for power, and is captured as a p-value function. Regardless of the test used to construct H(θ), Equation (4) can be seen as a g{θ} = β⁻¹(β{θ}) or g{β} = β⁻¹{β} link function to produce inference on power. In terms of a Wald test for θ using an identity link, a Wald test for power using Equation (4) would be H(β) = 1 − Φ([θ̂ − β⁻¹{β}]/ŝe). Using the invariance property, β̂_mle = β(θ̂_mle) is the maximum likelihood estimate for power and in general β̂ = β(θ̂) is a transformation-invariant estimate of power. To fully account for having estimated unknown nuisance parameters from external studies to estimate power, one could utilize a transformation of the power point estimate along with the delta method and invert a t- or Wald test to ultimately construct H(β), under mild regularity conditions and standard asymptotics (see Appendix B.5 for further mathematical considerations). Alternatively, without necessarily appealing to regularity conditions and standard asymptotics, one can derive or approximate the sampling distribution of the estimator for power and numerically invert its cumulative distribution function while profiling the nuisance parameters to construct H(β).
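The change of variables in Equation (4) is straightforward to carry out numerically: evaluate H(θ) and β(θ) on the same grid of hypotheses and treat the pairs (β(θ), H(θ)) as the p-value function for power. The SAS sketch below does this with a Wald-based H(θ) and a normal-approximation power function; the external estimate, standard error, and design constants are hypothetical stand-ins, not the values used in the paper's example.

* P-value function for power via Equation (4): pair beta(theta) with H(theta);
* all numerical inputs are hypothetical and for illustration only;
data h_of_power;
   thetahat = 0.02;  se = 0.04;                    * external Wald-based inference on theta;
   n3 = 365;  alpha3 = 0.025;  margin3 = -0.12;    * hypothetical phase 3 design;
   z_a3 = probit(1 - alpha3);
   se3  = sqrt(2*0.43*0.57/n3);                    * approximate phase 3 standard error;
   do theta = -0.30 to 0.30 by 0.0005;
      H     = probnorm((theta - thetahat)/se);              * upper p-value function for theta;
      beta3 = 1 - probnorm(z_a3 - (theta - margin3)/se3);   * power at this hypothesis for theta;
      output;
   end;
   keep theta H beta3;
run;

* by monotonicity, the one-sided 80% lower confidence limit for power is
  beta3 evaluated at the 80% lower confidence limit for theta (H = 0.20);
proc sql;
   select min(beta3) as beta3_lcl80
   from h_of_power
   where H >= 0.20;
quit;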
When H(θ) forms a distribution function on the parameter space one can calculate the Bayesian quantity probability of success, or assurance,

\[
\hat{\beta}_{pos} = \int \beta(\theta)\, dH(\theta) \qquad (5)
\]
\[
\hphantom{\hat{\beta}_{pos}} = \int \beta\, dH(\beta). \qquad (6)
\]
To the Bayesian, H(θ) is constructed using Bayes’ theorem and is said to measure belief about θ for the treatment under investigation, so that probability of success is not an estimate of the long-run probability of achieving end-of-study success; it is the belief about achieving end-of-study success. A value of 0.5 represents complete uncertainty in belief or a lack of knowledge. Probability of success is unconditional on θ, but it does depend on the belief about θ. To the frequentist, there is a single true θ for the treatment under investigation and (5) is the average of all possible power estimates over the ex-post sampling probability in H(θ). Despite the integration over θ, (5) is not unconditional on θ. Equation (6) is a point estimate of power, which by definition is conditional on θ. Although consistent as an estimator, it is biased towards 0.5 since θ is a fixed quantity. The uncertainty around having estimated power using β̂_mle and β̂_pos is not ignored; it is displayed in the p-value function for power.
Probability of success is typically approximated through numerical integration by sampling from H(θ). However, once H(β) is constructed as outlined above, probability of success can be easily approximated using a Riemann sum,

\[
\hat{\beta}_{pos} \approx \frac{\sum \beta \cdot \Delta H(\beta)}{\sum \Delta H(\beta)} = \frac{\sum \beta(\theta) \cdot \Delta H(\theta)}{\sum \Delta H(\theta)}. \qquad (7)
\]

This can be accomplished in a single data step and a call to Proc Means with a weight statement, and computes in a fraction of a second. When considering two separate studies, e.g. phase 2 and phase 3 of a clinical development plan, probability of success can be defined as

\[
\hat{\beta}^{pos}_{2,3} = \int \beta_2(\theta)\,\beta_3(\theta)\, dH(\theta), \qquad (8)
\]

where β₂ and β₃ are phase 2 and phase 3 power respectively. This is easily approximated as in Equation (7). Reading Equation (8) from left to right, for a given θ, β₂(θ)β₃(θ) is the power of succeeding in both phase 2 and phase 3, averaged over what we currently infer about θ. In this quantity the truth does not change from phase 2 to phase 3, and probability of success is based solely on what we infer now about θ. In a fully Bayesian framework an unknown nuisance parameter would also be considered a random variable centered at an estimated value, analogous to the delta method described above. This requires an additional layer of averaging when calculating probability of success, but typically has little impact on the result.
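To make the Riemann-sum calculation in Equation (7) concrete, the sketch below builds H(θ) from a Wald-based p-value function, evaluates an illustrative phase 3 power curve on the same grid, and passes the increments of H(θ) to PROC MEANS as weights. The point estimate, standard error, and design constants are hypothetical stand-ins rather than the values of the paper's example, and the code is a minimal sketch rather than the Appendix G program.

* Riemann-sum approximation of probability of success, Equation (7);
* all numerical inputs below are hypothetical and for illustration only;
data pos_grid;
   thetahat = 0.02;  se = 0.04;                    * external estimate of theta and its standard error;
   n3 = 365;  alpha3 = 0.025;  margin3 = -0.12;    * hypothetical phase 3 design;
   z_a3 = probit(1 - alpha3);
   se3  = sqrt(2*0.43*0.57/n3);                    * approximate phase 3 standard error;
   Hprev = 0;
   do theta = -0.30 to 0.30 by 0.0005;
      H      = probnorm((theta - thetahat)/se);              * distribution estimate H(theta);
      dH     = H - Hprev;                                    * Riemann increment;
      power3 = 1 - probnorm(z_a3 - (theta - margin3)/se3);   * illustrative phase 3 power curve;
      output;
      Hprev  = H;
   end;
   keep theta dH power3;
run;

proc means data=pos_grid mean;
   var power3;
   weight dH;   * weighted mean of power over the increments of H approximates Equation (7);
run;

The same data step extends to Equation (8) by multiplying in a phase 2 power curve before the weighted mean is taken.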
3 Decision Making Across Pharmaceutical Development
3.1 Decision Rules for End-of-Study Success
Regardless of which paradigm one operates under, hypothesis testing is the very heart of quantitative decision
making in pharmaceutical development. The null value to be tested in each phase depends not only on regulatory
requirements, but also on what is clinically meaningful and commercially viable. When showing a treatment effect
over placebo or an active comparator, the null value need not be zero and the significance level need not be 0.05.
The example below uses confidence curves to visualize the success criteria in a phase 2 and 3 clinical development plan.
Example: A phase 2 and 3 development plan is being created for an asset to treat an immuno-inflammation
disorder. Phase 3 is planned as a non-inferiority study using a difference in proportions on a binary responder index.
The non-inferiority margin is set by the regulatory agency at 0.12, as is the one-sided significance level of 0.025.
Phase 2 is a dose finding study on a continuous endpoint. This study also collects data on the responder index
and includes a control arm to estimate the difference in proportions planned for phase 3. A stricter non-inferiority
margin of 0.05 is considered in phase 2, but since the sample size in phase 2 is typically smaller than in phase 3, a
larger one-sided significance level of 0.20 is tolerated. Based on a literature review the estimated response proportion
for the comparator is 0.43 with N=1200.
For each study let X_ctrl ∼ Bin(n_ctrl, p_ctrl) be the number of responders out of n_ctrl subjects in the control group and X_active ∼ Bin(n_active, p_active) be the number of responders out of n_active subjects in the active group, with θ = p_active − p_ctrl and (p_ctrl, θ) ∈ Θ. Then the corresponding likelihood function for each study is L(θ, p_ctrl) ∝ (p_ctrl)^{x_ctrl} (1 − p_ctrl)^{n_ctrl − x_ctrl} (p_ctrl + θ)^{x_active} (1 − p_ctrl − θ)^{n_active − x_active}. Figure 1 uses confidence curves resulting from likelihood ratio tests on the population-level difference in proportions θ to demonstrate what the minimum phase 2 and phase 3 success criteria defined above look like in terms of a particular experimental result. Nearly identical confidence curves can be produced by inverting Wald tests using identity links. The left panel is based on N=90 subjects per arm with an estimated response rate of 0.43 on the control arm, and an estimated difference in proportions of 0.01 (minimum detectable effect). This particular experimental result produces a p-value just under 0.20 when testing against the −0.05 non-inferiority margin, H0: θ ≤ −0.05. This ex-post sampling probability forms the level of confidence that θ is less than or equal to −0.05. That is, one must be at least 80% confident that the true difference in proportions is greater than −0.05 in order to succeed in phase 2. As evidenced by the left panel in the figure below, declaring success for this experimental result is nearly equivalent to a test about the −0.12 non-inferiority margin at the 0.025 significance level. The right panel is based on N=365 subjects per arm and an estimated difference in proportions of 0.05. This results in a p-value just under 0.025 when testing H0: θ ≤ −0.12, or equivalently, one must be at least 97.5% confident that the true difference in proportions is greater than −0.12. The phase 2 null hypothesis was chosen as the value at which phase 3 power is 50%. This will be seen more clearly in Section 3.2.2. See Appendix B.3 for the mathematical considerations regarding these decision rules and Appendix G for the corresponding SAS code. Such notation is suppressed here for ease of reading.
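As noted above, nearly identical confidence curves can be produced by inverting Wald tests with identity links. The SAS sketch below builds such a confidence curve from summary estimates mirroring the left-panel design (N=90 per arm, estimated control rate 0.43, estimated difference 0.01); because it is a Wald rather than a likelihood ratio construction, its p-values will differ slightly from those quoted in the text, and it stands in for the Appendix G code rather than reproducing it.

* Wald-based confidence curve for the difference in proportions;
* (an approximation to the likelihood ratio curves in Figure 1);
data confidence_curve;
   n = 90;  p_ctrl = 0.43;  thetahat = 0.01;   * summary estimates, left-panel design;
   p_act = p_ctrl + thetahat;
   se = sqrt( p_ctrl*(1-p_ctrl)/n + p_act*(1-p_act)/n );
   do theta0 = -0.20 to 0.20 by 0.001;
      H_upper = 1 - probnorm((thetahat - theta0)/se);   * p-value testing H0: theta <= theta0;
      H_lower = probnorm((thetahat - theta0)/se);       * p-value testing H0: theta >= theta0;
      cc = min(H_upper, H_lower);                       * confidence curve of one-sided p-values;
      output;
   end;
   keep theta0 H_upper H_lower cc;
run;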
[Figure 1 displays confidence curves (y-axis) against the true difference in proportions (x-axis) depicting minimum phase 2 and phase 3 success.]

Figure 1: Phase 2 likelihood ratio test of H0: θ ≤ −0.05 with N=90 per arm at α=0.2. Phase 3 likelihood ratio test of H0: θ ≤ −0.12 with N=365 per arm at α=0.025.
While it is important to have a clear definition of technical success before conducting a trial, Figure 1 makes
it clear there is nothing materially different between a p-value of 0.024 and 0.026, or 0.19 and 0.21 and so on.
This allows for flexibility in decision making and reminds us that no hypothesis is proven false with a single small
p-value, nor is it proven true with a large one. All we can do is provide the weight of the evidence. This resonates
with the American Statistical Association (ASA) statement on statistical significance and the p-value (Wasserstein
et al. 2016). It also reflects the original intentions of Fisher’s statistical significance and inductive reasoning using
a frequentist interpretation of probability (Lehmann 1993). Equally important as the end-of-study success rule is
the power of achieving it. Both of these factor into the Go/No-Go decision and it is not enough to provide a point
estimate of power. One must also perform inference on power.
3.2 Priors, Power, and Probability of Success
3.2.1 Elicitation
Expert opinion can be used to perform inference on the power of a future study when no historical data is available
(EFSA 2014). Many times expert opinion is elicited through a “chips-in-bins” activity to construct a distribution
estimate of the true treatment effect (Oakley and O’Hagan 2010). This of course is inadmissible as scientific evidence,
but allows the Bayesian to explore belief probabilities and allows the frequentist to consider inference based on hypothetical experimental evidence. The available knowledge and information can be seen as exchangeable virtual data,
and each expert considers all possible point estimates that data like this could give rise to, essentially bootstrapping
the sampling distribution of the estimator (Xie et al. 2013). These bootstrapped sampling distributions are then
averaged in some way to form a single distribution. If the experts were all bootstrapping from the same informa-
tion their distributions would be nearly indistinguishable, but this is rarely the case. The heterogeneity between
the experts’ distributions suggests an extra layer of bootstrap sampling. Each expert’s perspective represents a
bootstrapped sample of the available information, from which they bootstrap repeatedly to form their distribution.
This explains the heterogeneity, and in theory the heterogeneity should be “averaged out” when these distributions
are combined. The combined sampling distribution itself may be considered an approximate p-value function, but
can also be used to invert a hypothesis test. See Appendix F.3 for the connection between an estimated sampling
distribution and a p-value function.
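One simple way to carry out the averaging described above is linear pooling: treat the combined distribution as an equal-weight mixture of the experts' distributions and compute its first two moments. The SAS sketch below does this for hypothetical elicited means and standard deviations from six experts; it is one reasonable pooling choice for illustration, not necessarily the method used in the example that follows.

* Equal-weight linear pool of six expert distributions;
* elicited means and standard deviations are hypothetical;
data experts;
   input mean sd;
   datalines;
0.00 0.05
0.01 0.04
0.03 0.06
0.02 0.05
0.04 0.07
0.02 0.04
;
run;

data pooled;
   set experts end=last;
   sum_mean + mean;
   sum_m2   + (sd**2 + mean**2);   * second moment of each expert distribution;
   n + 1;
   if last then do;
      pooled_mean = sum_mean/n;
      pooled_var  = sum_m2/n - pooled_mean**2;   * mixture variance;
      pooled_sd   = sqrt(pooled_var);
      output;
   end;
   keep pooled_mean pooled_var pooled_sd;
run;

The pooled mean and variance can then be converted to an effective sample size as described in Appendix B.2.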
Example continued: Six experts were assembled to elicit a distribution estimate for the difference in proportions
of the responder index in the target patient population. After a briefing on the literature to date all six experts’
distributions were averaged to form a single estimated sampling distribution with a mean of 0.02. This mean was
used as the maximum likelihood point estimate for a likelihood ratio test of the difference in proportions based on
N=350 on the investigational product, a 0.43 response rate in the control arm with N=1200, and inverted to form
a confidence curve. The virtual or effective sample size was determined by the variance of the combined sampling
distribution and the literature review (see Appendix B.2).
[Figure 2 displays phase 2, phase 3, and combined power (left axis) and the elicited confidence curve (right axis) against the true difference in proportions.]

Figure 2: Phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2. Phase 3 power curve testing H0: θ ≤ −0.12 with N=365 per arm at α=0.025. Confidence curve for θ based on historical data and expert opinion.
Figure 2 shows the power curves for the success criteria outlined in Section 3.1, the combined power curve (product) for success in both phase 2 and phase 3, and the elicited confidence curve for the difference in proportions
described above. The power curves in Figure 2 are constructed while estimating the unknown true population-level
response rate on the control therapy as 0.43 based on the literature review, approximated using the upper p-value
function from a likelihood ratio test. This approximation is nearly equivalent to using an upper p-value function
from a Wald test. (See Appendix B.7 for how to extrapolate the estimated power curve between endpoints or control
groups across phases of development.)
Figure 3 shows the resulting confidence curves for power using Equation (4) and probability of success calculations
using (7) and (8) based on the elicitation and literature review shown in Figure 2. Figures 2 and 3 suggest a larger
sample size in phase 2 would be warranted to increase the maximum likelihood and probability of success estimates
for power in phase 2 and overall. If 80% or 90% power is desired in the phase 3 study its sample size would need to be
increased as well. However, these statements ignore the inference in the confidence curves (see Figure 4). The bias of β̂_pos makes it a useful summary measure since a relatively high or low value indicates the inference is centered near high or low values of power respectively, but this still does not provide a complete picture. For instance, had the elicited confidence distribution been wider and shifted to the right, probability of success would increase at most sample sizes, but this produces a U-shaped confidence density around power (Rufibach et al. 2016) (see Appendix
D). Since the confidence curve displays the same inference and is always concave it may be a better choice than the
confidence density as in Figure 3 for displaying inference on power. Of course the elicitation is merely hypothetical
evidence. What matters more is inference based on real data. For this, one will need to conduct the phase 2 study.
[Figure 3 displays confidence curves for phase 2, phase 3, and overall power. Annotations in the original figure report phase 2 power: mle=0.318, pos=0.338; phase 3 power: mle=0.78, pos=0.735; phase 2 and 3 power: mle=0.248, pos=0.276.]

Figure 3: Solid lines depict resulting confidence curves for power in phase 2, phase 3, and overall based on the elicitation. Peaks correspond to maximum likelihood estimates of power.
[Figure 4 displays maximum likelihood and probability of success estimates of phase 3 power across phase 3 sample sizes per arm from N=45 to N=765.]

Figure 4: Estimated phase 3 power when testing H0: θ ≤ −0.12 at α=0.025 at various sample sizes, with 80% two-sided confidence limits based on the elicitation.
3.2.2 Conditioning on Phase 2 Success
If one is satisfied with the inference on phase 3 power given minimal success in phase 2, one would be satisfied for
any other successful phase 2 result. Recall the estimated phase 2 power curve was approximated using a p-value
function. The confidence curve depicting minimum success in phase 2 is simply a re-expression of this p-value
function. This is depicted in Figure 5 and shows that the phase 2 decision rule from Figure 1 produces inference
around high values of phase 3 power, but still assumes some risk. While the maximum likelihood and probability of
success point estimates for phase 3 power are 95.9% and 78.1% respectively, one can claim with only 80% confidence
that the power of the phase 3 study is no less than 50% given minimal success in phase 2 (p-value = 0.2 testing H0: β₃(θ) ≤ 0.5). In our view, ensuring phase 3 power is no worse than a coin toss conditional on passing phase
2 is a good rule of thumb. If stronger inference on phase 3 power is desired given minimal success in phase 2, one
could simply increase the phase 3 sample size. Alternatively, one could adjust the phase 2 significance level and null
hypothesis, and select the phase 2 sample size based on an acceptable phase 2 minimum detectable effect. Once
the phase 2 study results are available, two-sided confidence limits for phase 3 power can be provided alongside the
maximum likelihood point estimate. Conversely, the p-value testing H0: β₃(θ) ≤ 0.5 or the level of confidence for
which phase 3 power is greater than 50% can be provided alongside the point estimate. Figures 3 and 4 could be
reproduced using phase 2 inference instead of the elicitation. As mentioned in the introduction, if inference on power
is ignored the decision maker may otherwise be indifferent and unwittingly exposed to risk when choosing programs
to progress to phase 3 based on point estimates of power. If probability of success or assurance is utilized as an
estimate of the probability of achieving end-of-study success, we recommend not presenting it as an unconditional
quantity that transcends power and does not require inference. If probability of success or assurance is utilized as
the unconditional confidence level of a prediction interval, we recommend not presenting it as the probability of
achieving end-of-study success, despite its name. See Section 4 for further discussion on interpreting prediction
intervals.
[Figure 5 displays the phase 3 power curve and the confidence curve for θ given minimum phase 2 success against the true difference in proportions.]

Figure 5: Phase 3 power curve testing H0: θ ≤ −0.12 with N=365 per arm at α=0.025. Confidence curve for θ from the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2.
The inference above is conditional on minimal success in phase 2 alone. One might also be interested in performing inference on phase 3 power that incorporates the elicited distribution estimate, though this should not weigh too heavily on decision making. Often the phase 3 probability of success calculation is estimated through simulation while treating the elicited h(θ) as a probability distribution for θ, and is conditioned on those Monte Carlo runs where the phase 2 success criterion is met. This subsetting amounts to multiplying the phase 2 power curve by the elicited h(θ) and normalizing, β₂(θ)·h(θ) / ∫ β₂(θ)·h(θ) dθ. When θ is considered random this density is conditional on the elicited h(θ) and on passing phase 2, but without conditioning on a particular value of θ nor a particular phase 2 result. This density, sometimes referred to as a pre-posterior, and the phase 3 power curve produce the conditional probability of success, or conditional assurance (Temple and Robertson 2021), estimate of power. This is similar to though not exactly the same as multiplying the elicited H(θ) by the approximate estimated phase 2 power curve (minimum end-of-study success upper p-value function) and differentiating, d{H(θ)·β₂(θ)}/dθ. This same inference can be displayed as a confidence curve. See curve (iii) in Figure 6 below. The fixed-θ interpretation of this curve is the upper-tailed probability of observing a result as or more extreme than the elicited test statistic and a result as or more extreme than the minimum detectable effect in phase 2, given hypotheses of the form H0: θ ≤ θ0. This same curve depicts lower-tailed “or” probability statements testing hypotheses of the form H0: θ ≥ θ0. In this inference the elicited point estimate and the phase 2 point estimate are treated as separate observations. The median of this p-value function (two-sided p-value = 1) can be used as a point estimate for θ and to form a point estimate for phase 3 power. Alternatively, one could convolve the approximate estimated phase 2 power curve (minimum end-of-study success upper p-value function) with the elicited H(θ) using Equation (3) to form the updated p-value function for the treatment effect. See curve (iv) in Figure 6 below. This convolution treats the elicitation and the phase 2 study
as a single larger study. See Appendix E for additional figures.
This process of performing inference on power can be extended to include multiple phase 2 power curves, with
or without the elicited H(θ), and sequentially updating the p-value function for the treatment effect by multiplying
or convolving the p-value functions as described above. For example, inference on phase 2a, phase 2b, phase 3, and
overall power conditional on passing a pilot study; inference on phase 2b, phase 3, and overall power conditional on
passing the pilot and phase 2a studies; inference on phase 3 power conditional on passing the pilot and phase 2a and
2b studies. If one is dissatisfied with the inference on phase 3 power after the phase 2 study results are observed, one
could consider increasing the phase 3 sample size. This will steepen the phase 3 power curve relative to the phase
3 null hypothesis by lowering the minimum detectable effect, and improve the inference on phase 3 success. Figures
3 and 4 could be reproduced using phase 2 inference instead of the elicitation. Of course one could also consider
conducting an additional phase 2 study and multiply or convolve the results with the other observed phase 2 p-value
functions. The observed phase 2 results could be used to update the estimated phase 3 power curve by combining
estimates of population-level nuisance parameters, and inference on phase 3 power could be constructed using the
delta method.
[Figure 6 displays, against the true difference in proportions: (i) the elicited confidence curve, (ii) the confidence curve given minimum phase 2 success, (iii) their multiplication, (iv) their convolution, and (v) the phase 3 power curve.]

Figure 6: (i) Elicited confidence curve. (ii) Confidence curve for θ from the approximate phase 2 power curve testing H0: θ ≤ −0.05 with N=90 per arm at α=0.2. (iii) Multiplication of elicited H(θ) and phase 2 power curve, displayed as a confidence curve. (iv) Convolution of elicited H(θ) and approximate phase 2 power curve, displayed as a confidence curve. (v) Phase 3 power curve testing H0: θ ≤ −0.12 with N=365 per arm at α=0.025.
4 Simulation Study
Here we consider a simulation scenario that closely resembles Figure 5 to investigate the performance of decision
rules based on point estimates and confidence intervals for power. Without including any external or elicited data
a phase 2 sample of size N=90 per arm is simulated and used to estimate the phase 3 power curve with N=365 per arm investigating a difference in proportions θ by testing H0: θ ≤ −0.12 at the 0.025 significance level using a likelihood ratio test. Operating characteristics of decision rules for progression into phase 3 based on the maximum likelihood and probability of success estimates of power are presented in Table 1, as well as a decision rule based on a one-sided 80% confidence interval for power using the approach corresponding to Equation (4). Three treatment effect scenarios are investigated: θ = −0.12, θ = −0.05, and θ = 0. In each scenario the unknown true population-level control therapy response rate is 0.43. The decision rule labeled ‘PoS ≥ 0.60’ represents a Go decision into phase 3 if the probability of success estimate of power is greater than or equal to 0.60. Likewise for ‘PoS ≥ 0.75’ and ‘PoS ≥ 0.80’. The rule labeled ‘MLE ≥ 0.80’ represents a Go decision into phase 3 if the maximum likelihood estimate of power is greater than or equal to 0.80, and the rule labeled ‘80% Conf. β₃ > 0.50’ represents a Go decision into phase 3 if the test H0: β₃ ≤ 0.50 is significant at the 0.20 level.
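To make these rules concrete, the sketch below generates a single phase 2 data set under the θ = −0.05 scenario and computes the quantities behind the ‘MLE ≥ 0.80’ and ‘80% Conf. β₃ > 0.50’ rules. Wald and normal-approximation formulas are used as stand-ins for the likelihood ratio constructions of the actual simulation, and the seed and approximations are illustrative; wrapping the step in a loop over replications and tallying the Go indicators would reproduce the style, though not the exact numbers, of Table 1.

* One simulated phase 2 data set and the resulting Go/No-Go quantities;
* Wald/normal approximations stand in for the likelihood ratio machinery used in the paper;
data one_sim;
   call streaminit(2023);
   n2 = 90;  n3 = 365;  p_ctrl = 0.43;  theta_true = -0.05;   * simulation scenario;
   margin3 = -0.12;  alpha3 = 0.025;                          * phase 3 design;
   x_ctrl = rand('binomial', p_ctrl, n2);
   x_act  = rand('binomial', p_ctrl + theta_true, n2);
   phat_ctrl = x_ctrl/n2;
   phat_act  = x_act/n2;
   thetahat  = phat_act - phat_ctrl;
   se2 = sqrt( phat_ctrl*(1-phat_ctrl)/n2 + phat_act*(1-phat_act)/n2 );
   se3 = sqrt( 2*phat_ctrl*(1-phat_ctrl)/n3 );   * approximate phase 3 standard error;
   z_a3 = probit(1 - alpha3);
   beta3_hat = 1 - probnorm( z_a3 - (thetahat - margin3)/se3 );   * plug-in estimate of phase 3 power;
   * lower one-sided 80% confidence limit for power: apply the power function;
   * to the lower 80% confidence limit for theta, as in Equation (4);
   theta_lcl80 = thetahat - probit(0.80)*se2;
   beta3_lcl80 = 1 - probnorm( z_a3 - (theta_lcl80 - margin3)/se3 );
   go_conf = (beta3_lcl80 > 0.50);   * the '80% Conf. beta3 > 0.50' rule;
   go_mle  = (beta3_hat  >= 0.80);   * the 'MLE >= 0.80' rule;
run;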
The two-sided 60% confidence interval for phase 3 power based on phase 2 results using the approach correspond-
ing to Equation (4) covered 60.4%, 59.2%, and 59.6% of the time when the true power was 0.025, 0.50, and 0.91,
respectively. Comparatively, the two-sided 60% confidence interval based on a Wald test using the delta method
with a g{·} = Φ1{·} transformation of the maximum likelihood estimate of power covered 60.5%, 59.2%, and 59.6%
of the time. Table 1 shows that over 10,000 simulations the decision rule based on the one-sided 80% confidence
interval made a Go decision into phase three 19.3% of the time if θ=0.05 and β3= 0.50. This corresponds
with the definition of the Go rule. For the same simulation scenario the ‘PoS0.60’, ‘PoS0.75’, and ‘PoS0.80’
decision rules made a Go decision into phase three 34.0%, 15.2%, and 10.4% of the time respectively. These results
demonstrate that it is not immediately obvious how the probability of success estimate corresponds to the operating
characteristics of a decision rule in relation to the true value of power. Compared to the decision rule based on
the maximum likelihood estimate, the confidence interval rule works to guard against making a Go decision if the
true power is low. This of course is the intention behind the rules using the probability of success estimate, but
the confidence interval rule does so with easily understood and controllable operating characteristics that define the
rule itself. Investigating the operating characteristics of several probability of success decision rules via simulation
and selecting the rule with desirable characteristics is no different in principle from forming a confidence interval
rule. One could view a probability of success decision rule as the confidence level of a prediction interval for the
phase 3 test statistic, which does have easily understood operating characteristics, e.g. a one-sided 75% prediction
interval will correctly predict the phase 3 result 75% of the time regardless of the unknown fixed phase 3 power.
This would correspond to a ‘PoS ≥ 0.75’ decision rule, but this confidence level is a statement about both the phase
2 and phase 3 sampling variability and it is impossible to tease this apart. In contrast, for inference on phase 3
power the confidence level relates only to phase 2 sampling variability, and hypotheses for phase 3 power pertain
only to phase 3 uncertainty. This makes inference on power much more meaningful and easier to interpret, which
should lead to better decision making compared to predictive inference on success.
Table 1: Simulation Results

Unknown true phase 3 power      PoS ≥ 0.60   PoS ≥ 0.75   PoS ≥ 0.80   MLE ≥ 0.80   80% Conf. β₃ > 0.50
β₃(θ = −0.12) = 0.025           0.091        0.023        0.015        0.079        0.034
β₃(θ = −0.05) = 0.50            0.340        0.152        0.104        0.329        0.193
β₃(θ = 0) = 0.91                0.599        0.366        0.263        0.606        0.428

Operating characteristics of decision rules over 10,000 simulations (probability of a Go decision under each rule).
The results of the decision rule based on the confidence interval for power in Table 1 should be clear from inspecting Figure 2, since the estimated power curves in this figure match the unknown true power curves in the simulation study. Considering the results from Table 1, the confidence interval rule produces a significant result 3.4% of the time when testing H0: β₃ ≤ 0.50 if θ = −0.12 and β₃ = 0.025. The phase 3 power curve in Figure 2 evaluated at θ = −0.12 is 0.025 and the phase 2 power curve is approximately equal to 0.034. Similarly, considering again the results from Table 1, the confidence interval rule produces a significant result 42.8% of the time when testing H0: β₃ ≤ 0.50 if θ = 0 and β₃ = 0.91. The phase 3 power curve in Figure 2 evaluated at θ = 0 is 0.91 and the phase 2 power curve is approximately equal to 0.428. Increasing the phase 2 sample size will improve upon this 0.428 probability of making a Go decision if the true phase 3 power is 0.91, without altering the performance of the rule if the true phase 3 power is 0.50. If different operating characteristics under H0: β₃ ≤ 0.50 are desired, or if a different hypothesis is of interest, one can construct a different rule.
In practice we will not know which point on our estimated power curve corresponds most closely with the true
value of power, and we will not actually repeat each experiment 10,000 times; however, the frequency probabilities
concerning the experiment contained in the p-value function for power, as a function of the hypothesis and the observed data, provide the experimenter with confidence when performing inference and making a decision. As alluded
to in Section 3.1, decision making should be flexible. There may be an experimental result with a small p-value for
which it should be decided not to progress into phase 3 based on, say, market data, safety data, etc., and vice versa.
Ultimately it is up to the experimenter to make an informed decision, and the confidence provided by the p-value is
part of that decision.
[Figure 7 displays histograms of the power point estimates for the maximum likelihood and probability of success estimators, with panels for true phase 3 power β₃ = 0.025, 0.50, and 0.91.]

Figure 7: Sampling distributions of the maximum likelihood and probability of success estimators of power over 10,000 simulations.
Figure 7 shows the sampling distributions of the maximum likelihood and probability of success estimators of
power over the 10,000 simulations. In repeated sampling the probability of success estimator tends to produce a value
not far from 0.50, whether the true power of the phase 3 study is 0.91, 0.50, or 0.025. In this setting the maximum
likelihood estimator of power is median-unbiased, producing estimates centered around the true value of power. The sampling distribution of the Φ⁻¹-transformed maximum likelihood estimator of power is shown in Appendix B.6. The inverse cumulative distribution function of the standard normal distribution works remarkably well at stabilizing the variance and producing an approximately normal sampling distribution. This allows for constructing a p-value function for power using a Wald test with the delta method instead of Equation (4).
5 Closing Remarks
The p-value function is a remarkable visual tool for displaying quantitative decision rules and study results, and can
even be used to display inference on power. The Bayesian quantity probability of success or assurance, whether viewed
as the confidence level of a prediction interval, the result of a biased estimator of power, or a philosophical value,
may not be the primary quantity of interest for decision making in drug development. Although our demonstrations
focused on an exponential family model with routine asymptotic tests, the construction of a p-value function for
power is not limited to this setting. A natural extension of our work would be to perform inference on power by
jointly modeling correlated endpoints, and perhaps even constructing a confidence region for power. While not
demonstrated herein, confidence densities and confidence curves can also be used for conducting interim analyses.
Stopping rules for early efficacy based on p-values would be displayed similarly to Figure 1 using the data at interim,
while stopping rules for futility based on inference of end-of-study power given the data at interim would resemble
Figure 6 with the p-value function for the treatment effect determined, at least in part, by the interim data.
Data Sharing
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
References

Ballentine, L. E. (1970). The statistical interpretation of quantum mechanics. Reviews of Modern Physics 42(4), 358.

Birnbaum, A. (1961). Confidence curves: An omnibus technique for estimation and testing statistical hypotheses. Journal of the American Statistical Association 56(294), 246–249.

Carroll, K. J. (2013). Decision making from Phase II to Phase III and the probability of success: Reassured by "assurance"? Journal of Biopharmaceutical Statistics 23(5), 1188–1200.

Casella, G. and R. L. Berger (2002). Statistical Inference, Volume 2. Duxbury, Pacific Grove, CA.

Chuang-Stein, C. (2006). Sample size and the probability of a successful trial. Pharmaceutical Statistics: The Journal of Applied Statistics in the Pharmaceutical Industry 5(4), 305–309.

Ciarleglio, M. M. and C. D. Arendt (2017). Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure. Trials 18(1), 1–21.

Crisp, A., S. Miller, D. Thompson, and N. Best (2018). Practical experiences of adopting assurance as a quantitative framework to support decision making in drug development. Pharmaceutical Statistics 17(4), 317–328.

Efron, B. (1986). Why isn't everyone a Bayesian? The American Statistician 40(1), 1–5.

Efron, B. (1998). R. A. Fisher in the 21st century. Statistical Science, 95–114.

EFSA (2014). Guidance on expert knowledge elicitation in food and feed safety risk assessment. European Food Safety Authority Journal 12(6), 3734.

Fraser, D. A. (2011). Is Bayes posterior just quick and dirty confidence? Statistical Science 26(3), 299–316.

Frewer, P., P. Mitchell, C. Watkins, and J. Matcham (2016). Decision-making in early clinical drug development. Pharmaceutical Statistics 15(3), 255–263.

Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. The MIT Press, Cambridge, Massachusetts.

Good, I. J. (1966). The estimation of probabilities. J. Inst. Maths Applics 2, 364–383.

Johnson, G. S. (2021). Tolerance and prediction intervals for non-normal models. Researchgate.net.

King, M. (2009). Evaluating probability of success in oncology clinical trials. In Biopharmaceutical Applied Statistics Symposium.

Kirby, S. and C. Chuang-Stein (2017). A comparison of five approaches to decision-making for a first clinical trial of efficacy. Pharmaceutical Statistics 16(1), 37–44.

Lalonde, R., K. Kowalski, M. Hutmacher, W. Ewy, D. Nichols, P. Milligan, B. Corrigan, P. Lockwood, S. Marshall, L. Benincosa, et al. (2007). Model-based drug development. Clinical Pharmacology & Therapeutics 82(1), 21–32.

Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association 88(424), 1242–1249.

Oakley, J. and A. O'Hagan (2010). SHELF: The Sheffield Elicitation Framework (version 2.0). School of Mathematics and Statistics, University of Sheffield.

O'Hagan, A., J. W. Stevens, and M. J. Campbell (2005). Assurance in clinical trial design. Pharmaceutical Statistics: The Journal of Applied Statistics in the Pharmaceutical Industry 4(3), 187–201.

Pawitan, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.

Perezgonzalez, J. D. (2015). Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Frontiers in Psychology 6, 223.

Rufibach, K., H. U. Burger, and M. Abt (2016). Bayesian predictive power: choice of prior and some recommendations for its use as probability of success in drug development. Pharmaceutical Statistics 15(5), 438–446.

Saville, B. R., J. T. Connor, G. D. Ayers, and J. Alvarez (2014). The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clinical Trials 11(4), 485–493.

Schrödinger, E. and J. D. Trimmer (1980). The present situation in quantum mechanics: A translation of Schrödinger's 'cat paradox' paper. Proceedings of the American Philosophical Society 124(5), 323–338.

Schweder, T. and N. L. Hjort (2016). Confidence, Likelihood, Probability, Volume 41. Cambridge University Press.

Shen, J., R. Y. Liu, and M.-g. Xie (2018). Prediction with confidence—a general framework for predictive inference. Journal of Statistical Planning and Inference 195, 126–140.

Singh, K., M. Xie, W. E. Strawderman, et al. (2007). Confidence distribution (CD)–distribution estimator of a parameter. In Complex Datasets and Inverse Problems, pp. 132–150. Institute of Mathematical Statistics.

Spiegelhalter, D. J., K. R. Abrams, and J. P. Myles (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation, Volume 13. John Wiley & Sons.

Temple, J. R. and J. R. Robertson (2021). Conditional assurance: the answer to the questions that should be asked within drug development. Pharmaceutical Statistics, 1–10.

Thornton, S. and M. Xie (2020). Bridging Bayesian, frequentist and fiducial (BFF) inferences using confidence distribution. arXiv preprint arXiv:2012.04464.

Trzaskoma, B. and A. Sashegyi (2007). Predictive probability of success and the assessment of futility in large outcomes trials. Journal of Biopharmaceutical Statistics 17(1), 45–63.

Wasserstein, R. L., N. A. Lazar, et al. (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician 70(2), 129–133.

Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics 9(1), 60–62.

Xie, M., R. Y. Liu, C. Damaraju, W. H. Olson, et al. (2013). Incorporating external information in analyses of clinical trials with binary outcomes. The Annals of Applied Statistics 7(1), 342–368.

Xie, M., K. Singh, and W. E. Strawderman (2011). Confidence distributions and a unifying framework for meta-analysis. Journal of the American Statistical Association 106(493), 320–333.

Xie, M.-g. and K. Singh (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review 81(1), 3–39.
A Definitions
A.1 Definition of a Confidence Interval
From Casella and Berger (2002), the inference in a set estimation problem is the statement that ‘θ ∈ C,’ where C ⊆ Θ and C = C(x) is a set determined by the value of the data X = x observed. C ⊆ Θ is usually taken to be an interval, and C(X) is its estimator, a random variable. The coverage probability, P_θ(θ ∈ C(X)), is a probability statement referring to the random set C(X) since θ is an unknown fixed quantity.
A.2 Definition of a Confidence Distribution
From Xie et al. (2013), a function $H_n(\cdot)$ on $\mathcal{X} \times \Theta \to [0,1]$ is called a confidence distribution function for a parameter
$\theta$ if: R1) for each given $x \in \mathcal{X}$, $H_n(\cdot)$ is a cumulative distribution function on $\Theta$; R2) at the true parameter value
$\theta = \theta_0$, $H_n(\theta_0) \equiv H_n(x, \theta_0)$, as a function of the sample $x$, follows the uniform distribution $U[0,1]$. $H_n(\cdot)$ is
an asymptotic confidence distribution if the $U[0,1]$ requirement holds only asymptotically and the continuity
requirement on $H_n(\cdot)$ is dropped.
B Mathematical Considerations
B.1 Link Functions
The Wald test is incredibly versatile, especially when incorporating a link function. A link function can also
be helpful with a score or likelihood ratio test when the referenced sampling distribution is approximate. This
is often used in the analysis of generalized linear models where g{·} is a log or logit transformation. Careful
selection of the link function can vastly improve the inference on a parameter. For example, consider the setting
where $X_1, \ldots, X_n \sim N(\theta, 1)$ and interest surrounds $\beta = 1/\theta$. Using $\hat\beta = 1/\bar{x}$ and an identity link,
$H(\beta) = 1 - \Phi\big(\bar{x}^2\sqrt{n}\,[1/\bar{x} - \beta]\big)$ is a reasonable approximate solution since $\bar{X}^2\sqrt{n}\,(1/\bar{X} - \beta) \overset{\text{asymp}}{\sim} N(0,1)$, so long as
$\theta \neq 0$. However, a $g\{\beta\} = 1/\beta$ link function leads to $\sqrt{n}\,(\bar{X} - 1/\beta) \sim N(0,1)$, producing exact inference using
$H(\beta) = 1 - \Phi\big(\sqrt{n}\,[\bar{x} - 1/\beta]\big)$. As another example, consider the setting where we have two sets of normal samples
from $N(\theta_1, 1)$ and $N(\theta_2, 1)$ respectively and interest surrounds $\beta = \theta_1/\theta_2$. Using $\hat\beta = \bar{x}_1/\bar{x}_2$ and an identity link
leads to approximate inference based on $(\bar{X}_1/\bar{X}_2 - \beta)/\widehat{se} \overset{\text{asymp}}{\sim} N(0,1)$. However, a $g\{\beta\} = \beta\cdot\bar{x}_2$ link function yields
exact inference based on $(\bar{X}_1 - \beta\cdot\bar{X}_2)/se \sim N(0,1)$. Regardless of the test used to construct $H(\theta)$, Equation (4)
can be seen as a $g\{\theta\} = \beta^{-1}\{\beta(\theta)\}$ link function to produce inference on power.
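To make the exact construction concrete, the following is a minimal SAS sketch evaluating $H(\beta) = 1 - \Phi\big(\sqrt{n}\,[\bar{x} - 1/\beta]\big)$ over a grid of $\beta$ values; the inputs $\bar{x} = 2$ and $n = 10$ are hypothetical and chosen only for illustration.

* Sketch: p-value function for beta = 1/theta using the g{beta}=1/beta link;
data pvalue_function;
  xbar=2; n=10;                                  * hypothetical sample summary;
  do beta=0.1 to 1.0 by 0.001;
    H=1-cdf('normal',sqrt(n)*(xbar-1/beta),0,1); * exact p-value function from the 1/beta link;
    output;
  end;
run;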
B.2 Determining Effective Sample Size
If a literature review and elicitation provides an estimated sampling distribution for the response proportion on
control and the difference over control, the first two moments of these distributions can be used to determine the
effective sample size for the active arm.
$$\widehat{\text{Var}}(\hat p_{active} - \hat p_{ctrl}) = \frac{\hat\sigma^2_{active}}{n_{active}} + \frac{\hat\sigma^2_{ctrl}}{n_{ctrl}}$$
$$\widehat{\text{Var}}(\hat p_{active} - \hat p_{ctrl}) = \frac{\hat p_{active}(1-\hat p_{active})}{n_{active}} + \frac{\hat p_{ctrl}(1-\hat p_{ctrl})}{n_{ctrl}}$$
$$\widehat{\text{Var}}(\hat p_{active} - \hat p_{ctrl}) - \frac{\hat p_{ctrl}(1-\hat p_{ctrl})}{n_{ctrl}} = \frac{\hat p_{active}(1-\hat p_{active})}{n_{active}}$$
$$n_{active} = \frac{\hat p_{active}(1-\hat p_{active})}{\widehat{\text{Var}}(\hat p_{active} - \hat p_{ctrl}) - \dfrac{\hat p_{ctrl}(1-\hat p_{ctrl})}{n_{ctrl}}}$$
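As a minimal numerical sketch of this back-calculation, suppose the elicited standard error of the estimated difference is 0.03, the control response rate is estimated as 0.43 from 1200 subjects, and the elicited active response rate is 0.41; all of these inputs are hypothetical.

* Sketch: back-calculate the effective active-arm sample size from elicited quantities;
data effective_n;
  se_diff=0.03;             * elicited standard error of the estimated difference (hypothetical);
  p_ctrl=0.43; n_ctrl=1200; * external control estimate and its sample size (hypothetical);
  p_active=0.41;            * elicited active-arm response rate (hypothetical);
  n_active=p_active*(1-p_active)/(se_diff**2 - p_ctrl*(1-p_ctrl)/n_ctrl);
  put n_active=;
run;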
B.3 Likelihood Ratio Test for Difference in Proportions
This is a quick reference to performing the likelihood ratio test for a difference in proportions. See Casella and Berger
(2002) for complete instruction on the definition of symbols and how to construct a likelihood ratio hypothesis test.
Let $X_{ctrl} \sim \text{Bin}(n_{ctrl}, p_{ctrl})$, $X_{active} \sim \text{Bin}(n_{active}, p_{active})$, $\theta = p_{active} - p_{ctrl}$, and $p_{ctrl}, \theta \in \Theta$.
$$L(\theta, p_{ctrl}) \propto (p_{ctrl})^{x_{ctrl}}(1-p_{ctrl})^{n_{ctrl}-x_{ctrl}}(p_{ctrl}+\theta)^{x_{active}}(1-p_{ctrl}-\theta)^{n_{active}-x_{active}}$$
$$\frac{\partial \ell(\theta, p_{ctrl})}{\partial p_{ctrl}} = \frac{x_{ctrl}}{p_{ctrl}} - \frac{n_{ctrl}-x_{ctrl}}{1-p_{ctrl}} + \frac{x_{active}}{p_{ctrl}+\theta} - \frac{n_{active}-x_{active}}{1-p_{ctrl}-\theta}$$
$$\frac{\partial \ell(\theta, p_{ctrl})}{\partial \theta} = \frac{x_{active}}{p_{ctrl}+\theta} - \frac{n_{active}-x_{active}}{1-p_{ctrl}-\theta}$$
$\sup_{p_{ctrl}, \theta \in \Theta} L(\theta, p_{ctrl}) = L(\hat\theta, \hat p_{ctrl})$ yields $\hat p_{ctrl} = x_{ctrl}/n_{ctrl}$ and $\hat\theta = x_{active}/n_{active} - x_{ctrl}/n_{ctrl}$.

Under $H_0: \theta = \theta_0$, $\sup_{p_{ctrl}, \theta \in \Theta_0} L(\theta, p_{ctrl}) = L(\theta_0, \hat p_{ctrl}^{\theta_0})$, where setting $\partial \ell(\theta_0, p_{ctrl})/\partial p_{ctrl} = 0$ leads to the fixed-point iteration
$$\hat p_{ctrl}^{\theta_0, k+1} = \frac{x_{ctrl} + \dfrac{x_{active}\,\hat p_{ctrl}^{\theta_0,k}}{\hat p_{ctrl}^{\theta_0,k}+\theta_0}\left(1-\hat p_{ctrl}^{\theta_0,k}\right) + \dfrac{x_{active}\left(1-\hat p_{ctrl}^{\theta_0,k}\right)}{1-\hat p_{ctrl}^{\theta_0,k}-\theta_0}\,\hat p_{ctrl}^{\theta_0,k}}{n_{ctrl} + \dfrac{n_{active}\left(1-\hat p_{ctrl}^{\theta_0,k}\right)}{1-\hat p_{ctrl}^{\theta_0,k}-\theta_0}}, \qquad k = 0, 1, 2, \ldots, K,$$
initialized at $\hat p_{ctrl}^{\theta_0, 0} = \hat p_{ctrl}$ and run for $K$ sufficiently large to reach convergence. Estimating nuisance parameters under the restricted null space can
also be accomplished in Proc Genmod by using the NOINT, OFFSET=, NOSCALE, and SCALE= options in the
MODEL statement. In Proc Glimmix scale parameters are restricted using the HOLD= option in the PARMS
statement. Under mild regularity conditions the likelihood ratio test statistic,
$$-2\log\lambda(\boldsymbol{X}, \theta_0) = -2\log\left(\frac{L(\theta_0, \hat p_{ctrl}^{\theta_0})}{L(\hat\theta, \hat p_{ctrl})}\right),$$
follows an asymptotic chi-squared distribution with 1 degree of freedom, and significance at level $\alpha$ is achieved if
$-2\log\lambda(\boldsymbol{x}, \theta_0) > \chi^2_{1,1-\alpha}$, the $1-\alpha$ percentile. The corresponding two-sided, equal-tailed $100(1-\alpha)\%$ confidence interval
is given by $\{\theta : -2\log\lambda(\boldsymbol{x}, \theta) \le \chi^2_{1,1-\alpha}\}$. The p-value function, confidence density, and confidence curve functionals for
the test above are
$$H(\theta_0, \boldsymbol{x}) = \begin{cases} \big[1 - F_{\chi^2_1}\!\big(-2\log\lambda(\boldsymbol{x}, \theta_0)\big)\big]/2 & \text{if } \theta_0 \le \hat\theta(\boldsymbol{x}) \\ \big[1 + F_{\chi^2_1}\!\big(-2\log\lambda(\boldsymbol{x}, \theta_0)\big)\big]/2 & \text{if } \theta_0 > \hat\theta(\boldsymbol{x}) \end{cases}$$
$$h(\theta_0, \boldsymbol{x}) = \frac{dH(\theta_0, \boldsymbol{x})}{d\theta_0}$$
$$C(\theta_0, \boldsymbol{x}) = \begin{cases} H(\theta_0, \boldsymbol{x}) & \text{if } \theta_0 \le \hat\theta(\boldsymbol{x}) \\ 1 - H(\theta_0, \boldsymbol{x}) & \text{if } \theta_0 \ge \hat\theta(\boldsymbol{x}). \end{cases}$$

The asymptotic result above in terms of the full likelihood is equivalently viewed as the profile likelihood ratio,
$$-2\log\left(\frac{L(\theta_0)}{L(\hat\theta)}\right) \overset{\text{asymp}}{\sim} \chi^2_1,$$
where $L(\theta) = \sup_{p_{ctrl} \in \Theta} L(\theta, p_{ctrl}) = L(\theta, \hat p_{ctrl}^{\theta})$ as a function of $\theta$ and the observed data is the profile likelihood.
This replaces nuisance parameters with estimates calculated under the restricted parameter space, creating a one-
dimensional likelihood. $L(\hat\theta) = L(\hat\theta, \hat p_{ctrl})$ is the profile likelihood evaluated at $\hat\theta$ and $L(\theta_0) = L(\theta_0, \hat p_{ctrl}^{\theta_0})$ is the
profile likelihood evaluated at $\theta_0$. Since the profile likelihood ratio is a monotonic transformation of $\hat\theta$, one can
instead derive or approximate the sampling distribution of $g\{\hat\theta(\boldsymbol{X})\}$ for some link function $g\{\cdot\}$ and numerically
invert its cumulative distribution function while treating $p_{ctrl} = \hat p_{ctrl}^{\theta_0}$ as known to ultimately construct $H(\theta_0, \boldsymbol{x})$.
This latter approach would also correspond to the score test and is useful in settings where the regularity conditions
and asymptotics needed for referencing a chi-square distribution for the likelihood ratio or score test statistic are
not met. For further examples see Appendix F.
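For concreteness, the following is a condensed SAS sketch of the calculation above for a single null value $\theta_0$; the counts are hypothetical, and the full grid-based program appears in Appendix G.

* Sketch: likelihood ratio p-value for H0: theta = theta0, with hypothetical counts;
data lrt_example;
  x_ctrl=43; n_ctrl=100; x_active=55; n_active=100; theta0=0;   * hypothetical data and null value;
  p_hat=x_ctrl/n_ctrl; theta_hat=x_active/n_active - x_ctrl/n_ctrl;
  p0=p_hat;                                       * initialize the restricted estimate;
  do k=1 to 100;                                  * fixed-point iteration from Appendix B.3;
    p0=(x_ctrl + (x_active*p0/(p0+theta0))*(1-p0) + (x_active*(1-p0)/(1-p0-theta0))*p0)
       /(n_ctrl + n_active*(1-p0)/(1-p0-theta0));
  end;
  m2loglam=-2*( x_ctrl*log(p0/p_hat) + (n_ctrl-x_ctrl)*log((1-p0)/(1-p_hat))
        + x_active*log((p0+theta0)/(p_hat+theta_hat))
        + (n_active-x_active)*log((1-p0-theta0)/(1-p_hat-theta_hat)) );
  if theta0 le theta_hat then H=(1-cdf('chisquare',m2loglam,1))/2;
  else H=(1+cdf('chisquare',m2loglam,1))/2;
  put p0= m2loglam= H=;
run;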
B.4 Approximating Power using a P-value Function
The proof that a p-value function can be used to approximate a power curve involves the continuous mapping
theorem, convergence in probability, and convergence in distribution and is left to the reader as an exercise. What
follows is the intuition behind this approximation. The upper p-value function has the appearance of a power curve
for an upper-tailed test, and both depict sampling probability of the test statistic as a function of the unknown fixed
true parameter value. The p-value pertains to a specific experimental result and a single parameter unconditional
on nuisance parameters, while power pertains to any statistically significant experimental result relative to a single
research null hypothesis as a function of all unknown fixed parameters. The p-value function is typically written as
$H(\theta, \boldsymbol{x})$ to denote it as a function of both the parameter and the data. This dependence on the data will enter through
parameter estimates that are functions of the sufficient statistics, and so $H(\theta, \boldsymbol{x})$ can be expressed as $H(\theta, \hat\theta, \hat p_{ctrl})$,
where in our example $\hat p_{ctrl}$ is the point estimate for the population-level control therapy response rate $p_{ctrl}$, and
$\hat\theta$ is the point estimate for the population-level difference in proportions $\theta$. With a simple change of variables the
p-values can be used to approximate power. That is, if we consider an ex-ante experimental result where $\hat p_{ctrl}$ is
exactly equal to $p_{ctrl}$, and $\hat\theta$ equals the minimum detectable effect $\hat\theta_{mde}$ for a research hypothesis of interest $\theta_0$, then
$H(\theta, \hat\theta = \hat\theta_{mde}, \hat p_{ctrl} = p_{ctrl})$ is a function of both $\theta$ and $p_{ctrl}$ and is approximately equal to the power of the test,
$\beta(\theta, p_{ctrl})$. When evaluated at $\theta = \theta_0$, $H(\theta = \theta_0, \hat\theta = \hat\theta_{mde}, \hat p_{ctrl} = p_{ctrl})$ equals $\alpha$, the desired type I error rate of
the test. When evaluated at any other value of $\theta$, $H(\theta, \hat\theta = \hat\theta_{mde}, \hat p_{ctrl} = p_{ctrl}) \approx \beta(\theta, p_{ctrl})$. This same approach
can be used to approximate the power of a lower-tailed test using a lower p-value function, denoted here as $\underline{H}(\theta)$.
Since the approximate expression for power is a function of $\theta$ and $p_{ctrl}$, replacing $p_{ctrl}$ with a point estimate from
an external study produces an estimated power curve as a function of $\theta$. By replacing $\theta$ with a point estimate
from an external study as well, the delta method can be employed to construct p-values and confidence intervals for
hypotheses around power.
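In symbols, and using our own shorthand for the two plug-in steps just described,
$$\beta(\theta, p_{ctrl}) \approx H\big(\theta,\, \hat\theta = \hat\theta_{mde},\, \hat p_{ctrl} = p_{ctrl}\big), \qquad \hat\beta(\theta) = H\big(\theta,\, \hat\theta_{mde},\, \hat p_{ctrl}\big), \qquad \hat\beta = H\big(\hat\theta,\, \hat\theta_{mde},\, \hat p_{ctrl}\big),$$
where the middle expression replaces only the nuisance parameter with its external estimate to give an estimated power curve in $\theta$, and the right-hand expression additionally plugs in the external estimate of $\theta$ to give the point estimate of power that feeds the delta method of Appendix B.5.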
B.5 Delta Method for Inference on Power
Taylor Series
$$g\big\{\beta\big(\hat\theta(\boldsymbol{X}), \hat p_{ctrl}(\boldsymbol{X})\big)\big\} \approx g\{\beta(\theta, p_{ctrl})\} + \frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial \theta}\cdot\big(\hat\theta(\boldsymbol{X}) - \theta\big) + \frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial p_{ctrl}}\cdot\big(\hat p_{ctrl}(\boldsymbol{X}) - p_{ctrl}\big)$$
Asymptotic Variance
$$\text{Var}\Big[g\big\{\beta\big(\hat\theta(\boldsymbol{X}), \hat p_{ctrl}(\boldsymbol{X})\big)\big\}\Big] \approx \left[\frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial \theta}\right]^2\!\text{Var}[\hat\theta(\boldsymbol{X})] + \left[\frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial p_{ctrl}}\right]^2\!\text{Var}[\hat p_{ctrl}(\boldsymbol{X})] + 2\left[\frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial \theta}\right]\!\left[\frac{\partial g\{\beta(\theta, p_{ctrl})\}}{\partial p_{ctrl}}\right]\!\text{Cov}[\hat\theta(\boldsymbol{X}), \hat p_{ctrl}(\boldsymbol{X})]$$
Wald Confidence Interval for Power
$$g^{-1}\Big[g\{\beta(\hat\theta, \hat p_{ctrl})\} \pm z_{1-\alpha/2}\cdot\widehat{se}\Big]$$
Wald p-value testing $H_0: \beta \le \beta_0$
$$H(\beta_0, \boldsymbol{x}) = 1 - \Phi\left(\frac{g\{\beta(\hat\theta, \hat p_{ctrl})\} - g\{\beta_0\}}{\widehat{se}}\right)$$
$\beta(\theta, p_{ctrl})$ is the unknown true power of a future study investigating a difference in proportions. $\hat p_{ctrl}(\boldsymbol{X})$ is an
estimator from an external study for the population-level response rate for the control therapy, $p_{ctrl}$. $\hat\theta(\boldsymbol{X})$ is an
estimator from an external study for the population-level difference in proportions between the experimental and
control therapies, $\theta$. $\beta\big(\hat\theta(\boldsymbol{X}), \hat p_{ctrl}(\boldsymbol{X})\big)$ is the corresponding estimator for power, and $g\{\cdot\}$ is a variance-stabilizing
transformation that yields a normally distributed sampling distribution. The Taylor series approximation is used
to construct the asymptotic variance of the estimator for power. Once the external data are observed the par-
tial derivatives in the asymptotic variance can be solved numerically using parameter estimates, and $\text{Var}[\hat\theta(\boldsymbol{X})]$,
$\text{Var}[\hat p_{ctrl}(\boldsymbol{X})]$, and $\text{Cov}[\hat\theta(\boldsymbol{X}), \hat p_{ctrl}(\boldsymbol{X})]$ can be replaced with model-based or sandwich estimates. This produces an
asymptotic variance estimate. The estimated standard error $\widehat{se}$ is the square root of the asymptotic variance estimate.
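The back-transformation step can be sketched numerically as follows, assuming a probit link $g\{\beta\} = \Phi^{-1}(\beta)$ (consistent with the $\Phi^{-1}$-transformed estimator in Appendix B.6) and hypothetical values for the transformed point estimate and its delta-method standard error.

* Sketch: Wald interval and p-value for power on the probit scale, then back-transformed;
data wald_power;
  g_hat=probit(0.91);   * transformed power point estimate (hypothetical);
  se_hat=0.55;          * delta-method standard error on the probit scale (hypothetical);
  beta0=0.50;           * hypothesized power under H0: beta <= beta0;
  lcl=cdf('normal',g_hat - probit(0.975)*se_hat,0,1);
  ucl=cdf('normal',g_hat + probit(0.975)*se_hat,0,1);
  pvalue=1-cdf('normal',(g_hat-probit(beta0))/se_hat,0,1);
  put lcl= ucl= pvalue=;
run;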
B.6 Transformed Power Estimator
[Figure 8 about here: three-panel histogram (Normal/Percent scale) of the transformed power point estimate; panels labeled 0.91, 0.50, and 0.025.]
Figure 8: Sampling distribution of the $\Phi^{-1}$-transformed maximum likelihood estimator of power over 10,000 simulations.
B.7 Extrapolation Between Endpoints or Control Groups Across Phases
In the examples thus far the phase 2 study used the same endpoint and treatment groups planned for phase 3.
Depending on the therapeutic area and endpoint this may not be feasible. In such cases the phase 3 treatment
effect, and hence phase 3 power, can be transformed into a function of the phase 2 treatment effect. Of course this
modeling brings an additional layer of uncertainty which can be expressed as a confidence band around the power
curve. Figure 9 shows similar power curves and a confidence density as before, now with a 95% confidence band
around the phase 3 power curve had it been extrapolated from a different phase 3 endpoint or control group. This
extrapolation uncertainty translates into the overall power curve and carries over into Figure 10. This visual makes it
easy to distinguish the uncertainty around the phase 2 treatment effect from the uncertainty due to the extrapolation model.
[Figure 9 about here. Legend: Confidence Density; Phase 2 and 3 Power; Phase 3 Power; Phase 2 Power. Figure footnotes: α=0.2 for phase 2 LR test against difference ≤ -0.05 with N=? per arm; α=0.025 for phase 3 LR test against difference ≤ -0.12 with N=365 per arm.]
Figure 9: Solid lines depict power curves for a likelihood ratio test of the difference in proportions in phase 2, phase 3, and
overall. Confidence bands depict extrapolation modeling uncertainty. Dashed line depicts the confidence density for $\theta$ based
on historical data and expert opinion.
[Figure 10 about here. Legend: Phase 2 and 3 Power mle; Phase 3 Power mle; Phase 2 Power mle; Phase 2 and 3 CD; Phase 3 CD; Phase 2 CD. Annotations: Phase 2 Power mle=0.473; Phase 3 Power mle=0.945; Phase 2 and 3 Power mle=0.447.]
Figure 10: Solid lines depict resulting confidence densities for power in phase 2, phase 3, and overall. Dotted lines depict
maximum likelihood estimates of power. Confidence bands depict the extrapolation modeling uncertainty.
For example, suppose the phase 3 study plans to investigate a difference in proportions using a different control
therapy than is planned for phase 2. Suppose further that external studies have been conducted investigating the
phase 2 and phase 3 control therapies. Using a network meta-analysis one can estimate and infer the phase 3 power
curve in terms of the phase 2 treatment effect. The population-level treatment effect investigated in phase 2 can
be denoted as $\theta_2 = p_{active} - p_{ctrl2}$, the population-level difference in proportions between the control therapies can
be denoted as $\Delta = p_{ctrl3} - p_{ctrl2}$, and the population-level treatment effect investigated in phase 3 can be denoted
as $\theta_3 = \theta_2 - \Delta = p_{active} - p_{ctrl3}$. It is then a simple change of variables to extrapolate the phase 3 power curve
$\beta_3(\theta_3)$ in terms of the phase 2 treatment effect, $\beta_3(\theta_2 - \Delta)$. The function $\beta_3(\cdot)$ is defined by its subscript and not
its argument. Replacing $\Delta$ with a point estimate $\hat\Delta$, as well as with lower and upper confidence limits, produces
the confidence band around the extrapolated estimated phase 3 power curve, $\beta_3(\theta_2 - \hat\Delta)$. Similarly, if a p-value
function is available for the phase 2 treatment effect, $H(\theta_2)$, replacing $\Delta$ with a point estimate as well as with lower
and upper confidence limits produces the confidence band around the p-value function for phase 3 power using the
method corresponding to Equation (4), $H\big(\beta_3^{-1}\{\beta_3(\theta_2 - \hat\Delta)\} + \hat\Delta\big)$. For a given hypothesis for $\theta_2$, the value $H(\theta_2)$
is assigned to $\beta_3(\theta_2 - \hat\Delta)$. In practice this will be solved numerically in a data step. To construct a proper p-value
function for phase 3 power without confidence bands that accounts for the uncertainty around the extrapolation
modeling and any other estimated population-level parameters, one could utilize a transformation of the power point
estimate $\beta_3(\hat\theta_2 - \hat\Delta)$ along with the delta method and invert a Wald test. To extrapolate between endpoints across
phases using external or elicited data that is assumed exchangeable, one could build a regression model of the end-
point planned for phase 3 as a function of the endpoint and treatments planned for phase 2 (or their exchangeable
surrogates). The model contrast statements would then be used to perform a change of variables in the phase 3
power curve similar to that described above. Even without extrapolation, a similar confidence band visualization can
be used to incorporate a confidence interval for a nuisance parameter such as the population-level control therapy
response rate when constructing the estimated power curves.
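A minimal sketch of the change of variables with its confidence band follows. For brevity it uses a normal-approximation phase 3 power formula rather than the likelihood ratio construction, and the estimate and confidence limits for $\Delta$, the phase 3 standard error, and the margin are all hypothetical.

* Sketch: extrapolated phase 3 power curve with a band from the confidence limits for Delta;
data extrapolated_power;
  alpha3=0.025; margin3=-0.12; se3=0.031;         * hypothetical phase 3 design inputs;
  delta_hat=0.03; delta_lcl=0.00; delta_ucl=0.06; * hypothetical estimate and limits for Delta;
  do theta2=-0.20 to 0.20 by 0.001;
    * normal-approximation power for the one-sided test of H0: theta3 <= margin3;
    power3_hat=1-cdf('normal',probit(1-alpha3)-((theta2-delta_hat)-margin3)/se3,0,1);
    power3_lcl=1-cdf('normal',probit(1-alpha3)-((theta2-delta_ucl)-margin3)/se3,0,1);
    power3_ucl=1-cdf('normal',probit(1-alpha3)-((theta2-delta_lcl)-margin3)/se3,0,1);
    output;
  end;
run;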
C Adjustment for Multiple Comparisons
Clinical development plans almost always explore multiple endpoints and involve interim analyses, and a natural
consideration when discussing frequentist inference is the adjustment for multiple comparisons. Even a phase 3
confirmatory setting often involves multiple studies for the explicit purpose of reproducing and replicating results, and
regulatory approval can always be revisited. This is to say that if one is capable of updating previously made inference
about $\theta$, no adjustment for multiplicity is required. This perhaps reflects Fisher's position on meta-analysis and
inductive reasoning (Lehmann 1993; Efron 1998; Perezgonzalez 2015), and is in some ways congruent with objective
Bayesianism, though we cannot presume to know what Fisher would think were he alive today. This viewpoint
simply emphasizes the per-comparison error rate, knowing that no conclusion about $\theta$ is ever final. Fisher did of course
make use of the F-test for what is known in today’s terms as controlling a family-wise error rate in the weak sense,
and used the entire context of an experiment to determine statistical significance. P-value functions can certainly be
used to display decision rules and study results while adjusting for multiple comparisons if one so chooses to control
a particular family-wise error rate.
D Additional Figures
[Figure 11 about here. Legend: Confidence Density; Phase 3 Power; Phase 2 Power. Figure footnotes: α=0.2 for phase 2 LR test against difference ≤ -0.05 with N=90 per arm; α=0.025 for phase 3 LR test against difference ≤ -0.12 with N=365 per arm.]
Figure 11: Phase 2 power curve testing $H_0: \theta \le -0.05$ with N=90 per arm at α=0.2. Phase 3 power curve testing $H_0:
\theta \le -0.12$ with N=365 per arm at α=0.025. Confidence density for $\theta$ based on historical data and expert opinion.
[Figure 12 about here. Legend: Phase 2 and 3 Power mle; Phase 3 Power mle; Phase 2 Power mle; Phase 2 and 3 CD; Phase 3 CD; Phase 2 CD. Annotations: Phase 2 Power mle=0.473, pos=0.487; Phase 3 Power mle=0.945, pos=0.775; Phase 2 and 3 Power mle=0.447, pos=0.451.]
Figure 12: Solid lines depict resulting confidence distributions for power, $h(\beta) = dH(\theta)/d\beta(\theta)$, in phase 2, phase 3, and
overall. Dotted lines depict maximum likelihood estimates of power.
[Figure 13 about here: estimated phase 3 power (Probability of Success Estimate and Maximum Likelihood Estimate) across sample sizes per arm from N=45 to N=765.]
Figure 13: Estimated phase 3 power testing $H_0: \theta \le -0.12$ at α=0.025 at various sample sizes with 80% confidence limits
based on the elicitation (wide).
E Additional Figures
[Figure 14 about here. Legend: (i) Elicited Confidence Density; (ii) Minimum Phase 2 Success; (iii) Multiplication; (iv) Convolution; (v) Phase 3 Power. Figure footnotes: α=0.025 for phase 2 LR test against difference ≤ -0.05 with N=225 per arm; α=0.025 for phase 3 LR test against difference ≤ -0.12 with N=365 per arm.]
Figure 14: (i) Elicited confidence density (wide). (ii) Confidence density for $\theta$ from differentiating the approximate phase 2
power curve testing $H_0: \theta \le -0.05$ with N=225 per arm at α=0.025. (iii) Multiplication of elicited $H(\theta)$ and phase 2 power
curve, differentiated. (iv) Convolution of elicited $H(\theta)$ and approximate phase 2 power curve, differentiated. (v) Phase 3
power curve testing $H_0: \theta \le -0.12$ with N=365 per arm at α=0.025.
[Figure 15 about here. Legend: (i) Elicited Confidence Density; (ii) Minimum Phase 2 Success; (iii) Multiplication; (iv) Convolution; (v) Phase 3 Power. Figure footnotes: α=0.025 for phase 2 LR test against difference ≤ -0.05 with N=225 per arm; α=0.025 for phase 3 LR test against difference ≤ -0.12 with N=365 per arm.]
Figure 15: (i) Elicited confidence density (narrow). (ii) Confidence density for $\theta$ from differentiating the approximate phase
2 power curve testing $H_0: \theta \le -0.05$ with N=225 per arm at α=0.025. (iii) Multiplication of elicited $H(\theta)$ and phase 2 power
curve, differentiated. (iv) Convolution of elicited $H(\theta)$ and approximate phase 2 power curve, differentiated. (v) Phase 3
power curve testing $H_0: \theta \le -0.12$ with N=365 per arm at α=0.025.
[Figure 16 about here. Legend: (i) Elicited Confidence Density; (ii) Minimum Phase 2 Success; (iii) Multiplication; (iv) Convolution; (v) Phase 3 Power. Figure footnotes: α=0.2 for phase 2 LR test against difference ≤ -0.05 with N=90 per arm; α=0.025 for phase 3 LR test against difference ≤ -0.12 with N=365 per arm.]
Figure 16: (i) Elicited confidence density (wide). (ii) Confidence density for $\theta$ from differentiating the approximate phase
2 power curve testing $H_0: \theta \le -0.05$ with N=90 per arm at α=0.2. (iii) Multiplication of elicited $H(\theta)$ and phase 2 power
curve, differentiated. (iv) Convolution of elicited $H(\theta)$ and approximate phase 2 power curve, differentiated. (v) Phase 3
power curve testing $H_0: \theta \le -0.12$ with N=365 per arm at α=0.025.
F Comparing Distribution Estimates
F.1 Discrete Parameter Space
When the parameter space is discrete the upper and lower p-value functions $H(\cdot)$ and $\underline{H}(\cdot)$ may not form dis-
tribution functions on the parameter space. Nevertheless, these p-value functions are indispensable for performing
inference. For example, consider the 3x3 table below depicting the operating characteristics of a cancer screening
test with 0.85 specificity and 0.80 sensitivity. The parameter space is shown across the top of the table and the
support of the sampling distribution (test result) is displayed along the left side of the table so that this table is
read vertically. If a subject has No Cancer the screening test will produce a Negative result, an At Risk result, and
a Positive result 85%, 10%, and 5% of the time respectively. Likewise, if the subject indeed has Cancer the test
will produce a Negative result, an At Risk result, and a Positive result 5%, 15%, and 80% of the time respectively.
These long-run probabilities can be verified within a margin of error through repeated testing. The power of the
test shows the ex-ante sampling probability of observing an At Risk or Positive result testing the hypothesis H0: No
Cancer as a function of the unknown true cancer status for the subject at hand. This long-run probability forms
the level of confidence in the next observed test result for the subject.
The p-value function testing H0: No Cancer, H0: Pre-Cancer, and H0: Cancer as a function of the hypothe-
sis and the observed data is read horizontally and displays the lower-tailed p-value for a Negative result and the
upper-tailed p-value for a Positive result. For an At Risk result the upper-tailed p-value is displayed testing H0: No
Cancer and H0: Pre-Cancer, and the lower-tailed p-value is displayed testing H0: Pre-Cancer and H0: Cancer. If
an At Risk result is produced for a given subject, the upper-tailed p-value testing the hypothesis that the subject
at hand has No Cancer is the probability of an At Risk or more extreme (Positive) test result given the subject has
No Cancer, 0.10 + 0.05 = 0.15. Likewise, for the same At Risk result the lower-tailed p-value testing the hypothesis
that the subject at hand has Cancer is the probability of an At Risk or more extreme (Negative) test result given
the subject has Cancer, 0.15 + 0.05 = 0.20. The confidence level is a function of the hypothesis and the observed
data. This table is read horizontally and shows that if the test returns an At Risk result we can “rule out” H0:
No Cancer at the 15% level and H0: Cancer at the 20% level and are therefore 65% confident in the alternative,
which is Pre-Cancer. The 65% confidence level is nothing more than a restatement of the p-values testing H0: No
Cancer and H0: Cancer, 100(1 0.15 0.20)%. Similarly, if the test returns a Positive result we can “rule out” H0:
Pre-Cancer (and by extension H0: No Cancer) at the 10% level, and are therefore 90% confident in the alternative,
which is Cancer. Either the subject has Pre-Cancer (or No Cancer) and we have witnessed a 10% (or smaller) event,
or the subject indeed has Cancer.
If we have verifiable knowledge that a given subject was randomly selected from an irreducible population that
has No Cancer, Pre-Cancer, and Cancer in a 4:2:1 ratio, then the posterior depicts the long-run probability of cancer
status among randomly selected subjects, given a particular test result. In this context these posterior probabilities
are often referred to as negative predictive value, false omission rate, false discovery rate, and positive predictive
value. This long-run probability can be used to make inference on the cancer status of the subject at hand by
imagining the subject was instead randomly selected from the posterior distribution. This is a direct contradiction
to the earlier claim that the subject at hand was randomly selected from the prior distribution. The posterior
sampling frame is correct only if the prior sampling frame is correct, yet there can only be a single sampling frame
from which we obtained the randomly selected subject at hand. If we really do have verifiable knowledge about how
a given subject was randomly selected, this information can be presented alongside the p-value. In practice, though,
we generally do not have such verifiable knowledge. The Bayesian prior and posterior probabilities might instead
be interpreted as measuring the unfalsifiable subjective belief of the experimenter regarding the cancer status of the
subject at hand, rather than long-run proportions of cancer status among randomly selected subjects.
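As a check on the posterior row of Table 2, with prior weights 4:2:1 and an At Risk result, Bayes' theorem gives
$$P(\text{No Cancer} \mid \text{At Risk}) = \frac{0.10 \times 4}{0.10 \times 4 + 0.50 \times 2 + 0.15 \times 1} = \frac{0.40}{1.55} \approx 0.26,$$
with $1.00/1.55 \approx 0.65$ for Pre-Cancer and $0.15/1.55 \approx 0.10$ for Cancer, matching the table.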
The likelihood is identified by reading the table of operating characteristics horizontally. The normalized like-
lihood can be seen as a posterior based on a 1:1:1 prior. It is more objectively viewed as an approximate p-value
function. The normalization smooths the operating characteristics of the screening test so the probabilities sum
to 1 over the parameter space. The plug-in sampling distribution transposes the operating characteristics of the
screening test across the parameter space. All five methods below use the sampling behavior of the screening test
to form a distribution estimate of cancer status. In this setting the p-values do not form a distribution function on
the parameter space. If an additional follow-up test is to be conducted on the subject at hand, these distribution
estimates can be used to perform inference on the power of the future test. If one is not satisfied with this inference
on power, a more sensitive and specific test can be sought. Regardless of paradigm, multiple tests can be performed
and the results convolved to improve the inference on the true cancer status for a given subject.
Table 2: Cancer Screening Test

                                                 True Cancer Status
                                  Test Result   No Cancer   Pre-Cancer   Cancer
Operating Characteristics         Negative      0.85        0.40         0.05
                                  At Risk       0.10        0.50         0.15
                                  Positive      0.05        0.10         0.80
Power                                           0.15        0.60         0.95
One-sided p-value                 Negative      0.85        0.40         0.05
(Confidence Curve)                At Risk       0.15        0.60|0.90    0.20
                                  Positive      0.05        0.10         0.80
Confidence Level                  Negative      0.60        0.40         0.05
                                  At Risk       0.15        0.65         0.20
                                  Positive      0.05        0.10         0.90
Posterior (4:2:1 Prior)           Negative      0.80        0.19         0.01
                                  At Risk       0.26        0.65         0.10
                                  Positive      0.17        0.17         0.67
Normalized Likelihood             Negative      0.65        0.31         0.04
                                  At Risk       0.13        0.67         0.20
                                  Positive      0.05        0.11         0.84
Plug-in Sampling Distribution     Negative      0.85        0.10         0.05
                                  At Risk       0.40        0.50         0.10
                                  Positive      0.05        0.15         0.80
F.2 Distribution Estimates Giving Different Results
[Figure 17 about here. Legend: Posterior Density (a=0.1, b=0.1); Posterior Density (a=1, b=1); Confidence Curve.]
Figure 17: Exact frequentist and Bayesian inference on a binomial proportion $\theta$ based on a sample of size $n = 20$.
Let $X_1, \ldots, X_n \sim \text{Bernoulli}(\theta)$. The confidence curve and 95% confidence interval in Figure 17 show exact inference
on $\theta$ from inverting the cumulative distribution function for $\sum X \sim \text{Bin}(n, \theta)$ based on a sample of size $n = 20$ with
$\sum x = 19$ events. In this setting the conjugate Bayesian prior is a Beta$(a, b)$ distribution. The green dotted density
shows a Bayesian posterior and 95% credible interval based on a non-informative Beta$(1,1)$ prior. The red dashed
density shows a Bayesian posterior and 95% credible interval based on a non-informative Beta$(0.1, 0.1)$ prior. The
Beta$(0.1, 0.1)$ prior has a larger variance compared to the uniform prior, yet it produces shorter posterior credible
intervals. While the uniform prior produces the widest possible objective posterior intervals, they are noticeably
shorter than the corresponding exact confidence intervals. Additionally, the posterior mean as an estimator for $\theta$
based on a uniform prior is biased towards 0.5. With increasing sample size all three distribution estimators will
produce similar results.
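A minimal sketch of the comparison, assuming the standard beta-quantile form of the exact limits from inverting the binomial cumulative distribution function; variable names and the choice of 95% intervals are ours.

* Sketch: exact confidence limits versus Beta-posterior credible limits for x=19 of n=20;
data binom_compare;
  n=20; x=19; alpha=0.05;
  * exact limits from inverting the binomial CDF, expressed via beta quantiles;
  cl_lower=quantile('beta',alpha/2,x,n-x+1);
  cl_upper=quantile('beta',1-alpha/2,x+1,n-x);
  * equal-tailed credible limits under Beta(a,b) priors;
  cred_lower_flat=quantile('beta',alpha/2,1+x,1+n-x);
  cred_upper_flat=quantile('beta',1-alpha/2,1+x,1+n-x);
  cred_lower_01=quantile('beta',alpha/2,0.1+x,0.1+n-x);
  cred_upper_01=quantile('beta',1-alpha/2,0.1+x,0.1+n-x);
  put cl_lower= cl_upper= cred_lower_flat= cred_upper_flat= cred_lower_01= cred_upper_01=;
run;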
F.3 From Confidence Intervals to Distribution Estimates
[Figure 18 about here. Legend: (a) Estimated Sampling Distribution; (b) Posterior; (c) Confidence Density (Likelihood Ratio Test); (d) Confidence Density (Exact Likelihood Ratio Test).]
Figure 18: (a) Plug-in estimated sampling distribution for the MLE of the mean supported by $\bar{x}$ for exponentially distributed
data with $n = 5$, replacing the unknown fixed true $\theta$ with $\hat\theta_{mle} = 1.5$. (b) Bayesian posterior from vague conjugate prior
supported by $\theta$. (c) Confidence distribution (density) based on the likelihood ratio test supported by $\theta$. (d) Confidence
distribution (density) based on the exact likelihood ratio test supported by $\theta$.
Consider the setting where $X_1, \ldots, X_n \sim \text{Exp}(\theta)$ with likelihood function $L(\theta) = \theta^{-n}e^{-\sum x_i/\theta}$. Then $\sup L(\theta)$
yields $\hat\theta_{mle} = \bar{x}$ as the maximum likelihood estimate for $\theta$, the likelihood ratio test statistic is $-2\log\lambda(\boldsymbol{x}, \theta_0) \equiv
-2\log\big(L(\theta_0)/L(\hat\theta_{mle})\big)$, and the corresponding upper p-value function (and confidence distribution function) is defined
as in Equation (1). The histogram in Figure 18, supported by $\bar{x}$, depicts the plug-in estimated sampling distribution
for the maximum likelihood estimator (MLE) of the mean for exponentially distributed data with $n = 5$ based on
$\hat\theta_{mle} = 1.5$. Replacing the unknown fixed true $\theta$ with $\hat\theta_{mle} = 1.5$, this displays the estimated sampling behavior of
the MLE for all other replicated experiments, a Gamma$(n, \hat\theta_{mle}/n)$ distribution. The Bayesian posterior depicted
by the thin blue curve resulting from a vague conjugate prior or an improper $1/\theta$ prior is a transformation of the
likelihood and is supported on the parameter space, an Inverse-Gamma$(5, 7.5)$ distribution. The bold black curve
is also data dependent and supported on the parameter space, but represents confidence intervals of all levels from
inverting the likelihood ratio test. It is a transformation of the sampling behavior of the test statistic under the null
onto the parameter space, a "distribution" of p-values. Each value in the parameter space takes its turn playing the
role of null hypothesis, and hypothesis testing (akin to proof by contradiction) is used to infer the unknown fixed
true $\theta$. The area under this curve to the right of the reference line is the p-value or significance level when testing
the hypothesis $H_0: \theta \ge 2.35$. This probability forms the level of confidence that $\theta$ is greater than or equal to 2.35.
Similarly, the area to the left of the reference line is the p-value when testing the hypothesis $H_0: \theta \le 2.35$. One can
also identify the two-sided equal-tailed $100(1-\alpha)\%$ confidence interval by finding the complement of those values
of $\theta$ in each tail with $\alpha/2$ significance. The dotted curve shows the exact likelihood ratio confidence density formed
by noting that $\bar{X} \sim \text{Gamma}(n, \theta/n)$ and inverting its cumulative distribution function. This confidence density
coincides perfectly with the posterior distribution. A confidence density similar to that based on the likelihood ratio
test can be produced by inverting a Wald test with a log link.
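To see this coincidence, a short verification using the quantities already given: writing the exact confidence distribution by inverting the Gamma cumulative distribution function,
$$H(\theta) = P_\theta(\bar{X} \ge \bar{x}) = 1 - F_{\text{Gamma}(n,\,\theta/n)}(\bar{x}), \qquad h(\theta) = \frac{dH(\theta)}{d\theta} = \frac{(n\bar{x})^n}{\Gamma(n)}\,\theta^{-n-1}e^{-n\bar{x}/\theta},$$
which is the Inverse-Gamma$(n, n\bar{x})$ density; with $n = 5$ and $\bar{x} = 1.5$ this is Inverse-Gamma$(5, 7.5)$, identical to the posterior under the improper $1/\theta$ prior.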
[Figure 19 about here: two panels showing the null sampling distribution of $-2\log\lambda(\boldsymbol{x}, \theta = 0.75)$ and the corresponding confidence distribution, with the relevant p-value areas shaded.]
Figure 19: Approximate $\chi^2_1$ null sampling distribution of the likelihood ratio test statistic for testing $H_0: \theta \le 0.75$.
[Figure 20 about here: two panels showing the exact null sampling distribution of $\bar{X}$ and the corresponding confidence distribution, with the relevant p-value areas shaded.]
Figure 20: Exact null sampling distribution of $\hat\theta_{MLE} = \bar{X}$ for testing $H_0: \theta \le 0.75$.
$H(\theta)$ captures the upper-tailed p-value for every value of $\theta$ in the parameter space, and $dH(\theta)/d\theta$ is the resulting
confidence density. The confidence density in Figure 18 was constructed using the $\chi^2_1$ approximation for the sampling
distribution of the likelihood ratio test statistic. In Figure 19 the two-sided p-value testing $H_0: \theta = 0.75$ is shaded
in the left panel. Half of this is the one-sided p-value testing $H_0: \theta \le 0.75$. This is shaded above $\theta \le 0.75$ in the
right panel. A single $\chi^2_1$ reference distribution is used, and the value of the test statistic depends on the hypothesis
being tested. This approximation is particularly useful when considering differences in parameters or other more
complicated functions. When performing inference on an exponential mean parameter one can note the likelihood
ratio test statistic is a monotonic function of $\hat\theta_{MLE} = \bar{X}$, which follows a Gamma$(n, \theta/n)$ distribution. Referencing
this distribution allows the calculation of the exact likelihood ratio test p-value. In Figure 20 the left panel shows
the null sampling distribution when testing $H_0: \theta \le 0.75$. The one-sided p-value in the left panel is shaded above
$\theta \le 0.75$ in the right panel. The location of the null sampling distribution depends on the hypothesis being tested.
F.4 Distribution Estimates for Meta-Analysis
[Figure 21 about here. Legend: (a) Historical Prior Distribution; (b) Historical Confidence Distribution; (c) Current Data Posterior; (d) Current Data Confidence Distribution; (e) Combined Posterior Distribution; (f) Combined Confidence Distribution.]
Figure 21: (a) Informative Bayesian prior distribution based on historical likelihood and vague conjugate prior for binomial
proportion, $\hat\theta^{Hist}_{Bayes} = 0.90$, $n = 50$. (b) Confidence distribution (likelihood ratio test) based on historical data for binomial
proportion, $\hat\theta^{Hist}_{mle} = 0.90$, $n = 50$. (c) Bayesian posterior based on current likelihood and vague conjugate prior, $\hat\theta^{Current}_{Bayes} =
0.87$, $n = 30$. (d) Confidence distribution (likelihood ratio test) based on current data, $\hat\theta^{Current}_{mle} = 0.87$, $n = 30$. (e) Posterior
distribution based on informative historical prior and current data likelihood. (f) Convolution of historical and current
confidence distributions.
Figure 21 depicts a meta-analysis using confidence distributions for a binomial proportion $\theta$. Density (a) represents
an informative Bayesian prior distribution based on a historical likelihood and a vague conjugate prior, producing
an estimate of 0.90 from a sample size of $n = 50$. This same information is depicted in (b) as a confidence density
resulting from a likelihood ratio test. A similar confidence density can be produced by inverting a Wald test with
a logit link. The Bayesian posterior based on the current data binomial likelihood and a vague conjugate prior
is shown in (c) with an estimate of 0.87 resulting from $n = 30$. This same information can be represented as a
likelihood ratio confidence density, (d). Using Bayes' theorem, the prior (a) and the likelihood from (c) combine to
form (e). Using the convolution formula in Equation (3), (b) and (d) combine to form (f).
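One way to carry out the combination in (f) numerically is sketched below; it mirrors the normal-quantile weighting used in the Appendix G program rather than restating Equation (3), and the Wald-with-logit-link p-value functions and variance inputs are illustrative assumptions.

* Sketch: combine historical and current confidence distributions on a common theta grid;
data combine_cd;
  n_hist=50; p_hist=0.90; n_curr=30; p_curr=0.87;
  var_hist=p_hist*(1-p_hist)/n_hist; var_curr=p_curr*(1-p_curr)/n_curr;
  do theta=0.70 to 0.999 by 0.001;
    * Wald-with-logit-link p-value functions as stand-ins for the LR-based versions;
    H_hist=1-cdf('normal',(log(p_hist/(1-p_hist))-log(theta/(1-theta)))
           /sqrt(1/(n_hist*p_hist*(1-p_hist))),0,1);
    H_curr=1-cdf('normal',(log(p_curr/(1-p_curr))-log(theta/(1-theta)))
           /sqrt(1/(n_curr*p_curr*(1-p_curr))),0,1);
    * inverse-variance weighted combination, as in the H_convolve step of Appendix G;
    H_comb=cdf('normal',(quantile('normal',H_hist,0,1)/sqrt(var_hist)
           +quantile('normal',H_curr,0,1)/sqrt(var_curr))
           /sqrt(1/var_hist+1/var_curr),0,1);
    output;
  end;
run;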
F.5 Bayesian versus Frequentist Interpretations of Probability
In any quantitative field it is not enough to simply apply a set of mathematical operations. One must also provide
an interpretation. The field of statistics concerns itself with a special branch of mathematics regarding probabil-
ity. When interpreting probability there are primarily two competing paradigms: frequentist and Bayesian. These
paradigms differ on what it means for something to be considered random and what probability itself measures.
Both frequentists and Bayesians would agree that once a test statistic is observed it is fixed, there is nothing random
about it. Additionally, frequentists and most Bayesians would agree that the $\theta$ under investigation is an unknown
fixed quantity and it is simply treated as random in the Bayesian paradigm as a matter of practice. The question
then becomes, “How do we interpret probability statements about a fixed quantity?” Without delving into the
mathematical details of how a posterior or a p-value is calculated, we explore various interpretations below and what
makes them untenable.
One interpretation of a Bayesian prior is that “random” is synonymous with “unknown” and probability measures
the experimenter’s belief (“knowledge,” “judgment,” “opinion,” etc.) so that the posterior measures belief about the
unknown fixed true $\theta$ given the observed data. This interpretation is untenable because belief is unfalsifiable; it is
not a verifiable statement about the actual parameter, the hypothesis, or the experiment. A degree of belief not tied
to a long-run sampling proportion cannot be verified within a margin of error through repeated sampling. Another
interpretation is that “random” is short for “random sampling” and probability measures the emergent pattern of
many samples, so that a Bayesian prior is merely a modeling assumption regarding $\theta$, i.e. the unknown fixed true
$\theta$ was randomly selected from a known collection or prevalence of $\theta$'s (prior distribution) and the observed data are
used to subset this collection, forming the posterior distribution (Good 1965, 1966). The unknown fixed true $\theta$ is
now imagined to have instead been randomly selected from the posterior. This interpretation is untenable because
of the contradiction caused by claiming two sampling frames. The second sampling frame is correct only if the first
sampling frame is correct, yet there can only be a single sampling frame from which we obtained the unknown fixed
true $\theta$ under investigation. A third interpretation of a Bayesian prior is that "random" is synonymous with "unrealized"
or "undetermined" and probability measures a simultaneity of existence, so that $\theta$ is not fixed and all values
of $\theta$ are true simultaneously; the truth exists in a superposition depending on the data observed according to the
posterior distribution (Schrödinger and Trimmer 1980; Ballentine 1970). This interpretation is untenable because
it reverses cause and effect: the population-level parameter depends on the data observed, but the observed data
depended on the parameter. Ascribing any of these interpretations to the posterior allows one to make philosophical
probability statements about hypotheses given the data. While the p-value is typically not interpreted in the same
manner, it does show us the plausibility of a hypothesis given the data: the ex-post sampling probability of the
observed result or something more extreme if the hypothesis for the unknown fixed $\theta$ is true. When interpreting a
small p-value, either the null hypothesis is true and we have witnessed a rare event or the null hypothesis is false.
These statements are verifiable within a margin of error through repeated sampling.
One might notice the similarity between a confidence distribution (or more generally a p-value function) and a
posterior distribution and wonder under what circumstances each one is preferable. At its essence this is a matter of
scientific objectivity (Efron 1986). To the Bayesian, probability is axiomatic and measures the experimenter. To the
frequentist, probability measures the experiment and must be verifiable. The Bayesian interpretation of probability
as a measure of belief is unfalsifiable. Only if there exists a real-life mechanism by which we can sample values of θ
can a probability distribution for $\theta$ be verified. In such settings probability statements about $\theta$ would have a purely
frequentist interpretation. This may be a reason why frequentist inference is ubiquitous in the scientific literature. If
the prior distribution is chosen in such a way that the posterior is dominated by the likelihood or is proportional to the
likelihood, Bayesian belief is more objectively viewed as confidence based on frequency probability of the experiment.
In short, for those who subscribe to the frequentist interpretation of probability the p-value function summarizes
all the probability statements about the experiment one can make. It is a matter of correct interpretation given the
definition of probability and what constitutes a random variable. The posterior remains an incredibly useful tool
and can be interpreted as an approximate p-value function.
G SAS Code
% le t d di f f = 0. 0 01 ;
da t a b i no m ia l ;
* dif f is th e t he t a axi s , th e tru e d i f fer e n c e in p r o p o rtio n s ;
do d if f = -0 .2 1 to 0 .2 47 b y & d di ff . ;
*Elicitation;
in t _c tr l =0 .4 3;
di f f_ ha t = - 0. 02 ;
n_ctrl=1200;
n_ a ct iv e =3 50 ;
y_ c tr l = i nt _ ct r l * n_ c tr l ;
y_ a ct iv e =( i n t_ ct rl + d if f_ h at ) * n_ ac ti v e;
* W al d CD ;
32
/*
p_ a ct i ve = y _ ac t iv e / n _a ct i ve ;
se = sq rt ( p _a ct i ve * (1 - p_ a ct iv e )/ n _a c ti ve + i nt _c tr l *( 1 - in t_ c tr l )/ n _c tr l );
H =1 - c d f ( ’ no r ma l ’ ,( d i ff _ ha t - d i ff ) /( s e ) , 0 , 1) ;
*/
* L ik e li h oo d R a ti o Te s t ;
i nt _c t rl _n u ll = ( y_ ct rl + ( y_ ac t iv e /( i nt _c t rl + di ff ) )* i nt _c tr l - (( y _a c ti ve / ( in t_ c tr l +d if f ))
* in t _c tr l )* i nt _ ct rl + ( y_ ac t iv e *( 1- i n t_ ct rl ) /( 1 - in t_ ct rl - d if f )) * in t _c tr l )
/( n _c t rl + ( n_ ac ti v e *(1 - i nt _ ct rl ) /( 1 - in t_ ct rl - d if f )) );
do i =1 t o 10 0 ;
i nt _c t rl _n u ll = ( y_ ct rl + ( y_ ac t iv e /( i nt _ ct rl _ nu ll + d if f )) * in t_ ct r l_ nu ll -( ( y_ a ct iv e
/( i n t_ ct r l_ nu l l+ d if f )) * in t _c tr l _n ul l ) * in t_ c tr l_ n ul l +( y _a ct i ve * (1 - i nt _c t rl _n u ll )
/( 1 - in t_ ct r l_ nu ll - d if f )) * i nt _c tr l _n ul l ) /( n _c tr l +( n _a c ti ve * (1 - i nt _c t rl _n u ll )
/( 1 - i nt _ ct r l_ n ul l - d if f ) ) );
en d ;
la m b da = (( i n t_ c t rl _ n u ll / i nt _ c tr l ) ** y _ c tr l ) * (( ( 1 - i n t_ c t rl _ n u ll ) / ( 1 - i nt _ c tr l ) )
** ( n _c tr l - y _c t rl ) ) *( ( ( in t _c t rl _ nu l l + di ff ) /( i n t_ c tr l + d if f_ h at ) ) ** y _a c ti v e )
*( (( 1 - i nt _ ct r l_ nu l l - d if f ) / ( 1- i n t_ ct rl - d i ff _ ha t ) )* *( n _a ct i ve - y _a c ti v e )) ;
l og l am b d a = lo g ( l am b da ) ;
m in u s2 l og l a mb d a = -2 * l og l am b da ;
if d i ff gt d i f f_h a t t h en do ;
H = (1 + c d f ( ’ c hi s q u ar e ’ , - 2* l o gl a m bd a ,1 ) )/ 2 ;
en d ;
if d i ff le d i f f_h a t t h en do ;
H =( 1 - c d f ( ’ ch i s q ua r e ’ , - 2* l og l am b da , 1) ) / 2;
en d ;
dH d d if f = ( H- l a g (H ) )/ ( d if f - l ag ( d i ff ) ) ;
C =H *( d if f lt d if f_ h at ) + (1 - H )* ( di ff g t d if f _h at ) ;
* Ph a se 2 ;
n _a c ti v e _p h as e 2 = 90 ; c al l s ym p ut ( ’ n _ ac t iv e _ ph a s e2 ’ , t ri m ( l ef t ( n _a c t iv e _p h a se 2 ) ) );
n _c t rl _ p ha s e2 = 90 ; c al l s ym pu t ( ’ n _c t r l_ p ha s e 2 ’ , tr im ( l e ft ( n _ ct r l _p h as e 2 ) )) ;
* nu l l hy p o th e si s ;
l ow e r _ ma r g in 2 = - 0 .0 5 ; ca l l sy m p ut ( ’ l o we r _ m ar g i n _ ph a s e 2 ’ , s tr i p ( l o we r _ m ar g i n 2 ) );
a lp h a _p h a s e2 = 0 .2 0 ; c al l s ym p u t ( ’ a lp h a _ ph a s e 2 ’ , s tr i p ( a l ph a _ p ha s e 2 ) );
* m in im u m d et e ct a bl e e ff ec t ;
l ow e r _c v 2 = l o w e r_ m a rg i n 2 + 0 . 06 4 ; c al l s ym p u t ( ’ p h as e 2 _ ml e _ s uc c e s s ’ , l o we r _ cv 2 );
y _c tr l_ p ha se 2 = in t_ c tr l *( n _c tr l _p ha se 2 );
y _a ct iv e _p ha s e2 = ( in t_ ct rl + l ow er _ cv 2 )* n _a ct i ve _p h as e2 ;
* Wa l d ;
/* p _ ac t iv e _p h as e 2 = y_ a ct iv e _p h as e 2 / n_ a ct i ve _ ph as e 2 ;
p_ctrl_phase2=y_ctrl_phase2/n_ctrl_phase2 ;
s e_ ph as e2 = s qr t( p _a c ti ve _ ph as e 2 *(1 - p _a c ti ve _p h as e2 ) / n_ a ct iv e_ p ha se 2 + p _ ct rl _ ph as e2
*( 1 - p_ ct r l_ ph as e 2 )/ n _c tr l_ p ha se 2 );
p ha se 2 _p ow e r =1 - cd f( ’ n or ma l ’ ,( p_ a ct iv e _p ha se 2 - p _ ct rl _p ha s e2 - d if f )/ s e_ ph as e2 ,0 , 1 ); */
* L ik e li h oo d R a ti o Te s t ;
i nt _c tr l _n ul l =( y _ ct rl _p h as e2 + ( y_ a ct iv e _p ha se 2 /( i n t_ ct rl + d if f ))* i nt _c t rl
33
-( ( y_ ac t iv e _p ha s e2 / ( in t_ ct r l +d if f )) * in t _c tr l )* i nt _ ct rl + ( y_ a ct iv e _p ha s e2
*( 1 - in t_ ct rl ) /( 1- i nt _c tr l - di ff ) )* i nt _c tr l )/ ( n _c tr l_ p ha se 2 +( n _a ct i ve _p ha s e2
*( 1 - i nt _ ct r l ) /( 1 - i nt _c t rl - d i ff ) ) );
do i =1 t o 10 0 ;
i nt _c tr l _n ul l =( y _ ct rl _p h as e2 + ( y_ a ct iv e _p ha se 2 /( i n t_ ct r l_ nu ll + d if f )) * in t_ c tr l_ n ul l
-( ( y_ a ct iv e _p ha s e2 / ( in t _c tr l _n ul l + di ff ) )* i n t_ ct r l_ nu l l )* i nt _c t rl _n u ll
+( y _ ac t iv e _p h as e 2 *( 1 - in t _c t rl _ nu l l )/ (1 - i nt _ ct r l_ nu l l - d if f )) * i nt _ ct r l_ nu l l )
/( n _ ct rl _p h as e2 + ( n_ ac t iv e_ p ha se 2 *( 1 - in t_ c tr l_ nu l l )/( 1 - in t_ c tr l_ nu ll - d if f )) );
en d ;
l ik el i ho od _p h as e2 = ( in t _c tr l _n ul l ** y _c t rl _p h as e2 ) *( 1 - in t_ c tr l_ nu l l )* *( n _c t rl _p h as e2
- y_ ct rl _p h as e2 ) *( ( in t_ ct rl _n ul l + dif f )* *( y _a ct iv e _p ha se 2 ))
*( (1 - i nt _c tr l_ nu ll - d if f) ** ( n_ ac t iv e_ ph as e2 - y _a ct i ve _p ha se 2 )) ;
l ik el i ho od _ 1_ ph a se 2 =( i n t_ ct rl * * y_ c tr l_ p ha se 2 ) *(1 - i n t_ ct rl ) ** ( n_ c tr l_ p ha se 2
- y_ ct rl _p h as e2 ) *( ( in t_ ct rl + lo we r_ cv 2 )* *( y _a c ti ve _p ha se 2 )) *( (1 - i nt _c tr l
- lo we r_ cv 2 )* *( n _a ct iv e _p ha se 2 - y_ ac t iv e_ ph as e 2 ));
lam b d a _ phas e 2 =( l i k e l ihoo d _ p h a se2 ) / ( l ikel i h o o d _ 1_p h a s e 2 ) ;
l og l a mb d a _p h a se 2 = l o g ( la m b da _ p ha s e 2 );
m in u s2 l o gl a mb d a _p h as e 2 = - 2* l o gl a mb d a _p h as e 2 ;
if d i ff lt l o w er_c v 2 t h en do ;
p ha s e 2 _p o w e r = (1 - c df ( ’ c h i s qu a r e ’ , - 2* l og l a m bd a _ ph a s e 2 , 1 )) / 2 ;
en d ;
el s e if di f f ge l o w e r_c v 2 t h en do ;
p ha s e 2 _p o w e r = ( 1+ c df ( ’ c h i s qu a r e ’ , - 2* l og l a m bd a _ ph a s e 2 , 1 )) / 2 ;
en d ;
* CD f or d e f i ni t i on o f s uc c e ss ;
H _p h as e 2_ su c ce s s = ph a se 2 _p o we r ;
d H_ p h as e 2 _s u c c es s _ dd i f f =( H _ ph a se 2 _s u c ce s s - l ag ( H _ p ha s e 2_ s u cc e s s ) )/ ( d if f - l ag ( d i ff ) );
C _p h a se 2 _ s uc c e s s = H _ ph a s e2 _ s u cc e s s * ( d if f lt l o w er _ c v2 ) + ( 1 - H _ p ha s e 2 _s u c ce s s ) * ( d if f gt l o w er _ c v2 ) ;
* Ph a se 3 ;
n _c t rl _ p ha s e3 = 36 5 ; ca l l sy m pu t ( ’ n _c t rl _ p ha s e3 ’ , tr i m ( le ft ( n _c t r l_ p ha s e 3 ) )) ;
n _a c ti v e _p h as e 3 = 3 65 ; ca l l sy m pu t ( ’ n _a c t iv e _p h a se 3 ’ , t ri m ( le f t ( n_ a c ti v e_ p h as e 3 ) )) ;
* nu l l hy p o th e si s ;
l ow e r _ ma r g in 3 = - 0 .1 2 ; c al l s ym p u t ( ’ l ow e r _ m ar g i n _p h a s e3 ’ , s tr i p ( l o w er _ m a rg i n 3 ) ) ;
a lp h a _p h a s e3 = 0 . 02 5 ; ca l l s ym p u t ( ’ a lp h a _ ph a s e3 ’ , s tr i p ( a l p ha _ p h as e 3 ) ) ;
* m in im u m d et e ct a bl e e ff ec t ;
l ow er _ cv 3 = l ow e r_ m ar g in 3 + 0 .0 71 ;
y _c tr l _p h as e 3 = in t_ c tr l * n _c t rl _p h as e 3 ;
y _a ct iv e _p ha s e3 = ( in t_ ct rl + l ow er _ cv 3 )* n _a ct i ve _p h as e3 ;
* Wa l d ;
/* p _ ac t iv e _p h as e 3 = y_ a ct iv e _p h as e 3 / n_ a ct i ve _ ph as e 3 ;
p_ctrl_phase3=y_ctrl_phase3/n_ctrl_phase3 ;
s e_ ph as e3 = s qr t( p _a c ti ve _ ph as e 3 *(1 - p _a c ti ve _p h as e3 ) / n_ a ct iv e_ p ha se 3
+ p_ c tr l_ ph a se 3 *( 1- p _ ct rl _ ph as e3 ) / n _c tr l_ p ha se 3 );
p ha s e 3_ p o w er 1 = 1 - c df ( ’ n o rm a l ’ , ( p _a c ti v e _p h a se 3 - p _c t rl _ p ha s e 3 - d i ff ) / s e_ p ha s e3 ,0 , 1) ; */
* L ik e li h oo d R a ti o Te s t ;
i nt _c tr l _n ul l =( y _ ct rl _p h as e3 + ( y_ a ct iv e _p ha se 3 /( i n t_ ct rl + d if f ))* i nt _c t rl
-( ( y_ ac t iv e _p ha s e3 / ( in t_ ct r l +d if f )) * in t _c tr l )* i nt _ ct rl + ( y_ a ct iv e _p ha s e3
*( 1 - in t_ ct rl ) /( 1- i nt _c tr l - di ff ) )* i nt _c tr l )/ ( n _c tr l_ p ha se 3 +( n _a ct i ve _p ha s e3
34
*( 1 - i nt _ ct r l ) /( 1 - i nt _c t rl - d i ff ) ) );
do i =1 t o 10 0 ;
i nt _c tr l _n ul l =( y _ ct rl _p h as e3 + ( y_ a ct iv e _p ha se 3 /( i n t_ ct r l_ nu ll + d if f )) * in t_ c tr l_ n ul l
-( ( y_ a ct iv e _p ha s e3 / ( in t _c tr l _n ul l + di ff ) )* i n t_ ct r l_ nu l l )* i nt _c t rl _n u ll
+( y _ ac t iv e _p h as e 3 *( 1 - in t _c t rl _ nu l l )/ (1 - i nt _ ct r l_ nu l l - d if f )) * i nt _ ct r l_ nu l l )
/( n _ ct rl _p h as e3 + ( n_ ac t iv e_ p ha se 3 *( 1 - in t_ c tr l_ nu l l )/( 1 - in t_ c tr l_ nu ll - d if f )) );
en d ;
l ik el i ho od _p h as e3 = ( in t _c tr l _n ul l ** y _c t rl _p h as e3 ) *( 1 - in t_ c tr l_ nu l l )* *( n _c t rl _p h as e3
- y_ ct rl _p h as e3 ) *( ( in t_ ct rl _n u ll + di ff )* *( y _a c ti ve _p ha se 3 )) *( (1 - i nt _c tr l_ n ul l
- di ff )* *( n _a c ti ve _p ha se 3 - y _a ct iv e_ p ha se 3 )) ;
l ik el i ho od _ 1_ ph a se 3 =( i n t_ ct rl * * y_ c tr l_ p ha se 3 ) *(1 - i n t_ ct rl ) ** ( n_ c tr l_ p ha se 3
- y_ ct rl _p h as e3 ) *( ( in t_ ct rl + lo we r_ cv 3 )* *( y _a c ti ve _p ha se 3 )) *( (1 - i nt _c tr l
- lo we r_ cv 3 )* *( n _a ct iv e _p ha se 3 - y_ ac t iv e_ ph as e 3 ));
lam b d a _ phas e 3 =( l i k e l ihoo d _ p h a se3 ) / ( l ikel i h o o d _ 1_p h a s e 3 ) ;
l og l a mb d a _p h a se 3 = l o g ( la m b da _ p ha s e 3 );
m in u s2 l o gl a mb d a _p h as e 3 = - 2* l o gl a mb d a _p h as e 3 ;
if d i ff lt l o w er_c v 3 t h en do ;
p ha s e 3 _p o w e r = (1 - c df ( ’ c h i s qu a r e ’ , - 2* l og l a m bd a _ ph a s e 3 , 1 )) / 2 ;
en d ;
el s e if di f f ge l o w e r_c v 3 t h en do ;
p ha s e 3 _p o w e r = ( 1+ c df ( ’ c h i s qu a r e ’ , - 2* l og l a m bd a _ ph a s e 3 , 1 )) / 2 ;
en d ;
if phase3_power=0 then phase3_power=.;
* CD f or d e f i ni t i on o f s uc c e ss ;
H _p h as e 3_ su c ce s s = ph a se 3 _p o we r ;
d H_ p h as e 3 _s u c c es s _ dd i f f =( H _ ph a se 3 _s u c ce s s - l ag ( H _ p ha s e 3_ s u cc e s s ) )/ ( d if f - l ag ( d i ff ) );
C _p ha se 3 _s uc c es s =( H _p ha s e3 _s u cc es s )* ( di ff l t l ow e r_ cv 3 ) + ( 1 -H _ ph as e 3_ su cc e ss ) *( d if f g t l ow e r_ cv 3 );
* CDs f or Po w e r . D eriv a t i ve of H wr t p owe r ;
phase2and3_power= phase3_power* phase2_power;
d H_ d po w e r =( H - l ag ( H ) )/ ( p h as e 3_ p ow e r - l ag ( p h a se 3 _ po w er ) );
if d H _ dpow e r =0 th e n d H_d p o w e r = . ;
if 0 g t p h a s e3_p o w e r g t 1 t he n d H _ dpow e r =. ;
d H_ d p ha s e 2p o w er =( H - l ag ( H ) ) /( p h a se 2 _p o we r - l ag ( p ha s e 2_ p o we r ) ) ;
if d H _ d p has e 2 p o w er =0 t hen d H _ d p h ase2 p o w e r = . ;
if 0 g t p h a s e2_p o w e r g t 1 t he n d H _ d p has e 2 p o w er = .;
d H_ d p ha s e 23 p o we r = ( H - la g ( H )) / ( p ha s e2 a n d3 _ po w e r - l a g ( ph a s e2 a n d3 _ p ow e r ) ) ;
if 0 gt phase2and3_power gt 1 then dH_dphase23power=.;
d H_ p h as e 2 _d p h as e 3 = ( H _p h as e 2 _s u cc e s s - l a g ( H_ p h as e 2 _s u c c es s ) ) /( p h as e 3 _p o we r
- la g ( p ha s e 3_ p o we r ) ) ;
* A dd it i on a l ph a se 2 in f er e nc e ;
H _m ul t ip l y =H * H _p h as e 2_ su c ce s s ;
d H_ m u lt i p ly _ d di f f = ( H _m u lt i pl y - l ag ( H _ mu l t ip l y ) )/ ( d if f - l a g ( di f f )) ;
C_m u l t ipl y = H _ mult i p l y * ( H _ m u ltip l y l t 0. 5 ) + (1 - H _ m ulti p l y ) * ( H _ m u l tip l y g t 0. 5 ) ;
35
e li ci te d _v ar = ( di f f_ ha t + in t_ c tr l )* (1 - d if f_ ha t - in t _c tr l )/ n _a c ti ve + ( in t_ c tr l )
*( 1 - i nt _ ct r l )/ n _c t rl ;
p ha se 2_ v ar = ( lo we r_ c v2 + in t _c tr l )* (1 - l ow er _c v2 - i nt _ ct rl ) / n_ ac t iv e_ p ha se 2
+ in t _c tr l *( 1 - in t_ ct r l )/ n _c tr l _p ha s e2 ;
p ha se 3_ v ar = ( lo we r_ c v3 + in t _c tr l )* (1 - l ow er _c v3 - i nt _ ct rl ) / n_ ac t iv e_ p ha se 3
+ in t _c tr l *( 1 - in t_ ct r l )/ n _c tr l _p ha s e3 ;
H _c on vo l ve = c df ( ’n or m al ’ ,( q ua n ti le ( ’ no r ma l ’, H ,0 , 1) / sq rt ( e li ci t ed _v a r )
+ qu a nt i le ( ’ n or ma l ’ , H_ p ha s e2 _ su cc e ss ,0 , 1) / s qr t ( ph a se 2 _v a r ))
/ sq r t ( 1/ ( e li c it e d_ v a r ) + 1 / ( p ha s e2 _ va r ) ) ,0 , 1) ;
d H_ c o nv o l ve _ d di f f = ( H _c o nv o lv e - l ag ( H _ co n v ol v e ) )/ ( d if f - l a g ( di f f )) ;
C_c o n v olv e = H _ conv o l v e * ( H _ c o nvol v e l t 0. 5 ) + (1 - H _ c onvo l v e ) * ( H _ c o n vol v e g t 0. 5 ) ;
* We i gh t s fo r Po S ca l cu l at i on s ;
we i g ht = H - la g ( H );
w ei g h t_ p h as e 3 co n d 2 = H _p h as e 2 _s u cc e s s - l a g ( H_ p h as e 2 _s u c c es s ) ;
* R efer e n c e l in e s a nd sh a d ed reg i o n s in f i g ure s ;
if d if f le l o we r_ m ar g in 2 t he n re f1 = a l ph a _p h as e 2 ; e ls e re f1 = .;
if d if f le l o we r_ m ar g in 3 t he n re f2 = a l ph a _p h as e 3 ; e ls e re f2 = .;
if p h a s e2_ p o w e r le a l p h a _pha s e 2 t hen r e f3 = l o w e r_m a r g i n2 ; els e r ef3 =.;
if p h a s e3_ p o w e r le a l p h a _pha s e 3 t hen r e f4 = l o w e r_m a r g i n3 ; els e r ef4 =.;
if d i ff le l o w e r_ma r g i n 2 t hen r e f5 = 0 .5; e l s e ref 5 =. ;
if p h a s e3_ p o w e r le 0 . 5 th e n r ef 6 = l owe r _ m a r gin 2 ; el s e ref 6 =. ;
if 0 . 49 l e ph a s e 2_ p o we r l e 0. 5 1 th e n do ; c al l s ym p u t ( ’ ta i l 2 ’ ,H ) ; en d ;
if 0 . 49 l e ph a s e 3_ p o we r l e 0. 5 1 th e n do ; c al l s ym p u t ( ’ ta i l 3 ’ ,H ) ; en d ;
if 0 . 49 le p h a s e 2and 3 _ p o w er le 0 . 51 th e n do ; ca l l s ymp u t ( t a i l 23 ,H ); e n d ;
if H _ ph a se 2_ s uc c es s le a l ph a _p ha s e2 t he n a re a = dH _ ph as e 2_ s uc c es s _d d if f ;
else area=.;
if H _ ph a se 2 _s u cc es s l e a l ph a _p ha s e2 t h en a re a2 = d H _p h as e 2_ d ph a se 3 ;
el se a r ea 2 =. ;
output;
en d ;
ru n ;
*PoS Calculations;
pr o c m ea n s d ata = b ino m i a l mea n n o pri n t ;
weight weight;
va r p ha s e2 _p o we r p ha s e3 _ po w er p h as e 2a n d3 _ po we r ;
ou t p ut o u t = m e an _ p o we r ( w h er e = ( _ s t at _ = ’ M E AN ’ ) );
ru n ;
pr o c m ea n s d ata = b ino m i a l m ean n o p rin t ;
we i gh t w ei g ht _ ph as e 3c o nd 2 ;
var phase3_power ;
ou t p ut o ut = m e an _ p ha s e 3c o n d 2_ p o we r m e an = p h a se 3 c on d 2 _p o w e r ;
ru n ;
da t a m e an _ po w er ;
se t m ea n _p ow e r ;
ca l l s ym p ut ( ’ m e an _ p h as e 2 _ p ow e r ’ , s t ri p ( r o un d ( p ha s e2 _ p ow e r , 0 . 0 01 ) ) );
ca l l s ym p ut ( ’ m e an _ p h as e 3 _ p ow e r ’ , s t ri p ( r o un d ( p ha s e3 _ p ow e r , 0 . 0 01 ) ) );
ca l l s ym p ut ( ’ m e an _ p h a se 2 3 _ po w e r ’ , s tr i p ( r o un d ( p h a se 2 a nd 3 _ p ow e r , 0 . 0 01 ) ) );
ru n ;
da t a m e an _ ph a se 3 co n d2 _ po w er ;
se t m ea n _p h as e 3c o nd 2 _p ow e r ;
36
ca l l s ym p ut ( ’ m e an _ p h a se 3 c o n d2 _ p o we r ’ , st r i p ( r ou n d ( p h as e 3 co n d 2 _p o w er , 0. 0 0 1 )) ) ;
ru n ;
* ML E s ;
pr o c s ql n o pr i nt ;
se l e c t dif f _ h at
in t o : d i ff _ ha t
fr o m b i no m ia l ;
quit;
pr o c m ea n s d ata = b ino m i a l no p r i nt ;
wh e r e & d i f f_ h a t . -& d di f f . l e di f f le & d if f _ ha t . +& d d i ff . ;
va r p ha s e2 _p o we r p ha s e3 _ po w er p h as e 2a n d3 _ po we r ;
ou t pu t ou t = ml e s_ p ow e r ( w he r e =( _ st a t_ = ’ MI N ’ )) ;
ru n ;
pr o c m ea n s d ata = b ino m i a l no p r i nt ;
wh e r e 0.5 - 0. 0 1 le H_ m u l tip l y l e 0.5 + 0 . 0 1;
var phase3_power ;
ou t p u t ou t = m l e _ p h a se 3 c o n d 2 _p o w e r ( w he r e = ( _ s t a t_ = ’ MI N ’ ) ) ;
ru n ;
da t a m l es _ po w er ;
se t m le s _p ow e r ;
ca l l s ym p ut ( ’ p h as e 2 _ po w e r _ ml e ’ , s t ri p ( r o u nd ( p ha s e 2_ p o we r ,0 . 0 01 ) ) );
ca l l s ym p ut ( ’ p h as e 3 _ po w e r _ ml e ’ , s t ri p ( r o u nd ( p ha s e 3_ p o we r ,0 . 0 01 ) ) );
ca l l s ym p ut ( ’ p h as e 2 3 _p o w e r _m l e ’ , s t ri p ( r o un d ( p ha s e 2a n d 3_ p o we r , 0. 0 0 1) ) ) ;
ru n ;
da t a m l e_ p ha s e3 c on d 2_ p ow e r ;
se t m le _ ph a se 3 co n d2 _p o we r ;
ca l l s ym p ut ( ’ p h as e 3 c o nd 2 _ p ow e r _ m le ’ , s tr i p ( r ou n d ( p h as e 3 _p o w er , 0. 0 0 1) ) ) ;
ru n ;
* P re p ar e fo r p lo ts ;
data binomial_stack;
se t b i no m i al ( i n = a) b i no m i a l ( i n = b );
if a th e n do ;
ph a se = 2;
d H_ su c ce s s_ d di f f = dH _ ph as e 2_ s uc c es s _d d if f ;
C_success=C_phase2_success ;
ml e = l o we r _c v 2 ;
lower_margin=lower_margin2;
en d ;
if b th e n do ;
ph a se = 3;
d H_ su c ce s s_ d di f f = dH _ ph as e 3_ s uc c es s _d d if f ;
C_success=C_phase3_success ;
ml e = l o we r _c v 3 ;
lower_margin=lower_margin3;
en d ;
ru n ;
* Ch e ck t y pe I e r ro r r at e s ;
pr o c s ql n o pr i nt ;
se l e ct m ax ( p ha s e 2_ p o we r )
in t o : c p _p h a se 2
37
from binomial_stack
wh e re l o we r _m a rg i n2 -& d di f f . < di f f < l o we r _m a rg i n 2 + & dd i ff . a nd p ha s e =2 ;
se l e ct m ax ( p ha s e 3_ p o we r )
in t o : c p _p h a se 3
from binomial_stack
wh e re l o we r _m a rg i n3 -& d di f f . < di f f < l o we r _m a rg i n 3 + & dd i ff . a nd p ha s e =3 ;
quit;
% pu t & c p _p h as e 2 . & c p_ ph a se 3 . ;
* Pl o ts ;
pr o c f o rm a t ;
va l u e ph a se 2 = ’ P ha s e ␣ 2 ␣ S uc c e ss
3= ’ P h a se ␣ 3 ␣ S u c ce s s ’
;
ru n ;
ods escapechar='^';
options nodate nonumber;
ods graphics / border=no height=3in width=6.0in;
proc sgpanel data=binomial_stack noautolegend;
  panelby phase / novarname;
  format phase phase.;
  refline lower_margin / axis=x lineattrs=(pattern=dot);
  series x=diff y=dH_success_ddiff / group=phase lineattrs=(thickness=1);
  rowaxis label="Confidence Density" offsetmin=0.02;
  colaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
run;
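* Paneled plot: confidence curve (C_success) vs. the true difference in proportions, by phase;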
proc sgpanel data=binomial_stack noautolegend;
  panelby phase / novarname;
  format phase phase.;
  refline lower_margin / axis=x lineattrs=(pattern=dot);
  series x=diff y=C_success / group=phase lineattrs=(thickness=1);
  rowaxis label="Confidence Curve" offsetmin=0.02;
  colaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
run;
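* Power curves for phase 2, phase 3, and phases 2 and 3 combined, overlaid with the confidence density for the true difference;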
options nodate nonumber;
ods graphics / border=no height=3in width=6.0in;
ods escapechar="^";
proc sgplot data=binomial noautolegend;
  refline diff_hat / axis=x lineattrs=(thickness=0.5);
  series x=diff y=phase2_power / lineattrs=(thickness=1) name="phase2_power";
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  series x=diff y=phase2and3_power / lineattrs=(thickness=4 color=cx66A5A0)
    name="phase2and3_power";
  series x=diff y=dHddiff / lineattrs=(thickness=2 pattern=dash color=black) y2axis
    name="CD";
  keylegend "phase2_power" "phase3_power" "phase2and3_power" "CD";
  series x=diff y=ref1 / lineattrs=(color=grey pattern=dot);
  series x=diff y=ref2 / lineattrs=(color=grey pattern=dot);
  series x=ref3 y=phase2_power / lineattrs=(color=grey pattern=dot);
  series x=ref4 y=phase3_power / lineattrs=(color=grey pattern=dot);
  y2axis values=(0 to 24 by 4) offsetmin=0.02;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  label phase2_power="Phase 2 Power" phase3_power="Phase 3 Power"
        phase2and3_power="Phase 2 and 3 Power" dHddiff="Confidence Density";
run;
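* Same power curves overlaid with the confidence curve for the true difference;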
options nodate nonumber;
ods graphics / border=no height=3in width=6.0in;
ods escapechar="^";
proc sgplot data=binomial noautolegend;
  refline diff_hat / axis=x lineattrs=(thickness=0.5);
  series x=diff y=phase2_power / lineattrs=(thickness=1) name="phase2_power";
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  series x=diff y=phase2and3_power / lineattrs=(thickness=4 color=cx66A5A0)
    name="phase2and3_power";
  series x=diff y=C / lineattrs=(thickness=2 pattern=dash color=black) y2axis
    name="CD";
  keylegend "phase2_power" "phase3_power" "phase2and3_power" "CD";
  series x=diff y=ref1 / lineattrs=(color=grey pattern=dot);
  series x=diff y=ref2 / lineattrs=(color=grey pattern=dot);
  series x=ref3 y=phase2_power / lineattrs=(color=grey pattern=dot);
  series x=ref4 y=phase3_power / lineattrs=(color=grey pattern=dot);
  y2axis max=1 offsetmin=0.02;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  label phase2_power="Phase 2 Power" phase3_power="Phase 3 Power"
        phase2and3_power="Phase 2 and 3 Power" C="Confidence Curve";
run;
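* Confidence densities for phase 2, phase 3, and joint phase 2 and 3 power, with reference lines at the power MLEs;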
ods graphics / border=no height=3.5in width=6.0in;
proc sgplot data=binomial;
  series x=phase2_power y=dH_dphase2power / lineattrs=(thickness=1) name="phase2_power";
  series x=phase3_power y=dH_dpower / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  series x=phase2and3_power y=dH_dphase23power / lineattrs=(thickness=4 color=cx66A5A0)
    name="phase23_power";
  refline &phase2_power_mle. / axis=x lineattrs=(color=blue pattern=dot)
    legendlabel="Phase 2 Power mle" name="Phase 2 PoS (Power mle)";
  refline &phase3_power_mle. / axis=x lineattrs=(color=cxD05B5B pattern=dot thickness=2)
    legendlabel="Phase 3 Power mle" name="Phase 3 PoS (Power mle)";
  refline &phase23_power_mle. / axis=x lineattrs=(color=cx66A5A0 pattern=dot thickness=4)
    legendlabel="Phase 2 and 3 Power mle" name="Phase 2 and 3 PoS (Power mle)";
  footnote1 j=left "Phase 2 Power: mle=&phase2_power_mle., pos=&mean_phase2_power.";
  footnote2 j=left "Phase 3 Power: mle=&phase3_power_mle., pos=&mean_phase3_power.";
  footnote3 j=left "Phase 2 and 3 Power: mle=&phase23_power_mle., pos=&mean_phase23_power.";
  xaxis label="Power";
  yaxis label="Confidence Density" min=0 max=6 offsetmin=0.02;
  label dH_dphase2power="Phase 2 CD" dH_dpower="Phase 3 CD"
        dH_dphase23power="Phase 2 and 3 CD";
  keylegend "phase2_power" "phase3_power" "phase23_power" "Phase 2 PoS (Power mle)"
            "Phase 3 PoS (Power mle)" "Phase 2 and 3 PoS (Power mle)";
run;
footnote;
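* Confidence curves for phase 2, phase 3, and joint phase 2 and 3 power;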
ods graphics / border=no height=3.5in width=6.0in;
proc sgplot data=binomial;
  series x=phase2_power y=C / lineattrs=(thickness=1) name="phase2_power"
    legendlabel="Phase 2 CC";
  series x=phase3_power y=C / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power" legendlabel="Phase 3 CC";
  series x=phase2and3_power y=C / lineattrs=(thickness=4 color=cx66A5A0)
    name="phase23_power" legendlabel="Phase 2 and 3 CC";
  footnote1 j=left "Phase 2 Power: mle=&phase2_power_mle., pos=&mean_phase2_power.";
  footnote2 j=left "Phase 3 Power: mle=&phase3_power_mle., pos=&mean_phase3_power.";
  footnote3 j=left "Phase 2 and 3 Power: mle=&phase23_power_mle., pos=&mean_phase23_power.";
  xaxis label="Power";
  yaxis label="Confidence Curve" min=0 max=0.8 offsetmin=0.02;
  keylegend "phase2_power" "phase3_power" "phase23_power" "Phase 2 PoS (Power mle)"
            "Phase 3 PoS (Power mle)" "Phase 2 and 3 PoS (Power mle)";
run;
footnote;
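* Phase 3 power curve overlaid with the confidence density for minimum phase 2 success (shaded band on the secondary axis);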
options nodate nonumber;
ods graphics / border=no height=3in width=6.0in;
ods escapechar="^";
proc sgplot data=binomial noautolegend;
  band x=diff upper=area lower=0 / fillattrs=(color=lightgrey) y2axis;
  refline lower_cv2 / axis=x lineattrs=(thickness=0.5);
  series x=diff y=dH_phase2_success_ddiff / lineattrs=(thickness=1 pattern=solid)
    y2axis name="CD" legendlabel="Minimum Phase 2 Success";
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  keylegend "phase2_power" "phase3_power" "phase2and3_power" "CD";
  series x=diff y=ref5 / lineattrs=(color=grey pattern=dot);
  series x=ref6 y=phase3_power / lineattrs=(color=grey pattern=dot);
  y2axis values=(0 to 24 by 4) offsetmin=0.02;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  label phase3_power="Phase 3 Power"
        dH_phase2_success_ddiff="Confidence Density";
run;
footnote;
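* Phase 3 power curve overlaid with the confidence curve for minimum phase 2 success;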
options nodate nonumber;
ods graphics / border=no height=3in width=6.0in;
ods escapechar="^";
proc sgplot data=binomial noautolegend;
  refline lower_cv2 / axis=x lineattrs=(thickness=0.5);
  series x=diff y=C_phase2_success / lineattrs=(thickness=1 pattern=solid)
    y2axis name="CD" legendlabel="Minimum Phase 2 Success";
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  keylegend "phase2_power" "phase3_power" "phase2and3_power" "CD";
  series x=diff y=ref5 / lineattrs=(color=grey pattern=dot);
  series x=ref6 y=phase3_power / lineattrs=(color=grey pattern=dot);
  y2axis max=1 offsetmin=0.02;
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  label phase3_power="Phase 3 Power"
        C_phase2_success="Confidence Curve";
run;
footnote;
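* Confidence density for phase 3 power conditional on phase 2 success, with a reference line at the corresponding power mle;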
ods graphics / border=no height=3.5in width=6.0in;
proc sgplot data=binomial;
  band x=phase3_power upper=area2 lower=0 / fillattrs=(color=lightgrey);
  series x=phase3_power y=dH_phase2_dphase3 / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power";
  refline &phase3cond2_power_mle. / axis=x lineattrs=(color=cxD05B5B pattern=dot
    thickness=2) legendlabel="Phase 3 Power mle" name="Phase 3 Power mle";
  footnote1 j=left "Phase 3 Power: mle=%sysfunc(strip(&phase3cond2_power_mle.)), pos=&mean_phase3cond2_power.";
  xaxis label="Power";
  yaxis label="Confidence Density" min=0 max=6 offsetmin=0.02;
  label dH_phase2_dphase3="Phase 3 CD";
  keylegend "phase3_power" "Phase 3 Power mle";
run;
footnote;
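* Comparison of (i) the elicited confidence density, (ii) minimum phase 2 success, (iii) multiplication, (iv) convolution, and (v) phase 3 power;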
options nodate nonumber;
ods graphics / border=no height=3.5in width=6.0in;
proc sgplot data=binomial;
  refline lower_cv2 / axis=x lineattrs=(thickness=0.5);
  series x=diff y=dH_phase2_success_ddiff / lineattrs=(pattern=solid) name="phase 2"
    legendlabel="(ii) Minimum Phase 2 Success" y2axis;
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power" legendlabel="(v) Phase 3 Power";
  series x=diff y=dH_multiply_ddiff / lineattrs=(pattern=8) name="multiply"
    legendlabel="(iii) Multiplication" y2axis;
  series x=diff y=dH_convolve_ddiff / lineattrs=(pattern=3) name="convolution"
    legendlabel="(iv) Convolution" y2axis;
  series x=diff y=dHddiff / lineattrs=(color=black thickness=2 pattern=dash)
    name="elicited" legendlabel="(i) Elicited Confidence Density" y2axis;
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  y2axis values=(0 to 24 by 4) label="Confidence Density" offsetmin=0.02;
  keylegend "elicited" "phase 2" "multiply" "convolution" "phase3_power";
  series x=diff y=ref5 / lineattrs=(color=grey pattern=dot);
  series x=ref6 y=phase3_power / lineattrs=(color=grey pattern=dot);
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
run;
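* The same comparison on the confidence curve scale;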
options nodate nonumber;
ods graphics / border=no height=3.5in width=6.0in;
proc sgplot data=binomial;
  refline lower_cv2 / axis=x lineattrs=(thickness=0.5);
  series x=diff y=C_phase2_success / lineattrs=(pattern=solid) name="phase 2"
    legendlabel="(ii) Minimum Phase 2 Success" y2axis;
  series x=diff y=phase3_power / lineattrs=(thickness=2 color=cxD05B5B)
    name="phase3_power" legendlabel="(v) Phase 3 Power";
  series x=diff y=C_multiply / lineattrs=(pattern=8) name="multiply"
    legendlabel="(iii) Multiplication" y2axis;
  series x=diff y=C_convolve / lineattrs=(pattern=3) name="convolution"
    legendlabel="(iv) Convolution" y2axis;
  series x=diff y=C / lineattrs=(color=black thickness=2 pattern=dash)
    name="elicited" legendlabel="(i) Elicited Confidence Curve" y2axis;
  xaxis label="True Difference in Proportions" offsetmin=0 offsetmax=0;
  yaxis label="Power" offsetmin=0.02;
  y2axis max=1 label="Confidence Curve" offsetmin=0.02;
  keylegend "elicited" "phase 2" "multiply" "convolution" "phase3_power";
  series x=diff y=ref5 / lineattrs=(color=grey pattern=dot);
  series x=ref6 y=phase3_power / lineattrs=(color=grey pattern=dot);
  footnote1 j=left "^{unicode alpha}=&alpha_phase2. for phase 2 LR test against difference <= &lower_margin_phase2. with N=&n_ctrl_phase2. per arm.";
  footnote2 j=left "^{unicode alpha}=&alpha_phase3. for phase 3 LR test against difference <= &lower_margin_phase3. with N=&n_ctrl_phase3. per arm.";
run;