In press: Statistics in Medicine

Noninferiority trial designs for odds ratios and risk differences

Joan F. Hilton

University of California San Francisco

SUMMARY

This study presents constrained maximum likelihood derivations of the design parameters of

noninferiority trials for binary outcomes with the margin defined on the odds ratio (ψ) or risk

difference (δ) scale. The derivations show that, for trials in which the group-specific response

rates are equal under the point-alternative hypothesis, the common response rate, πN, is a fixed

design parameter whose value lies between the control and experimental rates hypothesized at the

point-null, {πC, πE}. We show that setting πN equal to the value of πC that holds under H0

underestimates the overall sample size requirement. Given {πC,ψ} or {πC,δ} and the type I

and II error rates, our algorithm finds clinically meaningful design values of πN, and the

corresponding minimum asymptotic sample size, N = nE + nC, and optimal allocation ratio,

γ = nE/nC. We find that optimal allocations are increasingly imbalanced as ψ increases, with

$\gamma_\psi < 1$ and $\gamma_\delta \approx 1/\gamma_\psi$, and that ranges of allocation ratios map to the minimum sample size. The

latter characteristic allows trialists to consider trade-offs between optimal allocation at a smaller

N and a preferred allocation at a larger N. For designs with relatively large margins (e.g.,

ψ > 2.5), trial results that are presented on both scales will differ in power, with more power lost

if the study is designed on the risk difference scale and reported on the odds ratio scale than vice

versa.

KEY WORDS: active-controlled trial, allocation ratio, ancillary parameter

Correspondence to:

Joan F. Hilton, Sc.D., M.P.H.

Department of Epidemiology & Biostatistics

University of California San Francisco

185 Berry Street, Suite 5700

San Francisco CA 94107-1762 USA

e-mail: joan@biostat.ucsf.edu phone: 415-514-8029 fax: 415-514-8150

1 INTRODUCTION

A search of the PubMed database conducted on Midsummer’s Day 2009, using criteria “New Engl

J Med [JO] AND noninferiority,” provided insights into how the noninferiority trial design is

being applied in medical studies. The search identified 32 papers, published 2001-2009, each

describing a distinct randomized controlled trial. Only one trial studied a continuous primary

outcome [1], 13 studied time-to-event outcomes [2-14], and 21 studied binary outcomes [12-32],

demonstrating the relative importance of these outcomes in medical clinical trials. The designs of

the 21 trials with binary outcomes can be characterized as follows.

• Seventeen trials expressed the noninferiority margin as a risk difference, two as odds ratios

[13,15], and two as relative risks [16,17], apparently reflecting trialists’ preferences. All

trial designs specified equivalent experimental (E) and control (C) response rates under the

alternative hypothesis, HA.

• Extreme response rates were more common than moderate response rates. Six papers

seemed to specify a rate assuming equality of the experimental and control groups [12,15,

18-21]; seven seemed to specify the control group response rate, πC, assuming inequality

[14,15,17,21-25]; and ten did not cite a design rate – in these cases we used the empirical

estimate of πC from the paper’s Results section to classify extremity of the rate. Based on

this mixture of definitions, the “response rate,” π, was within .10 of either 0 or 1 for eight

trials (38%), within .20 for 13 trials (62%), and within .30 for 18 trials (86%).

• All trials appeared to rely on the normal approximation to the binomial distribution to

generate the overall sample size, N = nC + nE, since none indicated use of an exact

distribution. This assumption was generally reasonable as only three “expected values,”

Nπ, fell below 30 {N=183, π=.06 [20]; N=340, π=.06 [26]; N=667, π=.012 [27]}.

• Although imbalanced allocation can reduce the sample size requirement of noninferiority

trials, this advantage was rarely employed; only two trials used imbalanced allocation to

groups [12,27].
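
The expected counts Nπ cited above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the dictionary layout and the function name `expected_count` are illustrative choices, and only the (N, π) pairs come from the text.

```python
# Expected number of responders, N * pi, for the three trials cited in the text.
# The (N, pi) pairs are taken from the list above; the names are illustrative.
trials = {"[20]": (183, 0.06), "[26]": (340, 0.06), "[27]": (667, 0.012)}


def expected_count(n_total, rate):
    """Return the expected count N * pi under a binomial model."""
    return n_total * rate


expected = {ref: expected_count(n, p) for ref, (n, p) in trials.items()}

for ref, value in expected.items():
    # The text uses 30 as the threshold for the normal approximation.
    print(ref, round(value, 1), "below 30:", value < 30)
```

All three products fall well below the threshold of 30 used in the text.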

Several authors have discussed imbalanced allocation ratios as optimal for noninferiority trials

[33-35]. Among these, De Boo and Zielhuis [33] studied designs with .01 ≤ πC ≤ .10, focusing on

failure rates. Their results show that designs based on the risk difference call for allocating more

patients to the experimental group (nE > nC) whereas those based on the relative risk call for the

opposite (nE < nC). They provided precise sample sizes for combinations of πC and noninferiority

margins. Considering that randomized trials often achieve the target sample size only

approximately – due to over-running, loss to follow-up, or use of the per-protocol rather than the

intent-to-treat sample – we examined how important it is to precisely achieve a target allocation

ratio.

The current study examines designs of noninferiority trials for binary outcomes with two

objectives: (i) to clarify the role of the common response rate, πN, under the alternative

hypothesis that the experimental and control rates are equal, and to offer strategies for specifying

its value, and (ii) to explore the relationship between the scale of the noninferiority margin – we

consider risk differences and odds ratios – and the overall sample size, allocation ratio, and power.

We examine the setting where nE and nC patients are randomly allocated to experimental and

control therapies, respectively, in the ratio γ = nE/nC, and the numbers of responders are

binomially distributed, yE ∼ Bi(nE, πE) and yC ∼ Bi(nC, πC). Under the point-null hypothesis,

H0, we assume that the response rates differ by the noninferiority margin, specified by

$\delta = \pi_E - \pi_C$ or $\log\psi = \log[\pi_E/(1 - \pi_E)] - \log[\pi_C/(1 - \pi_C)]$. Maximum likelihood (ML)

estimates of these parameter values are given by $\hat\delta = \hat\pi_E - \hat\pi_C$ and

$\log\hat\psi = \log[\hat\pi_E/(1 - \hat\pi_E)] - \log[\hat\pi_C/(1 - \hat\pi_C)]$, respectively, where $\hat\pi_j = y_j/n_j$, $j = C, E$, estimate the

response rates. Under the point-alternative hypothesis of equal response rates, we denote the

common response rate by πN. As illustrated in this paragraph, in this paper we use πC to refer to

the value of the control-group response rate when it does not equal πE (i.e., under H0; typically

derived from subject-matter literature) and πN to refer to its value when these parameters are

assumed to be equal (i.e., under HA). In contrast, under HA, De Boo and Zielhuis [33] denote the

common value by πC, and Farrington and Manning [34] and Miettinen and Nurminen [36] provide

the values of both πC and πE to show they are equal.
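
The unconstrained ML estimates defined above are simple functions of the observed counts. The sketch below computes them in Python; the function names and the example counts are hypothetical and are not taken from any of the cited trials.

```python
import math


def risk_difference(y_e, n_e, y_c, n_c):
    """ML estimate of delta: difference of the observed response rates."""
    return y_e / n_e - y_c / n_c


def log_odds_ratio(y_e, n_e, y_c, n_c):
    """ML estimate of log(psi): difference of the observed log odds."""
    p_e, p_c = y_e / n_e, y_c / n_c
    if not (0 < p_e < 1 and 0 < p_c < 1):
        # The log odds is undefined when an observed rate is exactly 0 or 1.
        raise ValueError("observed rates must lie strictly between 0 and 1")
    return math.log(p_e / (1 - p_e)) - math.log(p_c / (1 - p_c))


# Hypothetical counts: 30/200 responders on E and 20/200 on C.
delta_hat = risk_difference(30, 200, 20, 200)
log_psi_hat = log_odds_ratio(30, 200, 20, 200)
print(delta_hat, math.exp(log_psi_hat))
```

Note that the log odds ratio estimate is undefined when either observed rate is 0 or 1; the sketch raises an error in that case rather than applying a continuity correction.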

To further set the stage for the current study, we examine Shiffman et al [15] more closely.

This noninferiority trial hypothesized that short-term therapy (16 weeks; experimental arm) and

long-term therapy (24 weeks; control arm) for hepatitis C virus (HCV) infections have similar

efficacy, as measured by rates of undetectable HCV RNA 24 weeks after the end of therapy [15].

To design the trial, values of the control group success rate and the noninferiority margin on the

risk difference scale were specified; the authors then converted δ to ψ for sample size calculation.

They planned to enroll 700 patients per group in order to have 80% power using a two-sided 95%

confidence interval on the odds ratio. In Section 4.3, we will discuss the authors’ choice of

balanced allocation, γ = 1; their use of πC versus πN as a design parameter; and whether one can

base the design on one scale of the noninferiority margin and inferences on the other scale without

loss of power (this was not done by Shiffman et al [15] but was done by others).
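
Converting a margin from the risk difference scale to the odds ratio scale, as described above, requires only the control rate πC and the definitions of δ and ψ given earlier. The following sketch illustrates the conversion in both directions; the function names and the numerical inputs are hypothetical and are not the design values of Shiffman et al [15].

```python
def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    if not 0 < p < 1:
        raise ValueError("rate must lie strictly between 0 and 1")
    return p / (1 - p)


def delta_to_psi(pi_c, delta):
    """Odds ratio implied by control rate pi_c and risk difference delta."""
    pi_e = pi_c + delta  # delta = pi_E - pi_C
    return odds(pi_e) / odds(pi_c)


def psi_to_delta(pi_c, psi):
    """Risk difference implied by control rate pi_c and odds ratio psi."""
    odds_e = psi * odds(pi_c)
    pi_e = odds_e / (1 + odds_e)
    return pi_e - pi_c


# Hypothetical design values: control rate 0.10 and margin delta = 0.05.
psi = delta_to_psi(0.10, 0.05)
print(psi, psi_to_delta(0.10, psi))
```

Because the mapping depends on πC, the same δ corresponds to different values of ψ at different control rates.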

In Section 2 we present the noninferiority hypotheses more formally and present the equation

for asymptotic sample size calculation that we will use. In Section 3, we use maximum likelihood

derivations to show that πN is a fixed design parameter for noninferiority trials that assume equal

response rates under HA, and that its value does not equal πC. In addition, we show that the

expression for the optimal allocation ratio depends on the values of the design parameters and

differs according to the scale of the noninferiority margin, for δ and logψ. Because πN is

unknown at the design stage of a trial, in Section 4 we propose an approach to specifying its value

and we present an algorithm for finding πN, N and γ. We apply these methods to the design of

Shiffman et al [15], and then to a wide range of design parameters. We conclude in Section 5 with

a summary of these findings.

2 NONINFERIORITY TRIAL DESIGN

2.1 The hypotheses and the noninferiority margin

Let δ and logψ be members of a family of parameters, θ, that contrast two groups’ response

rates; hence, θ represents δ and logψ more generally. Let θ0 and θA be the values of θ at the

point-null and point-alternative hypotheses. Here, we restrict attention to θA = 0 (that is, to

equality of response rates) as the clinically most relevant choice. The quantity θ0, referred to as

the noninferiority margin, is the smallest πE:πCcontrast that is believed to be consistent with

clinically acceptable inferiority (i.e., noninferiority). We define the contrasts so that θ0 > 0.

The noninferiority hypotheses can be understood in terms of either within-group parameters

(here, failure rates or odds of failure) or between-group parameters (risk differences or log odds

ratios); the latter may be centered or not. When response rates represent failures, the response

rate scale illustrates our expectation under HA that πE < πC + θ0, as well as our assumption that

the failure rates on both active treatments are lower than the putative failure rate on placebo, πP

(Figure 1(a)). However, it is on the centered-contrast scale, θ − θ0, that sample size and power are

calculated and the hypothesis test is conducted. On this scale, if the confidence interval on $\hat\theta - \theta_0$

lies below 0 we conclude that E is noninferior to C, and if it lies below θA − θ0 we conclude that

E is superior to C (Figure 1(b)). On this scale, the hypotheses are:

H0: θ − θ0 ≥ 0 versus HA: θ − θ0 < 0.
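
The decision rule on the centered-contrast scale can be sketched as a comparison of the upper confidence limit with two thresholds. In the sketch below, the function name `classify`, its argument names, and the label used when noninferiority is not demonstrated are illustrative assumptions; the thresholds 0 and θA − θ0 are those stated in the text.

```python
def classify(upper_limit_centered, theta_0, theta_a=0.0):
    """Classify a trial result from the upper confidence limit on
    theta_hat - theta_0 (the centered contrast).

    The whole interval lies below a threshold exactly when its upper
    limit does, so only the upper limit is needed.
    """
    if upper_limit_centered < theta_a - theta_0:
        return "superior"
    if upper_limit_centered < 0:
        return "noninferior"
    return "noninferiority not shown"  # label chosen for this sketch


# Hypothetical margin theta_0 = 0.05 with theta_A = 0.
print(classify(-0.02, theta_0=0.05))
print(classify(-0.06, theta_0=0.05))
print(classify(0.01, theta_0=0.05))
```

With θA = 0, the superiority threshold on the centered scale is −θ0, which corresponds to the uncentered interval lying entirely below 0.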

<< Figure 1 about here. >>

2.2 Sample size and power formulae

The asymptotic sample size requirement corresponding with pre-specified nominal size and power

levels, α and 1 − β, is defined to detect the centered noninferiority margin, θA − θ0, using test

statistic

$$T(\hat\theta) = \sqrt{n_C}\,(\hat\theta - \theta_0)/\tilde\sigma_0(\hat\theta), \tag{1}$$

where $\tilde\sigma_0(\hat\theta)$ is based on $\tilde\pi_C$ estimated at the analysis stage under the point-null hypothesis

constraint, $\tilde\pi_E - \tilde\pi_C = \theta_0$ (Miettinen and Nurminen [36]). Farrington and Manning [34] found the

asymptotic control-group sample size requirement corresponding with this test statistic to be

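$$n_C \ge \frac{\left[\, z_{1-\alpha}\,\bar\sigma_0(\hat\theta) + z_{1-\beta}\,\bar\sigma_A(\hat\theta) \,\right]^2}{(\theta_A - \theta_0)^2}, \tag{2}$$

where $z_{1-\alpha}$ and $z_{1-\beta}$ are quantiles of the standard Normal distribution, such that $\Pr\{Z < z_{1-\alpha}\} = 1 - \alpha$,

and $\bar\sigma_0(\hat\theta)$ is a function of a large-sample approximation of $\bar\pi_C$ estimated at the design stage under

the point-null constraint, $\bar\pi_E - \bar\pi_C = \theta_0$ (see Section 3). Scaling nC by the allocation ratio,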

γ = nE/nC, yields the overall sample size,

$$N = \mathrm{round}\{(1 + \gamma)\,n_C\}. \tag{3}$$
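
Equations (2) and (3) translate directly into code once the two standard deviations are available. The sketch below treats them as user-supplied inputs (`sigma_0` under the point-null and `sigma_a` under the point-alternative), because their expressions are derived in Section 3 and are not reproduced here. The function names and the numerical inputs are hypothetical, and the power function is obtained here simply by solving equation (2) for the 1 − β quantile; it is offered as an illustration rather than as the paper's own formula.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, mean 0 and sd 1


def control_sample_size(theta_0, sigma_0, sigma_a, alpha, beta, theta_a=0.0):
    """Control-group sample size bound from equation (2), before rounding."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return (z_alpha * sigma_0 + z_beta * sigma_a) ** 2 / (theta_a - theta_0) ** 2


def overall_sample_size(n_c, gamma):
    """Overall sample size from equation (3), with gamma = n_E / n_C."""
    return round((1 + gamma) * n_c)


def approximate_power(n_c, theta_0, sigma_0, sigma_a, alpha, theta_a=0.0):
    """Power implied by equation (2) for a given n_C (derived by inversion)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = (math.sqrt(n_c) * abs(theta_a - theta_0) - z_alpha * sigma_0) / sigma_a
    return STD_NORMAL.cdf(z_beta)


# Hypothetical inputs: margin 0.10, one-sided alpha 0.025, power 0.80, gamma 1.
n_c = control_sample_size(0.10, sigma_0=0.60, sigma_a=0.55, alpha=0.025, beta=0.20)
n_total = overall_sample_size(n_c, gamma=1.0)
power = approximate_power(n_c, 0.10, sigma_0=0.60, sigma_a=0.55, alpha=0.025)
print(n_c, n_total, power)
```

Evaluating the power function at the bound returned by `control_sample_size` recovers the nominal power, which is a convenient consistency check on any implementation of equation (2).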
